Eventual Discounting Temporal Logic Counterfactual Experience Replay

Cameron Voloshin; Abhinav Verma; Yisong Yue

arXiv:2303.02135 · cs.LG · March 6, 2023 · 1 citation

Open Access

TL;DR

This paper combines eventual discounting with counterfactual experience replay so that reinforcement learning (RL) policies satisfy linear temporal logic (LTL) specifications with higher probability, addressing the myopia of the standard RL framework.

Contribution

The paper proposes a value-function-based proxy that uses eventual discounting, together with a counterfactual experience replay method that generates off-policy data from on-policy rollouts in LTL-guided RL.
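This page does not define eventual discounting, so the following is only a minimal sketch under a stated assumption: the reward is 1 whenever the task automaton visits an accepting state, and the discount factor advances only on those visits rather than on every time step. The function names and the boolean `accepting` trace are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def eventual_discounted_return(accepting: Sequence[bool], gamma: float) -> float:
    """Return under the assumed eventual-discounting rule.

    Assumption (not from this page): reward is 1 on accepting steps, and the
    discount weight is gamma ** (number of accepting visits seen so far).
    """
    total = 0.0
    weight = 1.0  # gamma ** (accepting visits before the current step)
    for is_accepting in accepting:
        if is_accepting:
            total += weight
            weight *= gamma  # discount only when an accepting state is visited
    return total


def standard_discounted_return(accepting: Sequence[bool], gamma: float) -> float:
    """Ordinary per-step discounting with the same 0/1 reward, for comparison."""
    return sum(gamma ** t for t, is_accepting in enumerate(accepting) if is_accepting)


# Two traces with the same number of accepting visits but different delays.
slow = [True, False, False, True]
fast = [True, True]
gamma = 0.9

slow_eventual = eventual_discounted_return(slow, gamma)
fast_eventual = eventual_discounted_return(fast, gamma)
slow_standard = standard_discounted_return(slow, gamma)
fast_standard = standard_discounted_return(fast, gamma)
```

Under this assumed rule the two traces receive the same value, because only the number of accepting visits matters, whereas per-step discounting prefers the faster trace. That preference for speed is one way a standard discounted objective can be myopic.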

Findings

01 Effective in discrete and continuous spaces
02 Improves LTL satisfaction probability
03 Outperforms standard methods

Abstract

Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL-satisfying policies. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with the highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.
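The counterfactual replay idea in the abstract can be illustrated with a minimal sketch. It assumes the task is tracked by a finite automaton whose state does not influence the environment dynamics, so one observed environment step can be relabeled as if the agent had been in any automaton state. The names `counterfactual_transitions`, `label`, and `delta`, and the toy "eventually reach the goal" automaton, are hypothetical illustrations rather than the authors' implementation. Details of the paper's actual automaton construction are omitted.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

# (env state, automaton state, action, next env state, next automaton state, reward)
Transition = Tuple[Hashable, Hashable, Hashable, Hashable, Hashable, float]


def counterfactual_transitions(
    s: Hashable,
    a: Hashable,
    s_next: Hashable,
    automaton_states: Iterable[Hashable],
    label: Callable[[Hashable], str],
    delta: Callable[[Hashable, str], Hashable],
    accepting: Set[Hashable],
) -> List[Transition]:
    """Relabel one observed environment step under every automaton state."""
    out: List[Transition] = []
    for b in automaton_states:
        b_next = delta(b, label(s_next))  # automaton move the step would cause
        reward = 1.0 if b_next in accepting else 0.0  # assumed 0/1 reward
        out.append((s, b, a, s_next, b_next, reward))
    return out


# Toy automaton for "eventually reach the goal": 0 = waiting, 1 = done (accepting).
def label(state: Hashable) -> str:
    return "goal" if state == 3 else ""


def delta(b: Hashable, symbol: str) -> Hashable:
    return 1 if (b == 1 or symbol == "goal") else 0


# One real step (state 2 -> state 3) yields one replay entry per automaton state.
replay_buffer = counterfactual_transitions(
    s=2, a="right", s_next=3,
    automaton_states=[0, 1], label=label, delta=delta, accepting={1},
)
```

Each real step therefore contributes as many replay entries as there are automaton states, which is how off-policy data can be produced from on-policy rollouts in this sketch.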

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)

Methods: Experience Replay