Eventual Discounting Temporal Logic Counterfactual Experience Replay
Cameron Voloshin, Abhinav Verma, Yisong Yue

TL;DR
This paper combines eventual discounting with counterfactual experience replay so that reinforcement learning policies satisfy linear temporal logic (LTL) specifications with higher probability, addressing the myopia of the standard RL framework.
Contribution
It proposes a new value-function-based proxy built on eventual discounting, and a counterfactual experience replay method that generates off-policy data from on-policy rollouts in LTL-guided RL.
Findings
Effective in both discrete and continuous state-action spaces
Improves LTL satisfaction probability
Outperforms standard methods
Abstract
Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL satisfying policies. This paper makes two contributions. First, we develop a new value-function based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.
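The abstract names both techniques without defining them, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes (a) the LTL specification has been compiled into an automaton with a set of accepting states, (b) a reward of 1 is earned on each accepting visit and the discount factor advances only on those visits rather than on every time step, and (c) because the automaton's transition function is known, one observed environment transition can be relabeled under every automaton state. The names `eventual_discounted_return`, `counterfactual_transitions`, `toy_delta`, and `toy_label` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Transition = Tuple[Tuple[Hashable, Hashable], Hashable, float, Tuple[Hashable, Hashable]]


def eventual_discounted_return(accepting_flags: Iterable[bool], gamma: float) -> float:
    """Return of a trajectory when discounting advances only on accepting visits.

    Assumption: reward is 1 on each accepting visit and 0 otherwise, so the
    return depends on how many accepting visits occur, not on when they occur.
    """
    total = 0.0
    discount = 1.0
    for accepting in accepting_flags:
        if accepting:
            total += discount
            discount *= gamma  # discount only after an accepting visit
    return total


def counterfactual_transitions(
    s: Hashable,
    a: Hashable,
    s_next: Hashable,
    automaton_states: Iterable[Hashable],
    delta: Callable[[Hashable, Hashable], Hashable],
    label: Callable[[Hashable], Hashable],
    accepting: Set[Hashable],
) -> List[Transition]:
    """Relabel one environment transition under every automaton state.

    Each output tuple is ((s, b), a, reward, (s_next, b_next)) and can be
    pushed into an ordinary replay buffer as off-policy data.
    """
    out: List[Transition] = []
    for b in automaton_states:
        b_next = delta(b, label(s_next))
        reward = 1.0 if b_next in accepting else 0.0
        out.append(((s, b), a, reward, (s_next, b_next)))
    return out


# Toy automaton for "eventually reach the goal": state 0 waits, state 1 accepts.
def toy_delta(b: int, symbol: str) -> int:
    return 1 if (b == 1 or symbol == "goal") else 0


def toy_label(s: int) -> str:
    return "goal" if s == 3 else ""
```

For example, with `gamma = 0.5` and accepting visits at steps 1 and 4 of a five-step trajectory, `eventual_discounted_return` gives 1 + 0.5 = 1.5, whereas per-step discounting would give 0.5 + 0.0625 = 0.5625; the delay before the first accepting visit is not penalized under this reading. Calling `counterfactual_transitions(2, "right", 3, [0, 1], toy_delta, toy_label, {1})` yields one relabeled transition per automaton state from a single environment step.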
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
Methods: Experience Replay
