Policy Synthesis and Reinforcement Learning for Discounted LTL
Rajeev Alur, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, Ashutosh Trivedi

TL;DR
This paper studies discounted linear temporal logic (LTL) as an objective for policy synthesis and reinforcement learning in Markov decision processes. Discounting removes LTL's sensitivity to small perturbations in the transition probabilities, and when all discount factors are identical the objective reduces to discounted-sum reward.
Contribution
It studies policy synthesis for discounted LTL in Markov decision processes with unknown transition probabilities, and shows that discounted LTL reduces to discounted-sum reward via a reward machine when all discount factors are identical.
Findings
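Time discounting removes LTL's sensitivity to small perturbations in the transition probabilities, which otherwise prevents PAC learning without additional assumptions.
Discounted LTL reduces to discounted-sum reward via a reward machine when all discount factors are identical.
The setting covers Markov decision processes with unknown transition probabilities.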
Abstract
The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
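To make the reduction concrete, here is a minimal Python sketch for one very simple objective, "eventually p" with a single discount factor `lam`. It is an illustration under stated assumptions, not the paper's construction: it assumes the discounted value of "eventually p" on a trace is `lam**i`, where `i` is the first step at which `p` holds, and it evaluates finite trace prefixes only. The names `discounted_eventually`, `EventuallyRewardMachine`, and `discounted_return` are hypothetical.

```python
from typing import List, Set


def discounted_eventually(trace: List[Set[str]], prop: str, lam: float) -> float:
    """Assumed discounted value of 'eventually prop' on a finite prefix:
    lam**i for the first index i where prop holds, 0.0 if it never holds."""
    for i, labels in enumerate(trace):
        if prop in labels:
            return lam ** i
    return 0.0


class EventuallyRewardMachine:
    """Two-state reward machine (illustrative): pays reward 1 the first
    time prop is observed, and 0 on every other step."""

    def __init__(self, prop: str) -> None:
        self.prop = prop
        self.done = False  # False: still waiting for prop; True: already paid

    def step(self, labels: Set[str]) -> float:
        if not self.done and self.prop in labels:
            self.done = True
            return 1.0
        return 0.0


def discounted_return(trace: List[Set[str]], machine: EventuallyRewardMachine,
                      gamma: float) -> float:
    """Standard discounted sum of the rewards emitted by the machine."""
    total, discount = 0.0, 1.0
    for labels in trace:
        total += discount * machine.step(labels)
        discount *= gamma
    return total


# Hypothetical trace: p first holds at step 2.
trace = [set(), {"q"}, {"p"}, {"p", "q"}]
lam = 0.9
formula_value = discounted_eventually(trace, "p", lam)
reward_value = discounted_return(trace, EventuallyRewardMachine("p"), lam)
assert abs(formula_value - reward_value) < 1e-12
```

With the reward discount set equal to the formula's discount factor, the discounted sum of the machine's rewards matches the assumed formula value on this trace. Formulas with nested operators would need a richer machine, which this sketch does not attempt.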
Taxonomy
Topics: Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics · Formal Methods in Verification
