On the (In)Tractability of Reinforcement Learning for LTL Objectives
Cambridge Yang, Michael Littman, Michael Carbin

TL;DR
This paper investigates the fundamental limitations of reinforcement learning for linear temporal logic (LTL) objectives. It shows that only the most limited class of LTL formulas is PAC-MDP-learnable, and that reinforcement learning for general LTL objectives is therefore inherently intractable.
Contribution
The paper formalizes the problem within the PAC-MDP framework. It proves that the optimal policy for an LTL formula is PAC-MDP-learnable if and only if the formula is decidable within a finite horizon, which is the lowest class of the LTL hierarchy.
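"Decidable within a finite horizon" means that a bounded-length prefix of a trace already fixes whether the formula holds. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical toy LTL syntax (`Atom`, `Not`, `And`, `Next`, `Eventually`), and `evaluate` gives a sound but incomplete three-valued verdict on a finite prefix. A formula built only from atoms, negation, conjunction, and "next" is always settled by a prefix of length `horizon(phi)`. In contrast, "eventually q" can never be falsified by any finite prefix. This syntactic fragment is for illustration only. The paper's class is defined semantically.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Union

# Hypothetical toy LTL syntax, for illustration only.
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    sub: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Next:
    sub: "Formula"

@dataclass(frozen=True)
class Eventually:
    sub: "Formula"

Formula = Union[Atom, Not, And, Next, Eventually]
# A finite prefix of an infinite trace; each step is the set of atoms true there.
Trace = Sequence[Set[str]]

def evaluate(phi: Formula, prefix: Trace, i: int = 0) -> Optional[bool]:
    """Return True/False only if every infinite extension of `prefix` agrees;
    None means this simple check cannot settle the formula from the prefix."""
    if isinstance(phi, Atom):
        return None if i >= len(prefix) else phi.name in prefix[i]
    if isinstance(phi, Not):
        v = evaluate(phi.sub, prefix, i)
        return None if v is None else not v
    if isinstance(phi, And):
        left = evaluate(phi.left, prefix, i)
        right = evaluate(phi.right, prefix, i)
        if left is False or right is False:
            return False
        if left is True and right is True:
            return True
        return None
    if isinstance(phi, Next):
        return evaluate(phi.sub, prefix, i + 1)
    if isinstance(phi, Eventually):
        # True once the subformula holds somewhere in the prefix. A finite prefix
        # can never make it False, because the subformula might hold later.
        for j in range(i, len(prefix)):
            if evaluate(phi.sub, prefix, j) is True:
                return True
        return None
    raise TypeError(f"unknown formula: {phi!r}")

def horizon(phi: Formula) -> Optional[int]:
    """Prefix length that always settles an F-free formula (nested X depth + 1);
    None if the formula uses Eventually, where no finite bound exists in general."""
    if isinstance(phi, Atom):
        return 1
    if isinstance(phi, Not):
        return horizon(phi.sub)
    if isinstance(phi, And):
        hl, hr = horizon(phi.left), horizon(phi.right)
        return None if hl is None or hr is None else max(hl, hr)
    if isinstance(phi, Next):
        h = horizon(phi.sub)
        return None if h is None else h + 1
    return None  # Eventually

if __name__ == "__main__":
    p, q = Atom("p"), Atom("q")
    phi_x = And(p, Next(q))  # "p now and q at the next step"
    phi_f = Eventually(q)    # "q at some point"
    print(horizon(phi_x), evaluate(phi_x, [{"p"}, {"q"}]))  # 2 True
    print(horizon(phi_f), evaluate(phi_f, [{"p"}, set()]))  # None None
```

The three-valued design is deliberate. It lets the sketch say "not yet decided" instead of guessing. That undecided case is exactly the situation that finite-horizon formulas avoid.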
Findings
PAC-MDP-learnability is limited to finite-horizon LTL formulas.
LTL objectives that are not decidable within a finite horizon, which covers most LTL formulas of practical interest, are not PAC-MDP-learnable.
Intractability results highlight fundamental barriers in RL for general LTL objectives.
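One common intuition for this barrier is that infinite-horizon objectives can depend discontinuously on transition probabilities. The toy sketch below illustrates this, but it is not the paper's proof construction. Consider a safe state with a self-loop that leaves the loop with probability `p_leave` at each step. Under an "always safe" objective, the satisfaction probability is 1 when `p_leave == 0` and 0 for every `p_leave > 0`. Under a finite horizon, the same probability changes smoothly. The helper `samples_to_observe` shows how many samples are needed just to see the leaving transition with confidence `1 - delta`. This count grows without bound as `p_leave` shrinks. Both function names are hypothetical.

```python
import math

def stay_probability(p_leave: float, horizon: float) -> float:
    """Probability of staying in a safe self-loop for `horizon` steps when each
    step leaves the loop with probability p_leave (horizon may be math.inf)."""
    if math.isinf(horizon):
        # (1 - p)^n -> 0 as n -> infinity for any p > 0, but stays 1 when p == 0.
        return 1.0 if p_leave == 0.0 else 0.0
    return (1.0 - p_leave) ** horizon

def samples_to_observe(p_leave: float, delta: float) -> int:
    """Smallest n such that a transition with probability p_leave is seen at
    least once in n independent samples with probability >= 1 - delta."""
    if not 0.0 < p_leave:
        raise ValueError("p_leave must be positive")
    if p_leave >= 1.0:
        return 1
    # Solve (1 - p)^n <= delta for n.
    return max(1, math.ceil(math.log(delta) / math.log1p(-p_leave)))

if __name__ == "__main__":
    for p in (1e-3, 1e-6, 1e-9):
        print(p, stay_probability(p, 10), stay_probability(p, math.inf),
              samples_to_observe(p, 0.1))
```

Under the infinite horizon, an MDP with `p_leave = 1e-9` and one with `p_leave = 0` have values that differ by 1. Distinguishing them, however, takes on the order of billions of samples. Because `p_leave` can be made arbitrarily small while the numbers of states and actions stay fixed, no sample bound in those quantities alone can cover every case.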
Abstract
In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas…
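For reference, the classic PAC-MDP criterion for discounted-reward reinforcement learning is often stated as shown below. The paper adapts the PAC-MDP framework to LTL objectives, so its exact definition may differ from this form. Here $\mathcal{A}_t$ is the learning algorithm's policy at step $t$, $s_t$ is the state it visits, $V^{*}$ is the optimal value, and $\gamma$ is the discount factor. The criterion requires that, with probability at least $1-\delta$, the number of steps on which the algorithm is not $\varepsilon$-optimal is bounded by a polynomial.

```latex
\Pr\Big[\, \big|\{\, t : V^{\mathcal{A}_t}(s_t) < V^{*}(s_t) - \varepsilon \,\}\big|
  \le \mathrm{poly}\big(|S|, |A|, \tfrac{1}{\varepsilon}, \tfrac{1}{\delta}, \tfrac{1}{1-\gamma}\big) \Big] \ge 1 - \delta
```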
Taxonomy
Topics: Reinforcement Learning in Robotics
