Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives
Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez

TL;DR
This paper introduces a novel model-free reinforcement learning framework that optimizes absolute liveness specifications expressed as omega-regular languages using average-reward objectives, suitable for ongoing tasks without resets.
Contribution
It is the first to translate absolute liveness omega-regular specifications into average-reward objectives for model-free RL in unknown MDPs, supporting continuous interaction.
Findings
Outperforms discount-based methods in benchmarks
Guarantees convergence in unknown communicating MDPs
Supports on-the-fly environment reductions
Abstract
Recent advances in reinforcement learning (RL) have renewed interest in reward design for shaping agent behavior, but manually crafting reward functions is tedious and error-prone. A principled alternative is to specify behavioral requirements in a formal, unambiguous language and automatically compile them into learning objectives. ω-regular languages are a natural fit, given their role in formal verification and synthesis. However, most existing ω-regular RL approaches operate in an episodic, discounted setting with periodic resets, which is misaligned with ω-regular semantics over infinite traces. For continuing tasks, where the agent interacts with the environment over a single uninterrupted lifetime, the average-reward criterion is more appropriate. We focus on absolute liveness specifications, a subclass of ω-regular languages that cannot be violated…
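For reference, the two criteria contrasted in the abstract are usually defined as below. This is a sketch in standard textbook notation, not the paper's own: the policy π, per-step reward r_t, discount factor γ, and start state s are assumed symbols introduced here for illustration.

```latex
% Discounted return: rewards far in the future are geometrically down-weighted.
J_\gamma^\pi(s) \;=\; \mathbb{E}^\pi\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_t \;\middle|\; s_0 = s \right],
\qquad 0 \le \gamma < 1

% Average reward (gain): the long-run reward per step over a single infinite run.
\rho^\pi(s) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^\pi\!\left[\, \sum_{t=0}^{T-1} r_t \;\middle|\; s_0 = s \right]
```

Under the discounted criterion, any finite prefix of a run carries a fixed share of the total weight. Under the average-reward criterion, a finite prefix contributes nothing in the limit, which fits properties that depend only on the long-run behavior of an infinite trace. In a communicating MDP the optimal gain does not depend on the start state s.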
