Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards

Xuan-Bach Le; Dominik Wagner; Leon Witzman; Alexander Rabinovich; Luke Ong

arXiv:2410.12175·cs.LG·October 17, 2024

TL;DR

This paper presents a method to translate LTL and $\omega$-regular objectives in reinforcement learning into average-reward problems using reward machines, enabling asymptotic learning of optimal policies.

Contribution

It introduces an optimality-preserving reduction from $\omega$-regular objectives to limit-average reward problems via reward machines, solving an open problem in the field.
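
To make the notion of a reward machine concrete, here is a minimal Python sketch for the simplest possible objective, reachability ("eventually see `goal`"). It is a toy illustration only and is not the paper's construction for general $\omega$-regular objectives. The class `RewardMachine`, the helper `reachability_machine`, the memory states `"wait"` and `"done"`, and the labels `"goal"` and `"other"` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class RewardMachine:
    """Finite-memory machine: reads one label per step and emits one reward."""
    initial: str
    # (memory state, label) -> (next memory state, reward)
    delta: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, labels: Iterable[str]) -> List[float]:
        state = self.initial
        rewards = []
        for label in labels:
            state, reward = self.delta[(state, label)]
            rewards.append(reward)
        return rewards


def reachability_machine(goal: str, other: str) -> RewardMachine:
    """Toy machine (an assumption of this sketch): pay 1 on every step
    from the first occurrence of `goal` onwards, and 0 before that."""
    delta = {
        ("wait", other): ("wait", 0.0),
        ("wait", goal): ("done", 1.0),
        ("done", other): ("done", 1.0),
        ("done", goal): ("done", 1.0),
    }
    return RewardMachine(initial="wait", delta=delta)


def average_reward(rewards: List[float]) -> float:
    """Average reward over a finite prefix of a run."""
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    rm = reachability_machine("goal", "other")
    reaches = ["other", "other", "goal"] + ["other"] * 97
    never = ["other"] * 100
    print(average_reward(rm.run(reaches)))
    print(average_reward(rm.run(never)))
```

In this toy case the limit-average reward of an infinite run is 1 if the run eventually sees `goal` and 0 otherwise, so the expected limit-average reward of a policy equals its probability of satisfying the objective. The finite prefixes above only approximate that limit.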

Findings

01 Optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.

02 The reduction to limit-average rewards via reward machines preserves optimality.

03 Approximately solving a sequence of discount-sum problems yields optimal policies for limit-average problems asymptotically.

Abstract

Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount sum and average reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.
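
The link between discount-sum and limit-average problems mentioned in the abstract can be illustrated on a tiny example. The sketch below is an assumption-laden toy, not the paper's algorithm: it uses a known, deterministic two-state MDP (the table `MDP` and the function `solve_discounted` are hypothetical), solves it by plain value iteration for several discount factors, and reports the normalized value $(1-\gamma)V_\gamma$ together with the greedy action in state 0. The paper's setting has an unknown MDP and only approximate solutions, which this sketch does not model.

```python
from typing import Dict, Tuple

# Hypothetical toy MDP: state -> action -> (next state, reward).
# In state 0, "stay" pays 0.5 forever, while "go" pays 0 once and
# then reaches state 1, which pays 1 forever.
MDP: Dict[int, Dict[str, Tuple[int, float]]] = {
    0: {"stay": (0, 0.5), "go": (1, 0.0)},
    1: {"stay": (1, 1.0)},
}


def solve_discounted(gamma: float, tol: float = 1e-10):
    """Value iteration for the discounted problem with factor gamma."""
    values = {s: 0.0 for s in MDP}
    while True:
        new_values = {
            s: max(r + gamma * values[t] for (t, r) in MDP[s].values())
            for s in MDP
        }
        gap = max(abs(new_values[s] - values[s]) for s in MDP)
        values = new_values
        if gap < tol:
            break
    # Greedy policy with respect to the computed values.
    policy = {
        s: max(MDP[s], key=lambda a: MDP[s][a][1] + gamma * values[MDP[s][a][0]])
        for s in MDP
    }
    return values, policy


if __name__ == "__main__":
    for gamma in (0.3, 0.9, 0.99):
        values, policy = solve_discounted(gamma)
        print(gamma, (1 - gamma) * values[0], policy[0])
```

In this toy MDP the best limit-average reward from state 0 is 1, obtained by playing `go`. For a small discount factor such as 0.3 the discounted problem prefers `stay`, whereas for discount factors close to 1 the discount-optimal action is `go` and the normalized value approaches 1, which is the behaviour that a sequence of discount-sum problems is meant to exploit.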

Videos

Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics