Efficient Reinforcement Learning in Probabilistic Reward Machines

Xiaofeng Lin; Xuezhou Zhang

arXiv:2408.10381 · stat.ML · August 21, 2024

TL;DR

This paper introduces the first efficient reinforcement learning algorithm for Probabilistic Reward Machines, achieving near-optimal regret bounds and enabling reward-free exploration in non-Markovian reward settings, with strong empirical performance.

Contribution

The paper presents a novel RL algorithm for PRMs with improved regret bounds and a new simulation lemma for non-Markovian rewards, advancing the state of the art.
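To make "non-Markovian rewards" concrete, here is a minimal toy sketch of a probabilistic reward machine. It is an illustration only, not the paper's formal definition or its learning algorithm: the class name `ProbabilisticRewardMachine`, the machine states `u0`/`u1`/`done`, the labels `coffee`/`office`, and the convention that unlisted (state, label) pairs give zero reward are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProbabilisticRewardMachine:
    """Toy PRM: finite machine states with stochastic transitions on labels."""

    initial_state: str
    # transitions[(state, label)] -> list of (next_state, probability, reward)
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, label, rng):
        """Advance on an observed label and return the sampled transition's reward."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            # Assumption for this sketch: unlisted pairs keep the state, reward 0.
            return 0.0
        weights = [prob for (_, prob, _) in outcomes]
        index = rng.choices(range(len(outcomes)), weights=weights, k=1)[0]
        next_state, _, reward = outcomes[index]
        self.state = next_state
        return reward


# Hypothetical task: fetch coffee (the machine works 90% of the time),
# then deliver it to the office for reward 1.
prm = ProbabilisticRewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "coffee"): [("u1", 0.9, 0.0), ("u0", 0.1, 0.0)],
        ("u1", "office"): [("done", 1.0, 1.0)],
    },
)
rng = random.Random(0)
reward_without_coffee = prm.step("office", rng)  # 0.0: no coffee picked up yet
```

The same label (`office`) can yield different rewards depending on the machine state, which summarises the relevant history. That dependence on history is what makes the reward non-Markovian in the environment's observations alone.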

Findings

01. Achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^{2} O^{2} A^{3/2} + H\sqrt{T})$

02. Matches the lower bound $\Omega(\sqrt{HOAT})$ up to a logarithmic factor when $T \geq H^{3} O^{3} A^{2}$ and $OA \geq H$

03. Demonstrates superior empirical performance over prior methods

Abstract

In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of $\widetilde{O}(\sqrt{HOAT} + H^{2} O^{2} A^{3/2} + H\sqrt{T})$, where $H$ is the time horizon, $O$ is the number of observations, $A$ is the number of actions, and $T$ is the number of time-steps. This result improves over the best-known bound, $\widetilde{O}(H\sqrt{OAT})$ of \citet{pmlr-v206-bourel23a} for MDPs with Deterministic Reward Machines (DRMs), a special case of PRMs. When $T \geq H^{3} O^{3} A^{2}$ and $OA \geq H$, our regret bound leads to a regret of $\widetilde{O}(\sqrt{HOAT})$, which matches the established lower bound of $\Omega(\sqrt{HOAT})$ for MDPs with DRMs up to a logarithmic factor. To the best of our knowledge, this is the first…
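As a short check of why the two stated conditions reduce the bound to its leading term $\sqrt{HOAT}$, compare each remaining term against it (all quantities are positive, so squaring preserves the inequalities; logarithmic factors are ignored):

$$
\begin{aligned}
H^{2} O^{2} A^{3/2} \le \sqrt{HOAT} &\iff H^{4} O^{4} A^{3} \le HOAT \iff T \ge H^{3} O^{3} A^{2}, \\
H\sqrt{T} \le \sqrt{HOAT} &\iff H^{2} T \le HOAT \iff OA \ge H .
\end{aligned}
$$

When both conditions hold, the sum of the three terms is therefore at most $3\sqrt{HOAT}$.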

Taxonomy

Topics: Statistical and Computational Modeling