Provably Efficient Exploration in Reward Machines with Low Regret
Hippolyte Bourel, Anders Jonsson, Odalric-Ambrym Maillard, Chenxiao Ma, Mohammad Sadegh Talebi

TL;DR
This paper introduces a model-based reinforcement learning algorithm that efficiently exploits reward machine structures with probabilistic dynamics, providing regret bounds and demonstrating improved performance over unstructured methods.
Contribution
To the authors' knowledge, it presents the first RL algorithm tailored to probabilistic reward machines that comes with a regret analysis, exploiting the machine's structure for improved learning efficiency.
Findings
The algorithm achieves lower regret than existing RL algorithms that can be applied to the setting but ignore the reward machine structure.
High-probability, non-asymptotic regret bounds are derived.
A regret lower bound for the setting is established.
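For orientation, regret under the average-reward criterion is commonly written as the gap between what an optimal policy would collect in the long run and what the learner actually collects. The form below is the standard textbook definition, not a formula quoted from the paper; the symbols $g^\star$ (optimal long-run average reward), $r_t$ (reward received at step $t$), and $T$ (number of steps) are notational assumptions.

```latex
\mathfrak{R}(T) \;=\; T\, g^\star \;-\; \sum_{t=1}^{T} r_t
```

A high-probability, non-asymptotic bound controls this quantity for every finite $T$ with probability at least $1-\delta$, while a lower bound states how small it can be made by any algorithm in the worst case.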
Abstract
We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with initially unknown dynamics, and investigate RL under the average-reward criterion, where the learning performance is assessed through the notion of regret. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines that is capable of exploiting the structure induced by such machines. We further derive high-probability and non-asymptotic bounds on its regret and demonstrate the gain in terms of regret over existing algorithms that could be applied, but obliviously to the structure. We also present a regret lower bound for the studied setting. To the best of our knowledge, the proposed…
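To make the notion of a probabilistic reward machine concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm or formalism: the class name `ProbabilisticRewardMachine`, the event strings, the convention that an unlisted (state, event) pair leaves the machine state unchanged with zero reward, and the toy "coffee then office" task are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ProbabilisticRewardMachine:
    """Hypothetical container for a probabilistic reward machine."""
    # transitions[(q, event)] -> list of (next_state, probability)
    transitions: dict
    # rewards[(q, event)] -> scalar reward
    rewards: dict
    initial_state: int = 0

    def step(self, q, event, rng):
        """Sample the next machine state; return it together with the reward."""
        # Assumption: an unlisted (state, event) pair is a self-loop with reward 0.
        outcomes = self.transitions.get((q, event), [(q, 1.0)])
        next_states = [s for s, _ in outcomes]
        probs = [p for _, p in outcomes]
        q_next = rng.choices(next_states, weights=probs, k=1)[0]
        return q_next, self.rewards.get((q, event), 0.0)


def run_events(machine, events, seed=0):
    """Feed a sequence of high-level events to the machine."""
    rng = random.Random(seed)
    q = machine.initial_state
    total = 0.0
    trace = [q]
    for event in events:
        q, r = machine.step(q, event, rng)
        total += r
        trace.append(q)
    return trace, total


# Toy task: pick up coffee (succeeds with probability 0.9), then reach the office.
toy = ProbabilisticRewardMachine(
    transitions={
        (0, "coffee"): [(1, 0.9), (0, 0.1)],
        (1, "office"): [(0, 1.0)],
    },
    rewards={(1, "office"): 1.0},
)

# Deterministic variant of the same task, useful for checking behavior by hand.
deterministic = ProbabilisticRewardMachine(
    transitions={
        (0, "coffee"): [(1, 1.0)],
        (1, "office"): [(0, 1.0)],
    },
    rewards={(1, "office"): 1.0},
)

if __name__ == "__main__":
    print(run_events(deterministic, ["office", "coffee", "office"]))
    print(run_events(toy, ["coffee", "office", "coffee", "office"], seed=1))
```

In the deterministic variant, the event `"office"` earns no reward the first time and a reward of 1 the second time, because the machine state records whether coffee was picked up. This is the sense in which the reward is non-Markovian in the environment's observations alone, and it is the structure a learner can exploit instead of treating the combined process as an unstructured problem.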
Taxonomy
Topics: Advanced Control Systems Optimization · Fault Detection and Control Systems · Fuzzy Logic and Control Systems
