Provably Efficient Exploration in Reward Machines with Low Regret

Hippolyte Bourel; Anders Jonsson; Odalric-Ambrym Maillard; Chenxiao Ma; Mohammad Sadegh Talebi

arXiv:2412.19194·cs.LG·December 30, 2024

Open Access

TL;DR

This paper introduces a model-based reinforcement learning algorithm that exploits the structure of probabilistic reward machines whose dynamics are initially unknown. It provides regret bounds and shows a gain in regret over existing algorithms that ignore this structure.

Contribution

It presents the first tailored RL algorithm with regret analysis for probabilistic reward machines, exploiting their structure for improved learning efficiency.

Findings

01. The algorithm achieves lower regret than RL algorithms that ignore the reward machine structure.

02. High-probability, non-asymptotic regret bounds are derived.

03. A regret lower bound for the setting is established.

Abstract

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with initially unknown dynamics, and investigate RL under the average-reward criterion, where the learning performance is assessed through the notion of regret. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines that is capable of exploiting the structure induced by such machines. We further derive high-probability and non-asymptotic bounds on its regret and demonstrate the gain in terms of regret over existing algorithms that could be applied, but obliviously to the structure. We also present a regret lower bound for the studied setting. To the best of our knowledge, the proposed…
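To make the setting concrete, the following is a minimal sketch of a probabilistic reward machine and of regret under the average-reward criterion. It is an illustration only, not the paper's algorithm: the class `ProbabilisticRewardMachine`, the function `cumulative_regret`, the labels `"coffee"` and `"office"`, and all probabilities are assumptions made up for this example. The regret formula used is the standard one for average-reward RL, namely the number of steps times the optimal gain minus the reward actually collected.

```python
import random
from dataclasses import dataclass


@dataclass
class ProbabilisticRewardMachine:
    """Toy probabilistic reward machine (hypothetical, for illustration only)."""
    # transitions[(machine_state, label)] -> list of (next_machine_state, probability)
    transitions: dict
    # rewards[(machine_state, label)] -> reward emitted on that event
    rewards: dict
    initial_state: int = 0

    def step(self, q, label, rng):
        """Sample the next machine state and return it with the emitted reward."""
        # Unlisted (state, label) pairs leave the machine state unchanged.
        outcomes = self.transitions.get((q, label), [(q, 1.0)])
        next_states = [s for s, _ in outcomes]
        weights = [p for _, p in outcomes]
        next_q = rng.choices(next_states, weights=weights, k=1)[0]
        return next_q, self.rewards.get((q, label), 0.0)


def cumulative_regret(optimal_gain, collected_rewards):
    """Standard average-reward regret: T * g_star minus the collected reward."""
    return len(collected_rewards) * optimal_gain - sum(collected_rewards)


# Assumed toy task: fetch coffee (succeeds with probability 0.9), then reach the office.
TOY_MACHINE = ProbabilisticRewardMachine(
    transitions={
        (0, "coffee"): [(1, 0.9), (0, 0.1)],
        (1, "office"): [(0, 1.0)],
    },
    rewards={(1, "office"): 1.0},
)
```

In this toy machine the label `"office"` yields reward 1 only when the machine is in state 1, so the reward is non-Markovian in the environment observation alone but becomes Markovian once the machine state is tracked alongside it. That is the kind of structure a learner aware of the machine can exploit.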

Taxonomy

Topics: Advanced Control Systems Optimization · Fault Detection and Control Systems · Fuzzy Logic and Control Systems