Learning Robust Reward Machines from Noisy Labels

Roko Parac; Lorenzo Nodari; Leo Ardon; Daniel Furelos-Blanco; Federico Cerutti; Alessandra Russo

arXiv:2408.14871·cs.AI·March 24, 2025

TL;DR

This paper introduces PROB-IRM, a method that learns robust reward machines from noisy execution traces for reinforcement learning. Agents trained with the learned reward machines perform comparably to agents given handcrafted ones, despite the noise.

Contribution

PROB-IRM is a novel approach that combines inductive logic programming and Bayesian methods to learn reward machines robustly from noisy traces in reinforcement learning.
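To make the Bayesian ingredient concrete, here is a minimal sketch of how a single noisy label can be turned into a posterior degree of belief. It is an illustration only, not the PROB-IRM implementation: the function name `posterior_true` and the sensor parameters (`prior`, `sensitivity`, `specificity`) are assumptions introduced for this example.

```python
def posterior_true(detected: bool, prior: float,
                   sensitivity: float, specificity: float) -> float:
    """Posterior probability that a proposition truly holds, via Bayes' rule.

    All parameters are hypothetical for this sketch:
      prior       -- P(proposition holds) before looking at the sensor
      sensitivity -- P(sensor fires | proposition holds)
      specificity -- P(sensor silent | proposition does not hold)
    """
    if detected:
        like_true, like_false = sensitivity, 1.0 - specificity
    else:
        like_true, like_false = 1.0 - sensitivity, specificity
    evidence = like_true * prior + like_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this sensor model")
    return like_true * prior / evidence


# A rare proposition seen by a 90%-accurate sensor: one detection only
# raises the belief to about one half, so the label cannot be taken at face value.
belief = posterior_true(detected=True, prior=0.1, sensitivity=0.9, specificity=0.9)
print(round(belief, 3))
```

The point of the example is that a detection from an imperfect sensor is evidence rather than fact, which is why a learner that treats noisy traces as certain can end up with inconsistent examples.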

Findings

01

PROB-IRM successfully learns reward machines from noisy traces.

02

Agents trained with PROB-IRM perform comparably to those with handcrafted RMs.

03

Bayesian posterior beliefs make RM learning robust to inconsistent noisy traces, and probabilistic reward shaping speeds up training of the RL agent.

Abstract

This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental…
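The ideas in the abstract can be sketched in a few lines of Python: a reward machine as a finite-state machine, a belief over its states that is updated from per-step posterior label probabilities, and a shaping term computed from that belief. This is a simplified illustration, not the paper's formulation. The shaping term is standard potential-based shaping applied to the expected potential under the belief, and every name here (`RewardMachine`, `update_belief`, `shaped_reward`, the coffee/office task, the potential values) is an assumption made for the example. It also assumes at most one proposition holds per step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Prop = Optional[str]  # None stands for "no proposition observed"


@dataclass
class RewardMachine:
    states: List[str]
    initial: str
    accepting: str
    # (state, proposition) -> next state; pairs not listed self-loop
    transitions: Dict[Tuple[str, Prop], str]

    def step(self, state: str, prop: Prop) -> str:
        return self.transitions.get((state, prop), state)


def update_belief(rm: RewardMachine, belief: Dict[str, float],
                  prop_probs: Dict[Prop, float]) -> Dict[str, float]:
    """Push a belief over RM states through one noisy observation.

    prop_probs maps each proposition (or None) to the posterior
    probability that it truly held at this step; values sum to 1.
    """
    new_belief = {s: 0.0 for s in rm.states}
    for state, p_state in belief.items():
        for prop, p_prop in prop_probs.items():
            new_belief[rm.step(state, prop)] += p_state * p_prop
    return new_belief


def expected_potential(belief: Dict[str, float],
                       potential: Dict[str, float]) -> float:
    return sum(p * potential[s] for s, p in belief.items())


def shaped_reward(reward: float, belief: Dict[str, float],
                  next_belief: Dict[str, float],
                  potential: Dict[str, float], gamma: float) -> float:
    """Potential-based shaping using the expected potential under the belief."""
    return (reward + gamma * expected_potential(next_belief, potential)
            - expected_potential(belief, potential))


# Hypothetical task: observe "coffee", then "office".
rm = RewardMachine(
    states=["u0", "u1", "u_acc"],
    initial="u0",
    accepting="u_acc",
    transitions={("u0", "coffee"): "u1", ("u1", "office"): "u_acc"},
)
potential = {"u0": 0.0, "u1": 0.5, "u_acc": 1.0}  # illustrative values

belief = {"u0": 1.0, "u1": 0.0, "u_acc": 0.0}
next_belief = update_belief(rm, belief, {"coffee": 0.8, None: 0.2})
bonus = shaped_reward(0.0, belief, next_belief, potential, gamma=1.0)
print(next_belief, round(bonus, 3))
```

In the same spirit, the belief mass on `rm.accepting` at the end of a trace gives a degree of belief that the current RM accepts it. Comparing that value against a threshold is one plausible way to decide when to relearn the RM, though the exact criterion used by PROB-IRM is not specified in this listing.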

Code & Models

Repositories

rparac/prob-irm (official)

Taxonomy

Topics: Fuzzy Logic and Control Systems

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings