Learning Robust Reward Machines from Noisy Labels
Roko Parac, Lorenzo Nodari, Leo Ardon, Daniel Furelos-Blanco, Federico Cerutti, Alessandra Russo

TL;DR
This paper introduces PROB-IRM, a method that learns robust reward machines (RMs) from noisy execution traces for reinforcement learning (RL) agents. Agents trained with the learned RMs perform comparably to agents given handcrafted RMs, despite the noisy labels.
Contribution
PROB-IRM is a novel approach that combines a noise-robust inductive logic programming framework with Bayesian posterior beliefs to learn reward machines from noisy traces, interleaving RM learning with the RL agent's policy learning.
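The summary does not specify the noise model, so the sketch below assumes one: each proposition is reported by a hypothetical detector with fixed true- and false-positive rates, and Bayes' rule turns each detector output into a posterior belief that the proposition actually held. The names `NoisyDetector`, `posterior_belief` and `belief_seen_at_least_once` are illustrative only and are not part of PROB-IRM.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class NoisyDetector:
    """Hypothetical detector for one proposition with fixed error rates (illustrative assumption)."""
    true_positive_rate: float   # P(detector fires | proposition holds)
    false_positive_rate: float  # P(detector fires | proposition does not hold)


def posterior_belief(prior, fired, detector):
    """Bayes' rule: P(proposition holds | detector output) for a single step."""
    if fired:
        like_true = detector.true_positive_rate
        like_false = detector.false_positive_rate
    else:
        like_true = 1.0 - detector.true_positive_rate
        like_false = 1.0 - detector.false_positive_rate
    numerator = like_true * prior
    evidence = numerator + like_false * (1.0 - prior)
    # If the evidence has zero probability the output is uninformative; keep the prior.
    return numerator / evidence if evidence > 0 else prior


def belief_seen_at_least_once(step_beliefs):
    """P(proposition held at some step), assuming steps are independent (a simplification)."""
    return 1.0 - prod(1.0 - b for b in step_beliefs)
```

For example, with a prior of 0.1, a true-positive rate of 0.9 and a false-positive rate of 0.05, a single positive detection raises the belief only to 0.09 / 0.135 ≈ 0.67, which is why a learner that treats every detection as ground truth can be misled by noisy traces.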
Findings
PROB-IRM successfully learns reward machines from noisy traces.
Agents trained with PROB-IRM perform comparably to those with handcrafted RMs.
Learning from Bayesian posterior beliefs makes the learned RMs robust to inconsistent traces, and probabilistic reward shaping speeds up the training of the RL agent.
Abstract
This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal to the results is the interleaving of RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental…
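The sketch below illustrates, under stated assumptions, how a reward machine can be run over noisy labels and how the two mechanisms in the abstract might be wired up. It tracks a probability distribution over RM states instead of a single state, flags a trace for relearning when the belief that the RM accepts it disagrees with the observed episode outcome (a simplified reading of the relearning trigger in the abstract), and uses the expected potential under that belief for potential-based reward shaping. The `RewardMachine` class, the helper `needs_relearning`, the one-proposition-per-step label model and the distance-to-acceptance potential are all hypothetical choices, not the paper's implementation.

```python
from collections import defaultdict, deque


class RewardMachine:
    """Minimal RM: integer states, edges labelled by a single proposition (assumption)."""

    def __init__(self, states, initial, accepting, transitions):
        self.states = list(states)
        self.initial = initial
        self.accepting = accepting
        self.transitions = dict(transitions)  # (state, proposition) -> next state
        self.potential = self._potentials()

    def _potentials(self):
        # Phi(u) = -(fewest transitions from u to the accepting state), via BFS on reversed edges.
        reverse = defaultdict(list)
        for (u, _), v in self.transitions.items():
            reverse[v].append(u)
        dist = {self.accepting: 0}
        queue = deque([self.accepting])
        while queue:
            v = queue.popleft()
            for u in reverse[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        unreachable = len(self.states)  # states that cannot reach acceptance get the lowest potential
        return {u: -float(dist.get(u, unreachable)) for u in self.states}

    def update_belief(self, belief, label_probs):
        """Propagate a distribution over RM states given P(proposition observed) at this step.

        Assumes at most one proposition holds per step; leftover mass means 'nothing observed'.
        A proposition with no outgoing edge from a state leaves the RM in that state.
        """
        nothing = 1.0 - sum(label_probs.values())
        if nothing < -1e-9:
            raise ValueError("label probabilities must sum to at most 1")
        new_belief = defaultdict(float)
        for u, mass in belief.items():
            new_belief[u] += mass * nothing
            for prop, prob in label_probs.items():
                new_belief[self.transitions.get((u, prop), u)] += mass * prob
        return dict(new_belief)

    def expected_potential(self, belief):
        return sum(mass * self.potential[u] for u, mass in belief.items())

    def shaping_reward(self, belief, next_belief, gamma=0.99):
        # Potential-based shaping F = gamma * Phi(b') - Phi(b), with Phi averaged over the belief.
        return gamma * self.expected_potential(next_belief) - self.expected_potential(belief)


def needs_relearning(rm, trace_label_probs, goal_reached, threshold=0.5):
    """Hypothetical trigger: relearn when the RM's acceptance belief disagrees with the outcome."""
    belief = {rm.initial: 1.0}
    for label_probs in trace_label_probs:
        belief = rm.update_belief(belief, label_probs)
    accepted = belief.get(rm.accepting, 0.0) >= threshold
    return accepted != goal_reached
```

For a task "get coffee (`c`), then reach the office (`o`)", modelled as `RewardMachine([0, 1, 2], 0, 2, {(0, "c"): 1, (1, "o"): 2})`, the noisy trace `[{"c": 0.9}, {"o": 0.8}]` ends with a belief of about 0.72 in the accepting state, so a successful episode is consistent with this RM and triggers no relearning. Averaging the potential over the belief keeps the shaping term informative even when the agent cannot be sure which subtask it has completed.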
