Reward Machine Inference for Robotic Manipulation
Mattijs Baert, Sam Leroux, Pieter Simoens

TL;DR
This paper presents a method for learning reward machines directly from visual demonstrations of robotic manipulation tasks, without predefined propositions or prior knowledge of the reward signal. The inferred reward machines capture the task structure and support policy learning.
Contribution
The paper introduces a novel learning-from-demonstrations approach that infers reward machines from visual data without predefined propositions or reward signals.
Findings
Accurately infers reward structures from visual demonstrations.
Enables RL agents to learn effective policies for manipulation tasks.
Works without prior knowledge of reward signals.
Abstract
Learning from Demonstrations (LfD) and Reinforcement Learning (RL) have enabled robot agents to accomplish complex tasks. Reward Machines (RMs) enhance RL's capability to train policies over extended time horizons by structuring high-level task information. In this work, we introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks. Unlike previous methods, our approach requires no predefined propositions or prior knowledge of the underlying sparse reward signals. Instead, it jointly learns the RM structure and identifies key high-level events that drive transitions between RM states. We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.
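To make the notion of a reward machine concrete, the following is a minimal sketch of one as a finite-state machine whose transitions fire on high-level events and emit rewards. It is an illustration only, not the authors' inference method: the class `RewardMachine`, the state names `u0` to `u2`, and the events `block_grasped`, `block_placed`, and `block_dropped` are all hypothetical, and the machine is written by hand here, whereas the paper infers both the structure and the events from visual demonstrations.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """A finite-state machine whose transitions fire on high-level events."""

    initial_state: str
    terminal_states: set
    # Maps (state, event) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        """Return (next_state, reward); unknown events leave the state unchanged."""
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Replay a sequence of events; return the final state and total reward."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical two-stage manipulation task: grasp a block, then place it.
# The reward is sparse: only the final transition pays out.
rm = RewardMachine(
    initial_state="u0",
    terminal_states={"u2"},
    transitions={
        ("u0", "block_grasped"): ("u1", 0.0),
        ("u1", "block_placed"): ("u2", 1.0),
        ("u1", "block_dropped"): ("u0", 0.0),
    },
)

# "block_placed" is ignored in u0 because no transition is defined for it there.
final_state, total_reward = rm.run(["block_placed", "block_grasped", "block_placed"])
```

In a setup like this, an RL agent would typically condition its policy on the current reward machine state in addition to the visual observation, so that the sparse task reward is broken into stages; how the paper couples the inferred machine to the agent is not specified in this summary.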
