Reward Machine Inference for Robotic Manipulation

Mattijs Baert; Sam Leroux; Pieter Simoens

arXiv:2412.10096·cs.RO·December 16, 2024

TL;DR

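This paper presents a method for inferring reward machines directly from visual demonstrations of robotic manipulation tasks, without predefined propositions or prior knowledge of the reward signal. The inferred reward machine captures the task structure and is then used to train an RL policy.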

Contribution

It introduces a novel learning-from-demonstrations approach that infers reward machines from visual data without predefined propositions or reward signals.

Findings

1. Accurately infers reward structures from visual demonstrations.
2. Enables RL agents to learn effective policies for manipulation tasks.
3. Works without prior knowledge of reward signals.

Abstract

Learning from Demonstrations (LfD) and Reinforcement Learning (RL) have enabled robot agents to accomplish complex tasks. Reward Machines (RMs) enhance RL's capability to train policies over extended time horizons by structuring high-level task information. In this work, we introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks. Unlike previous methods, our approach requires no predefined propositions or prior knowledge of the underlying sparse reward signals. Instead, it jointly learns the RM structure and identifies key high-level events that drive transitions between RM states. We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.
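
For readers unfamiliar with the term, a reward machine can be thought of as a finite-state machine whose transitions fire on high-level events and emit rewards. The sketch below is only an illustration of that data structure, not the authors' implementation: the class `RewardMachine`, the event names (`block_grasped`, `block_on_target`, `block_dropped`), and the reward values are all hypothetical. It also writes the events by hand, whereas the paper's method infers both the events and the machine structure from visual demonstrations.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on high-level events."""
    initial_state: int
    terminal_states: frozenset
    # (state, event) -> (next_state, reward); hypothetical hand-written table
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        """Return (next_state, reward); unknown events leave the state unchanged."""
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Replay an event sequence from the initial state."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Illustrative two-step task (assumed, not from the paper):
# grasp a block, then place it on a target.
rm = RewardMachine(
    initial_state=0,
    terminal_states=frozenset({2}),
    transitions={
        (0, "block_grasped"): (1, 0.0),
        (1, "block_on_target"): (2, 1.0),
        (1, "block_dropped"): (0, 0.0),
    },
)

# The first event has no transition from state 0, so it is ignored.
final_state, total_reward = rm.run(
    ["block_on_target", "block_grasped", "block_on_target"]
)
print(final_state, total_reward)
```

In an RL setting, the current machine state would typically be given to the agent alongside its observation, so that a sparse, long-horizon task is split into stages the policy can tell apart.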
