Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations

Connor Mattson; Anurag Aribandi; Daniel S. Brown

arXiv:2408.05610·cs.RO·August 13, 2024

Open Access

TL;DR

This paper explores how to learn transferable reward functions from mixed-quality video demonstrations across different embodiments, emphasizing the role of human feedback in aligning representations for effective cross-embodiment reinforcement learning.

Contribution

The paper introduces techniques that use human feedback to improve representation alignment and reward transfer in cross-embodiment inverse reinforcement learning from mixed-quality demonstrations.

Findings

01 Prior methods struggle with mixed-quality data.

02 Human feedback improves reward representation alignment.

03 Different techniques lead to qualitatively different reward behaviors.

Abstract

We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy via human video demonstrations or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has only focused on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study the setting of cross-embodiment reward learning from mixed-quality demonstrations. We demonstrate that prior work struggles to learn generalizable reward representations when learning from mixed-quality data. We then…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Reinforcement Learning in Robotics · Human Pose and Action Recognition