MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Yuxin Chen; Chen Tang; Jianglan Wei; Chenran Li; Ran Tian; Xiang Zhang; Wei Zhan; Peter Stone; Masayoshi Tomizuka

arXiv:2406.16258·cs.RO·October 27, 2025

TL;DR

MEReQ is an inverse reinforcement learning method that aligns a robot's prior policy with human preferences by inferring only a residual reward from human interventions, rather than the expert's full reward, which improves sample efficiency in interactive imitation learning.

Contribution

The paper proposes MEReQ, a residual-reward IRL approach that improves sample efficiency for human-in-the-loop policy alignment in embodied AI.

Findings

01. Achieves high sample efficiency in simulated tasks.

02. Effective in real-world human-robot interaction scenarios.

03. Outperforms existing methods in alignment accuracy.

Abstract

Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using…
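The residual-reward idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so everything below is an assumption: the residual reward is taken to be linear in hand-picked features, and the weight update is the textbook maximum-entropy IRL gradient (expert feature mean minus policy feature mean). The names `residual_reward` and `maxent_irl_step` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def residual_reward(theta, features):
    """Assumed linear residual reward: r_R = theta . f for each feature row."""
    return features @ theta


def maxent_irl_step(theta, expert_features, policy_features, lr=0.1):
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    For a reward that is linear in features, the gradient is the mean
    feature vector of the expert samples minus the mean feature vector
    of samples drawn from the current policy.
    """
    grad = expert_features.mean(axis=0) - policy_features.mean(axis=0)
    return theta + lr * grad


# Toy data (made up for illustration): the expert favors feature 0,
# while the current policy favors feature 1.
theta = np.zeros(2)
expert_features = np.array([[1.0, 0.0], [1.0, 0.0]])
policy_features = np.array([[0.0, 1.0], [0.0, 1.0]])

theta = maxent_irl_step(theta, expert_features, policy_features)
expert_residual = residual_reward(theta, expert_features)
```

In this toy setup the step raises the weight on the feature the expert favors and lowers the weight on the feature the policy over-uses, so only the discrepancy between the two behaviors is learned. How the actual method collects samples and performs the policy update with Residual Q-Learning is not covered by this sketch.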

Taxonomy

Topics: Machine Learning and Data Classification · Handwritten Text Recognition Techniques · Face and Expression Recognition

Methods: ALIGN · Q-Learning