ARM: Advantage Reward Modeling for Long-Horizon Manipulation

Yiming Mao; Zixi Yu; Weixin Mao; Yinhao Li; Qirui Hu; Zihan Lan; Minzhao Zhu; Hua Chen

arXiv:2604.03037·cs.RO·April 22, 2026

TL;DR

ARM introduces a reward modeling framework that estimates relative advantage using a tri-state labeling strategy, improving long-horizon manipulation with minimal human effort and enhanced data efficiency.

Contribution

The paper presents Advantage Reward Modeling (ARM), which replaces absolute progress estimation with relative advantage estimation, learned from a tri-state labeling strategy (Progressive, Regressive, Stagnant), to supply intermediate supervision for RL on long-horizon manipulation tasks.

Findings

01. Achieved a 99.4% success rate on a towel-folding task.

02. Enabled stable, data-efficient policy training with minimal human intervention.

03. Improved over current VLA baselines in long-horizon manipulation.

Abstract

Long-horizon robotic manipulation remains challenging for reinforcement learning (RL) because sparse rewards provide limited guidance for credit assignment. Practical policy improvement thus relies on richer intermediate supervision, such as dense progress rewards, which are costly to obtain and ill-suited to non-monotonic behaviors such as backtracking and recovery. To address this, we propose Advantage Reward Modeling (ARM), a framework that shifts from hard-to-quantify absolute progress to estimating relative advantage. We introduce a cost-effective tri-state labeling strategy -- Progressive, Regressive, and Stagnant -- that reduces human cognitive overhead while ensuring high cross-annotator consistency. By training on these intuitive signals, ARM enables automated progress annotation for both complete demonstrations and fragmented DAgger-style data. Integrating ARM into an offline…
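
To make the tri-state idea concrete, here is a minimal sketch of how per-segment Progressive, Regressive, and Stagnant labels could be accumulated into a relative progress curve. It is an illustration under stated assumptions, not the paper's implementation: the numeric values (+1, 0, -1), the `TriState` enum, and the `relative_progress` function are all hypothetical names chosen for this example, and the abstract does not say how ARM aggregates its predictions.

```python
from enum import Enum
from itertools import accumulate


class TriState(Enum):
    """Hypothetical numeric encoding of the three labels (an assumption)."""
    PROGRESSIVE = 1   # the segment moves the task forward
    STAGNANT = 0      # the segment makes no meaningful change
    REGRESSIVE = -1   # the segment undoes earlier progress


def relative_progress(labels):
    """Running sum of label values: a simple, possibly non-monotonic progress curve."""
    return list(accumulate(label.value for label in labels))


# A trajectory with a setback followed by recovery.
labels = [
    TriState.PROGRESSIVE,
    TriState.PROGRESSIVE,
    TriState.REGRESSIVE,
    TriState.STAGNANT,
    TriState.PROGRESSIVE,
]
print(relative_progress(labels))  # [1, 2, 1, 1, 2]
```

The curve dips after the regressive segment and rises again on recovery, which is the kind of non-monotonic behavior that the abstract says absolute progress rewards handle poorly.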
