Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

Anthony Liang; Yigit Korkmaz; Jiahui Zhang; Minyoung Hwang; Abrar Anwar; Sidhant Kaushik; Aditya Shah; Alex S. Huang; Luke Zettlemoyer; Dieter Fox; Yu Xiang; Anqi Li; Andreea Bobu; Abhishek Gupta; Stephen Tu; Erdem Biyik; Jesse Zhang

arXiv:2603.02115 · cs.RO · May 15, 2026

TL;DR

Robometer is a scalable framework for robotic reward modeling that combines local progress supervision with global trajectory preferences, enabling better learning from diverse and suboptimal data.

Contribution

The paper introduces Robometer, a reward modeling approach that learns from large-scale, diverse trajectory data, including failed trajectories, and curates the RBM-1M dataset to train it.

Findings

01 Robometer outperforms prior methods in generalization across benchmarks.

02 It improves robot learning performance on various downstream tasks.

03 The RBM-1M dataset contains over one million diverse trajectories, including failures.

Abstract

General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints across trajectories of the same task, enabling effective learning from both real and augmented failed trajectories. To support this formulation at scale, we…
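
The dual objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of either loss, so everything here is an assumption made for illustration: the progress term is a mean squared error against per-frame progress labels in [0, 1], the preference term is a Bradley-Terry (logistic) loss on scalar trajectory scores, and the function names and the `preference_weight` parameter are hypothetical.

```python
import math


def progress_loss(predicted, target):
    """Assumed form: mean squared error between per-frame predicted
    progress and labeled progress, both in [0, 1]."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def preference_loss(score_preferred, score_rejected):
    """Assumed form: Bradley-Terry negative log-likelihood that the
    preferred trajectory outranks the rejected one.

    Computes -log(sigmoid(margin)) in a numerically stable way.
    """
    margin = score_preferred - score_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def dual_objective(pred_progress, target_progress,
                   score_preferred, score_rejected,
                   preference_weight=1.0):
    """Hypothetical combination: progress term on an expert trajectory
    plus a weighted preference term on a pair of trajectories from the
    same task."""
    return (progress_loss(pred_progress, target_progress)
            + preference_weight * preference_loss(score_preferred, score_rejected))


# Expert trajectory with linearly increasing progress labels.
expert_pred = [0.1, 0.4, 0.9]
expert_target = [0.0, 0.5, 1.0]

# Scalar scores for a successful and a failed attempt at the same task.
loss = dual_objective(expert_pred, expert_target,
                      score_preferred=2.0, score_rejected=-1.0)
print(f"combined loss: {loss:.4f}")
```

In this sketch the progress term only needs labels on expert data, while the preference term only needs to know which of two trajectories is better, which is why a failed trajectory can contribute a training signal without a dense progress label.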

Code & Models

Repositories

https://robometer.github.io

Models

🤗 robometer/Robometer-4B · model · 3.2k downloads · 4 likes

Datasets

aliangdw/rbm-1m-ood-full · dataset · 291 downloads
