RoboReward: General-Purpose Vision-Language Reward Models for Robotics
Tony Lee, Andrew Wagenmaker, Karl Pertsch, Percy Liang, Sergey Levine, Chelsea Finn

TL;DR
This paper introduces RoboReward, a robotics reward dataset and benchmark, together with vision-language reward models trained on it. The models provide automatic, general-purpose rewards for reinforcement learning, outperform larger VLMs on short-horizon tasks, and improve real-robot policy learning.
Contribution
The work presents RoboReward, a new dataset and benchmark for vision-language rewards in robotics, along with trained models that outperform larger models and enhance real-robot reinforcement learning.
Findings
No existing VLM excels across all tasks.
The trained RoboReward models outperform larger VLMs on short-horizon tasks.
The RoboReward 8B model improves policy learning on real robots.
Abstract
A well-designed reward is critical for effective reinforcement learning-based policy improvement. In real-world robotics, obtaining such rewards typically requires either labor-intensive human labeling or brittle, handcrafted objectives. Vision-language models (VLMs) have shown promise as automatic reward models, yet their effectiveness on real robot tasks is poorly understood. In this work, we aim to close this gap by introducing (1) RoboReward, a robotics reward dataset and benchmark built on large-scale real-robot corpora from Open X-Embodiment (OXE) and RoboArena, and (2) vision-language reward models trained on this dataset (RoboReward 4B/8B). Because OXE is success-heavy and lacks failure examples, we propose a negative-example data augmentation pipeline that generates calibrated negatives and near-misses via counterfactual relabeling of successful episodes and temporal clipping…
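The two augmentation ideas named in the abstract, counterfactual relabeling and temporal clipping, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Episode` type, the function names, the binary `success` label, and the `keep_fraction` parameter are all assumptions made here for illustration, and the abstract does not say how the resulting negatives are calibrated.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Episode:
    """Hypothetical container for one robot rollout (not from the paper)."""
    frames: List[str]   # placeholder frame identifiers instead of real images
    instruction: str    # natural-language task description
    success: bool       # assumed binary label; the paper's reward scale may differ


def temporal_clip(ep: Episode, keep_fraction: float) -> Episode:
    """Keep only the first part of a successful episode and label it a failure.

    Assumption: cutting the video before the end removes the frames in which
    the task is completed. This is not guaranteed for episodes that finish early.
    """
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be strictly between 0 and 1")
    n_keep = max(1, int(len(ep.frames) * keep_fraction))
    return replace(ep, frames=ep.frames[:n_keep], success=False)


def counterfactual_relabel(ep: Episode, other_instruction: str) -> Episode:
    """Pair the same video with a different instruction and label it a failure.

    Assumption: the video does not happen to satisfy the substituted instruction.
    """
    if other_instruction == ep.instruction:
        raise ValueError("the substituted instruction must differ from the original")
    return replace(ep, instruction=other_instruction, success=False)


# Usage: derive two negatives from a single successful episode.
original = Episode(
    frames=[f"frame_{i}" for i in range(10)],
    instruction="pick up the cup",
    success=True,
)
clipped = temporal_clip(original, keep_fraction=0.5)
relabeled = counterfactual_relabel(original, "open the drawer")
```

Both functions return new objects and leave the successful episode untouched, so a single positive example can yield several negatives. A fuller pipeline would also need a check that each generated negative is a true failure, which this sketch omits.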
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI
