Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Yanru Wu; Weiduo Yuan; Ang Qi; Vitor Guizilini; Jiageng Mao; Yue Wang

arXiv:2603.16065 · cs.RO · March 24, 2026



TL;DR

This paper introduces an online reward generation framework that adapts vision-language models (VLMs) into reward generators for refining robotic manipulation policies, reducing the need for manual reward engineering and enabling rapid policy refinement.

Contribution

It presents a scalable, zero-shot reward model based on foundation VLMs that guides online policy refinement in robotic manipulation tasks.

Findings

01 Success rates improve significantly within 30 RL iterations

02 The reward model operates effectively in zero-shot test environments

03 The approach improves sample efficiency and reduces manual reward design effort

Abstract

Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward functions. In this paper, we propose a framework for online policy refinement by adapting foundation VLMs into online reward generators. We develop a robust, scalable reward model based on a state-of-the-art VLM, trained on a large-scale, multi-source dataset encompassing real-world robot trajectories, human-object interactions, and diverse simulated environments. Unlike prior approaches that evaluate entire trajectories post-hoc, our method leverages the VLM to formulate a multifaceted reward signal comprising process, completion, and temporal contrastive rewards based on current visual observations. Initializing with a base policy trained via Imitation Learning (IL), we employ these VLM…
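
To make the three-part reward described in the abstract more concrete, here is a minimal Python sketch of how process, completion, and temporal contrastive terms might be combined into one per-step scalar. Everything in it is an illustrative assumption, not the paper's formulation: the names `RewardWeights`, `temporal_contrastive`, and `step_reward` are hypothetical, the progress estimates and the completion flag stand in for VLM outputs, and the equal-weight sum and the sign-based contrastive term are guesses, since the abstract does not say how the terms are defined or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not state how the terms are mixed.
    process: float = 1.0
    completion: float = 1.0
    contrastive: float = 1.0


def temporal_contrastive(prev_progress: float, curr_progress: float) -> float:
    """Assumed form: +1 if the newer frame looks further along,
    -1 if it looks like a regression, 0 if unchanged."""
    if curr_progress > prev_progress:
        return 1.0
    if curr_progress < prev_progress:
        return -1.0
    return 0.0


def step_reward(
    prev_progress: float,
    curr_progress: float,
    task_done: bool,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine three per-step terms into a single scalar reward.

    prev_progress / curr_progress stand in for a VLM's progress estimate
    (assumed to lie in [0, 1]) for the previous and current observations;
    task_done stands in for the VLM's completion judgment.
    """
    process = curr_progress - prev_progress        # dense progress signal
    completion = 1.0 if task_done else 0.0         # sparse success signal
    contrastive = temporal_contrastive(prev_progress, curr_progress)
    return (
        weights.process * process
        + weights.completion * completion
        + weights.contrastive * contrastive
    )
```

In an online RL loop, a function of this shape would be called once per environment step on the latest observation rather than once per finished trajectory, which is the contrast with post-hoc trajectory evaluation that the abstract draws.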


Code & Models

Models

Hugging Face model: USC-PSI-Lab/LRM-models


Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI