Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

Huajie Tan; Sixiang Chen; Yijie Xu; Zixiao Wang; Yuheng Ji; Cheng Chi; Yaoxu Lyu; Zhongxia Zhao; Xiansheng Chen; Peterson Co; Shaoxuan Xie; Guocai Yao; Pengwei Wang; Zhongyuan Wang; Shanghang Zhang

arXiv:2512.23703·cs.RO·December 30, 2025

Open Access · 3 Models · 1 Dataset

TL;DR

This paper introduces Dopamine-Reward, a novel reward modeling approach for robotic manipulation that improves reward accuracy and policy learning efficiency by leveraging multi-view inputs and a theoretically sound reward shaping method.

Contribution

It presents a general-purpose, step-aware process reward model trained on extensive data, and a policy learning framework that avoids semantic traps through sound reward shaping.

Findings

01 GRM achieves state-of-the-art reward assessment accuracy.

02 Dopamine-RL significantly enhances policy learning efficiency.

03 One-shot adaptation enables high success rates with minimal interactions.

Abstract

The primary obstacle to applying reinforcement learning (RL) to real-world robotics is the design of effective reward functions. While recent learning-based Process Reward Models (PRMs) are a promising direction, they are often hindered by two fundamental limitations: their reward models lack step-aware understanding and rely on single-view perception, leading to unreliable assessments of fine-grained manipulation progress; and their reward shaping procedures are theoretically unsound, often inducing a semantic trap that misguides policy optimization. To address these limitations, we introduce Dopamine-Reward, a novel reward modeling method for learning a general-purpose, step-aware process reward model from multi-view inputs. At its core is our General Reward Model (GRM), trained on a vast 3,400+ hour dataset, which leverages Step-wise Reward Discretization for structural understanding and…
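The abstract as shown here is truncated and does not spell out the shaping rule, so the following is only a minimal sketch of one well-known theoretically sound option, classical potential-based reward shaping (shaped reward = r + γ·Φ(s') − Φ(s)), and not necessarily the paper's method. It assumes a process reward model supplies a scalar progress estimate per state that can serve as the potential Φ; the function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def shape_rewards(env_rewards: Sequence[float],
                  potentials: Sequence[float],
                  gamma: float = 0.99) -> List[float]:
    """Potential-based shaping: r_t + gamma * phi(s_{t+1}) - phi(s_t).

    `potentials` holds one value per visited state, so it must be one
    element longer than `env_rewards` (one reward per transition).
    Assumption: the potential is a progress estimate from a reward model.
    """
    if len(potentials) != len(env_rewards) + 1:
        raise ValueError("need one potential per state: len(env_rewards) + 1")
    return [
        r + gamma * potentials[t + 1] - potentials[t]
        for t, r in enumerate(env_rewards)
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of rewards, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    gamma = 0.9
    progress = [0.0, 0.2, 0.5, 0.9, 1.0]   # hypothetical progress estimates
    env_rewards = [0.0, 0.0, 0.0, 1.0]     # sparse task reward on success
    shaped = shape_rewards(env_rewards, progress, gamma)
    # The shaping terms telescope, so the two returns differ only by a
    # quantity fixed by the first and last potentials.
    offset = gamma ** len(env_rewards) * progress[-1] - progress[0]
    gap = discounted_return(shaped, gamma) - discounted_return(env_rewards, gamma)
    print(gap, offset)
```

Because the shaping terms telescope, the dense signal adds nothing to the return beyond a term determined by the start and end potentials, which is the reason this form of shaping is known to preserve the optimal policy.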

Code & Models

Models

Datasets

tanhuajie2001/Robo-Dopamine-Bench
dataset · 54 dl

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI