GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

Yunfei Li; Xiao Ma; Jiafeng Xu; Yu Cui; Zhongren Cui; Zhigang Han; Liqun Huang; Tao Kong; Yuxiao Liu; Hao Niu; Wanli Peng; Jingchao Qiao; Zeyu Ren; Haixin Shi; Zhi Su; Jiawen Tian; Yuyang Xiao; Shenyu Zhang; Liwei Zheng; Hang Li; Yonghui Wu

arXiv:2512.01801 · cs.RO · December 24, 2025




TL;DR

GR-RL is a robotic learning framework that turns generalist vision-language-action (VLA) policies into specialized, high-precision manipulators capable of complex, long-horizon tasks such as shoe lacing. It does so by filtering and augmenting human demonstrations and then applying reinforcement learning.

Contribution

It introduces a multi-stage training pipeline that filters, augments, and reinforces human demonstrations to improve performance on dexterous, high-precision manipulation.

Findings

01

Achieved an 83.3% success rate on the shoe-lacing task

02

Demonstrated improved generalization through morphological symmetry augmentation

03

Showed that offline RL with sparse rewards is effective for task-progress estimation
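The progress-estimation finding rests on a simple property of sparse rewards, illustrated below as a minimal Python sketch. It assumes, hypothetically, that a successful episode earns reward 1 only on its final transition, that every other transition earns 0, and that returns are discounted by γ. Under these assumptions the discounted return-to-go at step t of a successful T-step episode is γ^(T-1-t), which rises monotonically toward 1 as the task nears completion, while a failed episode yields 0 at every step. A Q-function regressed offline onto such targets would therefore rank states by how close they are to success. The function name `sparse_reward_returns` and the reward convention are illustrative assumptions, not GR-RL's actual objective or network.

```python
from typing import List


def sparse_reward_returns(length: int, success: bool, gamma: float = 0.99) -> List[float]:
    """Discounted return-to-go for each step of one episode under a sparse reward.

    Hypothetical reward convention (not from the paper): 1.0 on the final
    transition of a successful episode, 0.0 on every other transition.
    """
    returns: List[float] = []
    g = 0.0
    # Walk backwards so each step's return is r_t + gamma * G_{t+1}.
    for t in reversed(range(length)):
        r = 1.0 if (success and t == length - 1) else 0.0
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


if __name__ == "__main__":
    print(sparse_reward_returns(4, success=True, gamma=0.5))   # [0.125, 0.25, 0.5, 1.0]
    print(sparse_reward_returns(4, success=False, gamma=0.5))  # [0.0, 0.0, 0.0, 0.0]
```

A learned Q-function would interpolate these targets across unseen states rather than reproduce them exactly; the sketch shows only the regression targets that make the Q-values behave like a progress score.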

Abstract

We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies rest on the assumption that human demonstrations are optimal. We argue, however, that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations through reinforcement learning. First, GR-RL learns a vision-language-conditioned task-progress function, uses it to filter the demonstration trajectories, and keeps only the transitions that contribute positively to progress. Specifically, we show that when offline RL is applied directly with a sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry…
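The filtering step can be sketched as follows, assuming a per-state progress estimate (for example, predicted $Q$-values) is already available for each demonstration. The rule used here, keeping transition s_t → s_{t+1} only if progress increases by more than a margin `min_delta`, is one simple reading of "keeps only the transitions that contribute positively to the progress". The function name, the margin, and the per-step granularity are hypothetical; GR-RL may instead smooth the estimates or filter at a different granularity.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_positive_progress(
    transitions: Sequence[T],
    progress: Sequence[float],
    min_delta: float = 0.0,
) -> List[T]:
    """Keep transition t (s_t -> s_{t+1}) only if progress rises by more than min_delta.

    `progress` holds one estimate per state, so it must be exactly one longer
    than `transitions`. Names and threshold are illustrative assumptions.
    """
    if len(progress) != len(transitions) + 1:
        raise ValueError("progress needs one value per state: len(transitions) + 1")
    return [
        tr
        for t, tr in enumerate(transitions)
        if progress[t + 1] - progress[t] > min_delta
    ]


if __name__ == "__main__":
    # Hypothetical demo: the step "a1" moves the task backwards and is dropped.
    demo = ["a0", "a1", "a2", "a3"]
    est = [0.1, 0.3, 0.2, 0.5, 0.9]
    print(keep_positive_progress(demo, est))                  # ['a0', 'a2', 'a3']
    print(keep_positive_progress(demo, est, min_delta=0.25))  # ['a2', 'a3']
```

In this sketch, dropping individual transitions rather than whole trajectories retains the useful parts of an otherwise flawed demonstration; raising `min_delta` trades data quantity for stricter quality.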


Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning