GR-3 Technical Report

Chilam Cheang; Sijin Chen; Zhongren Cui; Yingdong Hu; Liqun Huang; Tao Kong; Hang Li; Yifeng Li; Yuxiao Liu; Xiao Ma; Hao Niu; Wenxuan Ou; Wanli Peng; Zeyu Ren; Haixin Shi; Jiawen Tian; Hongtao Wu; Xin Xiao; Yuyang Xiao; Jiafeng Xu; Yichu Yang

arXiv:2507.15493·cs.RO·July 23, 2025

TL;DR

GR-3 is a large-scale vision-language-action model that demonstrates strong generalization, efficient fine-tuning, and robust performance in complex robotic tasks, advancing towards versatile generalist robots for human assistance.

Contribution

The paper introduces GR-3, a large-scale vision-language-action (VLA) model, together with a multi-faceted training recipe and a new bi-manual robot, ByteMini, enabling rapid adaptation to new settings and strong task performance.

Findings

01. GR-3 outperforms state-of-the-art baselines on a range of challenging tasks.

02. Fine-tuning on a small amount of human trajectory data adapts GR-3 to new settings quickly and cost-effectively.

03. GR-3 handles long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement.

Abstract

We report our recent progress towards building generalist robot policies: the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, showcasing robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we…
