ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang; Zifeng Zhuang; Han Zhao; Pengxiang Ding; Hongchao Lu; Donglin Wang

arXiv:2505.07395 · cs.RO · May 13, 2025

TL;DR

ReinboT is a novel vision-language-action model for robots that leverages reinforcement learning to improve decision-making, robustness, and generalization in manipulation tasks, especially with mixed-quality training data.

Contribution

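The paper introduces ReinboT, an end-to-end vision-language-action (VLA) model that integrates the reinforcement learning (RL) principle of maximizing cumulative reward. By predicting dense returns, it better captures the quality distribution of its training data and makes more robust manipulation decisions.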

Findings

01. Achieves state-of-the-art results on the CALVIN dataset

02. Exhibits superior few-shot learning capabilities

03. Shows improved out-of-distribution generalization

Abstract

Vision-Language-Action (VLA) models have shown great potential in general robotic decision-making tasks via imitation learning. However, the variable quality of training data often constrains the performance of these models. On the other hand, offline Reinforcement Learning (RL) excels at learning robust policy models from mixed-quality data. In this paper, we introduce Reinforced robot GPT (ReinboT), a novel end-to-end VLA model that integrates the RL principle of maximizing cumulative reward. ReinboT achieves a deeper understanding of the data quality distribution by predicting dense returns that capture the nuances of manipulation tasks. The dense return prediction capability enables the robot to generate more robust decision-making actions, oriented towards maximizing future benefits. Extensive experiments show that ReinboT achieves state-of-the-art performance on the CALVIN…
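
The "cumulative reward" that the abstract refers to is commonly expressed as a return-to-go: at each timestep, the sum of the (optionally discounted) rewards from that step to the end of the trajectory. The sketch below is a generic illustration of that quantity, not the paper's implementation. The function name `returns_to_go`, the discount `gamma`, and the reward values are all hypothetical, and this page does not describe how ReinboT actually defines its dense rewards.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Compute the return-to-go for every step of one trajectory.

    rtg[t] = rewards[t] + gamma * rewards[t+1] + gamma^2 * rewards[t+2] + ...
    This is a generic RL quantity; the reward design itself is assumed.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical dense per-step rewards for two demonstrations of the same task.
higher_quality_demo = [1.0, 2.0, 3.0]
lower_quality_demo = [0.0, 1.0, 0.0]

rtg_high = returns_to_go(higher_quality_demo)
rtg_low = returns_to_go(lower_quality_demo)

# With dense rewards, the two demonstrations are distinguishable at every step,
# not only at the end of the episode.
for t, (hi, lo) in enumerate(zip(rtg_high, rtg_low)):
    print(f"step {t}: higher-quality rtg={hi}, lower-quality rtg={lo}")
```

A model trained to predict such per-step returns alongside actions has a signal for telling better demonstrations from worse ones in mixed-quality data, which is the intuition the abstract describes.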

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning