ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, and Dongbin Zhao

arXiv:2502.05450·cs.RO·April 15, 2025

TL;DR

ConRFT introduces a reinforcement learning-based fine-tuning method for VLA models that significantly improves success rates and efficiency in real-world robotic manipulation tasks by combining offline and online training with human interventions.

Contribution

The paper presents a novel reinforced fine-tuning approach, ConRFT, integrating behavior cloning, Q-learning, and consistency policy for robust VLA model adaptation in contact-rich environments.

Findings

1. Achieves a 96.3% success rate across eight real-world manipulation tasks
2. Improves success rate by 144% over supervised fine-tuning methods
3. Shortens episode length by a factor of 1.9

Abstract

Vision-Language-Action (VLA) models have shown substantial potential in real-world robotic manipulation. However, fine-tuning these models through supervised learning struggles to achieve robust performance due to limited, inconsistent demonstrations, especially in contact-rich environments. In this paper, we propose a reinforced fine-tuning approach for VLA models, named ConRFT, which consists of offline and online fine-tuning with a unified consistency-based training objective, to address these challenges. In the offline stage, our method integrates behavior cloning and Q-learning to effectively extract policy from a small set of demonstrations and stabilize value estimating. In the online stage, the VLA model is further fine-tuned via consistency policy, with human interventions to ensure safe exploration and high sample efficiency. We evaluate our approach on eight diverse…
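
The offline stage described above combines a behavior-cloning term with a Q-learning term, and the online stage lets a human override the policy's action. The following is a minimal Python sketch of those two ideas only, not the authors' implementation: the names `bc_loss`, `combined_loss`, `select_action`, and `toy_q`, the squared-error distance, and the weights `bc_weight` and `q_weight` are all illustrative assumptions. The consistency policy itself (which generates actions by denoising) is not modeled here; `policy_action` simply stands in for its output.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def bc_loss(policy_action: Vector, demo_action: Vector) -> float:
    """Squared L2 distance between the policy's action and the demonstrated one.

    Assumption: the distance metric is squared error; other metrics are possible.
    """
    return sum((p - d) ** 2 for p, d in zip(policy_action, demo_action))


def combined_loss(
    policy_action: Vector,
    demo_action: Vector,
    q_value_fn: Callable[[Vector, Vector], float],
    state: Vector,
    bc_weight: float = 1.0,  # hypothetical weight on the imitation term
    q_weight: float = 0.1,   # hypothetical weight on the value term
) -> float:
    """Weighted sum of an imitation term and a (negated) value term.

    Minimizing this pulls the action toward the demonstration while also
    pushing it toward actions the critic scores highly.
    """
    imitation = bc_loss(policy_action, demo_action)
    value = q_value_fn(state, policy_action)
    return bc_weight * imitation - q_weight * value


def select_action(
    policy_action: Vector, human_action: Optional[Vector] = None
) -> Tuple[List[float], bool]:
    """Return the executed action and whether a human intervened.

    If a human supplies an action, it replaces the policy's action.
    """
    if human_action is not None:
        return list(human_action), True
    return list(policy_action), False


def toy_q(state: Vector, action: Vector) -> float:
    """Toy critic for illustration: prefers actions close to the state vector."""
    return -sum((a - s) ** 2 for a, s in zip(action, state))


if __name__ == "__main__":
    loss = combined_loss([0.5, 0.0], [1.0, 0.0], toy_q, [0.0, 0.0])
    print(f"combined loss: {loss:.3f}")
    print(select_action([0.5, 0.0], human_action=[1.0, 0.0]))
```

In this sketch, raising `bc_weight` keeps the policy close to the demonstrations, while raising `q_weight` lets the critic's estimate pull the policy away from them; how the two terms are actually balanced across the offline and online stages is a choice made in the paper, not here.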

Code & Models

Repositories

cccedric/conrft (JAX, official)

Taxonomy

Topics: Traffic Prediction and Management Techniques

Methods: Q-Learning · Sparse Evolutionary Training