ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, and Dongbin Zhao

TL;DR
ConRFT introduces a reinforcement learning-based fine-tuning method for VLA models that significantly improves success rates and efficiency in real-world robotic manipulation tasks by combining offline and online training with human interventions.
Contribution
The paper presents ConRFT, a reinforced fine-tuning approach that integrates behavior cloning, Q-learning, and a consistency policy to adapt VLA models robustly in contact-rich environments.
Findings
Achieves a 96.3% success rate across eight tasks
Improves success rate by 144% over supervised methods
Shortens episode length by a factor of 1.9
Abstract
Vision-Language-Action (VLA) models have shown substantial potential in real-world robotic manipulation. However, fine-tuning these models through supervised learning struggles to achieve robust performance due to limited, inconsistent demonstrations, especially in contact-rich environments. In this paper, we propose a reinforced fine-tuning approach for VLA models, named ConRFT, which consists of offline and online fine-tuning with a unified consistency-based training objective, to address these challenges. In the offline stage, our method integrates behavior cloning and Q-learning to effectively extract a policy from a small set of demonstrations and stabilize value estimation. In the online stage, the VLA model is further fine-tuned via a consistency policy, with human interventions to ensure safe exploration and high sample efficiency. We evaluate our approach on eight diverse…
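The offline stage described above combines a behavior cloning term with a Q-learning term. The abstract does not give the exact objective, so the following is only a minimal sketch under stated assumptions: the two terms are mixed as a weighted sum, and the behavior cloning term is simplified to a plain squared error instead of a consistency-policy loss. The names `bc_loss`, `combined_policy_loss`, `bc_weight`, and `q_weight` are hypothetical and do not come from the paper.

```python
from statistics import fmean


def bc_loss(policy_actions, demo_actions):
    """Mean squared distance between policy actions and demonstrated actions.

    Assumption: a plain squared error stands in for the consistency-based
    behavior cloning term mentioned in the abstract.
    """
    per_sample = [
        sum((p - d) ** 2 for p, d in zip(pa, da))
        for pa, da in zip(policy_actions, demo_actions)
    ]
    return fmean(per_sample)


def combined_policy_loss(policy_actions, demo_actions, q_values,
                         bc_weight=1.0, q_weight=1.0):
    """Weighted sum of a behavior cloning term and a Q-maximization term.

    Minimizing this loss pulls the policy toward the demonstrations while
    also pushing it toward actions the critic scores highly. The weights
    are hypothetical hyperparameters.
    """
    q_term = fmean(q_values)  # critic's estimate for the policy's actions
    return bc_weight * bc_loss(policy_actions, demo_actions) - q_weight * q_term


# Toy batch: two 2-D actions, their demonstrations, and critic values.
policy_actions = [[0.0, 0.0], [1.0, 1.0]]
demo_actions = [[0.0, 1.0], [1.0, 1.0]]
q_values = [2.0, 4.0]

loss = combined_policy_loss(policy_actions, demo_actions, q_values,
                            bc_weight=1.0, q_weight=0.5)
print(loss)
```

In this sketch, raising `bc_weight` keeps the policy closer to the demonstrations, while raising `q_weight` lets the critic drive it further from them. How the two terms are actually balanced across the offline and online stages is not stated in the truncated abstract.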
Taxonomy
Topics: Traffic Prediction and Management Techniques
Methods: Q-Learning · Sparse Evolutionary Training
