CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning

Dongchi Huang; Zhirui Fang; Tianle Zhang; Yihang Li; Lin Zhao; Chunhe Xia

arXiv:2508.02219 · cs.RO · August 5, 2025

TL;DR

This paper introduces CO-RFT, a novel offline reinforcement learning method with action chunking for fine-tuning vision-language-action models, significantly improving robotic control success rates and generalization with limited demonstration data.

Contribution

We propose Chunked RL and CO-RFT, novel algorithms that enhance sample efficiency and stability in fine-tuning VLA models for robotic tasks using offline RL with action chunking.

Findings

01: 57% success rate improvement over previous methods

02: 22.3% reduction in cycle time

03: 44.3% success rate in unseen positions

Abstract

Vision-Language-Action (VLA) models demonstrate significant potential for developing generalized policies in real-world robotic control. This progress inspires researchers to explore fine-tuning these models with Reinforcement Learning (RL). However, fine-tuning VLA models with RL still faces challenges related to sample efficiency, compatibility with action chunking, and training stability. To address these challenges, we explore the fine-tuning of VLA models through offline reinforcement learning incorporating action chunking. In this work, we propose Chunked RL, a novel reinforcement learning framework specifically designed for VLA models. Within this framework, we extend temporal difference (TD) learning to incorporate action chunking, a prominent characteristic of VLA models. Building upon this framework, we propose CO-RFT, an algorithm aimed at fine-tuning VLA models using a…
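
To make the idea of extending TD learning to action chunks more concrete, here is a minimal sketch of one common way to do it: treat the whole chunk of h actions as a single decision and bootstrap once, h steps ahead. This is an assumption for illustration, not the update rule confirmed by the abstract. The function name `chunked_td_target`, its parameters, and the `next_q` critic estimate are all hypothetical.

```python
from typing import Sequence


def chunked_td_target(
    rewards: Sequence[float],
    next_q: float,
    gamma: float,
    done: bool = False,
) -> float:
    """Illustrative h-step TD target for one action chunk.

    Assumed setup (not taken from the paper):
      rewards: rewards observed while executing the h actions of the chunk
      next_q:  critic estimate for the state reached after the chunk,
               evaluated with the next chunk of actions
      gamma:   per-step discount factor
      done:    True if the episode terminated inside the chunk
    """
    # Discounted sum of the rewards collected inside the chunk.
    target = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap from the post-chunk state unless the episode has ended.
    if not done:
        target += (gamma ** len(rewards)) * next_q
    return target


# Example: a chunk of two actions, each yielding reward 1.0.
example_target = chunked_td_target([1.0, 1.0], next_q=10.0, gamma=0.5)
```

With a chunk length of 1 this reduces to the ordinary one-step TD target, so the chunked form can be read as a multi-step return whose horizon matches the chunk the policy outputs.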

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning