TL;DR
This paper introduces LifeLong-RFT, a reinforcement fine-tuning strategy for vision-language-action (VLA) models that improves continual learning and task adaptation without requiring online environmental feedback or pre-trained reward models.
Contribution
It proposes a reinforcement fine-tuning method that combines chunking-level on-policy reinforcement learning with a multi-dimensional process reward to improve multi-task and continual learning for VLA models.
Findings
Achieves a 22% success rate improvement over supervised fine-tuning on LIBERO.
Effectively adapts to new tasks with only 20% of the training data.
Demonstrates strong performance across simulated and real-world tasks.
Abstract
Pretrained on large-scale and diverse datasets, VLA models demonstrate strong generalization and adaptability as general-purpose robotic policies. However, Supervised Fine-Tuning (SFT), which serves as the primary mechanism for adapting VLAs to downstream domains, requires substantial amounts of task-specific data and is prone to catastrophic forgetting. To address these limitations, we propose LifeLong-RFT, a simple yet effective Reinforcement Fine-Tuning (RFT) strategy for VLA models independent of online environmental feedback and pre-trained reward models. By integrating chunking-level on-policy reinforcement learning with the proposed multi-dimensional process reward mechanism, LifeLong-RFT quantifies the heterogeneous contributions of intermediate action chunks across three dimensions to facilitate policy optimization. Specifically, (1) the Quantized Action Consistency Reward…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
