IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation
Zhian Su, Weijie Kong, Haonan Dong, Huixu Dong

TL;DR
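This paper introduces IG-RFT, a reinforcement learning framework that fine-tunes vision-language-action (VLA) models for long-horizon robotic manipulation by improving exploration, training stability, and reward design. It reports an average success rate of 85.0% on long-horizon tasks and outperforms supervised fine-tuning (SFT) and standard offline RL baselines.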
Contribution
The paper presents an interaction-guided RL system that combines a new algorithm, Interaction-Guided Advantage Weighted Regression (IG-AWR), with hybrid reward shaping, enabling real-world fine-tuning of flow-based VLA models for complex manipulation tasks.
Findings
Achieved an average success rate of 85.0% on long-horizon tasks.
Significantly outperformed supervised fine-tuning (SFT) and standard offline RL baselines.
Validated the effectiveness of IG-AWR and hybrid rewards through ablation studies.
Abstract
Vision-Language-Action (VLA) models have demonstrated significant potential for generalist robotic policies; however, they struggle to generalize to long-horizon complex tasks in novel real-world domains due to distribution shifts and the scarcity of high-quality demonstrations. Although reinforcement learning (RL) offers a promising avenue for policy improvement, applying it to real-world VLA fine-tuning faces challenges regarding exploration efficiency, training stability, and sample cost. To address these issues, we propose IG-RFT, a novel Interaction-Guided Reinforced Fine-Tuning system designed for flow-based VLA models. Firstly, to facilitate effective policy optimization, we introduce Interaction-Guided Advantage Weighted Regression (IG-AWR), an RL algorithm that dynamically modulates exploration intensity based on the robot's interaction status. Furthermore, to address the…
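The abstract names advantage-weighted regression as the basis of IG-AWR but does not give its equations here. The following is a minimal sketch of the general idea only, not the authors' method: `awr_weights` uses the standard AWR weighting exp(A / beta) with clipping, while `exploration_scale` is a purely hypothetical rule (the function name, the noise values, and the direction of the change are all assumptions) showing one way an interaction flag could modulate exploration.

```python
import math


def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Standard AWR sample weights: exp(advantage / beta), clipped for stability.

    `beta` (temperature) and `max_weight` are illustrative defaults,
    not values taken from the paper.
    """
    return [min(math.exp(adv / beta), max_weight) for adv in advantages]


def exploration_scale(in_interaction, free_scale=0.3, contact_scale=0.1):
    """Hypothetical rule: pick an exploration noise scale from the interaction flag.

    Assumption: less noise while the robot is interacting with an object.
    The paper may modulate exploration differently.
    """
    return contact_scale if in_interaction else free_scale


def weighted_regression_loss(weights, per_sample_losses):
    """Average of per-sample imitation losses, each scaled by its AWR weight."""
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(weights)


if __name__ == "__main__":
    advantages = [-0.5, 0.0, 1.2]      # made-up advantage estimates
    losses = [0.8, 0.4, 0.2]           # made-up per-sample regression losses
    weights = awr_weights(advantages)
    print(weights)
    print(weighted_regression_loss(weights, losses))
    print(exploration_scale(True), exploration_scale(False))
```

In this sketch, samples with higher estimated advantage contribute more to the regression loss, which is the core of AWR; how IG-RFT couples this with the robot's interaction status is left to the full paper.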
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning
