TL;DR
This paper presents an optimized fine-tuning approach for vision-language-action models that significantly improves their efficiency, success rates, and flexibility in robotic tasks, demonstrated through state-of-the-art results in simulation and real-world experiments.
Contribution
The paper introduces a comprehensive fine-tuning recipe for VLAs, including parallel decoding, action chunking, continuous action representation, and L1 regression, leading to OpenVLA-OFT with superior performance.
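As a rough illustration of how two of these ingredients fit together, the sketch below computes an L1 regression loss over a chunk of continuous actions. It is a minimal pure-Python sketch, not the authors' implementation: the names `Action`, `Chunk`, and `l1_chunk_loss` are hypothetical, and averaging over every action dimension in the chunk is an assumption about the reduction.

```python
from typing import List

Action = List[float]  # one continuous action vector (hypothetical type alias)
Chunk = List[Action]  # several consecutive actions predicted together


def l1_chunk_loss(predicted: Chunk, target: Chunk) -> float:
    """Mean absolute error over every dimension of every action in a chunk.

    Assumption: the loss is averaged over all elements; the paper's exact
    reduction is not stated in this summary.
    """
    if len(predicted) != len(target):
        raise ValueError("chunks must contain the same number of actions")
    total, count = 0.0, 0
    for pred_action, true_action in zip(predicted, target):
        if len(pred_action) != len(true_action):
            raise ValueError("actions must have the same dimensionality")
        for p, t in zip(pred_action, true_action):
            total += abs(p - t)  # L1 term for a single action dimension
            count += 1
    if count == 0:
        raise ValueError("chunks must not be empty")
    return total / count
```

Because the actions are plain floats rather than discretized tokens, the loss is a direct regression target, which is what makes a simple L1 objective applicable here.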
Findings
Achieved a 97.1% success rate on the LIBERO benchmark, up from 76.5% for the base OpenVLA model.
Increased action generation throughput by 26 times.
Enabled high-frequency dexterous control on a bimanual robot.
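One way to see why action chunking helps throughput is that a single policy query yields several actions, so the model is called less often per control step. The sketch below counts policy calls in a chunked control loop. It is an illustrative sketch under stated assumptions: `rollout_with_chunks`, `policy`, and `env_step` are hypothetical names, the whole chunk is executed open-loop before re-querying, and the sketch does not reproduce or derive the 26-times figure reported above.

```python
from typing import Any, Callable, List, Tuple

Action = List[float]


def rollout_with_chunks(
    policy: Callable[[Any], List[Action]],
    env_step: Callable[[Any, Action], Any],
    initial_obs: Any,
    num_steps: int,
) -> Tuple[List[Action], int]:
    """Run a control loop in which each policy call returns a chunk of actions.

    Returns the executed actions and the number of policy calls made.
    """
    obs = initial_obs
    executed: List[Action] = []
    policy_calls = 0
    while len(executed) < num_steps:
        chunk = policy(obs)  # one forward pass produces several actions
        policy_calls += 1
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for action in chunk:
            if len(executed) >= num_steps:
                break  # stop mid-chunk once the step budget is reached
            obs = env_step(obs, action)  # execute without re-querying the policy
            executed.append(action)
    return executed, policy_calls
```

With a chunk size of 8 and a budget of 20 control steps, the loop above queries the policy 3 times instead of 20.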
Abstract
Recent vision-language-action models (VLAs) build upon pretrained vision-language models and leverage diverse robot datasets to demonstrate strong task execution, language following ability, and semantic generalization. Despite these successes, VLAs struggle with novel robot setups and require fine-tuning to achieve good performance, yet how to most effectively fine-tune them is unclear given many possible strategies. In this work, we study key VLA adaptation design choices such as different action decoding schemes, action representations, and learning objectives for fine-tuning, using OpenVLA as our representative base model. Our empirical analysis informs an Optimized Fine-Tuning (OFT) recipe that integrates parallel decoding, action chunking, a continuous action representation, and a simple L1 regression-based learning objective to altogether improve inference efficiency, policy…
Code & Models
- 🤗 moojink/openvla-7b-oft-finetuned-libero-spatial (model; 6.4k downloads, 14 likes)
- 🤗 moojink/openvla-7b-oft-finetuned-libero-object (model; 1.3k downloads, 2 likes)
- 🤗 moojink/openvla-7b-oft-finetuned-libero-goal (model; 1.9k downloads, 1 like)
- 🤗 moojink/openvla-7b-oft-finetuned-libero-10 (model; 1.3k downloads, 3 likes)
- 🤗 moojink/openvla-7b-oft-finetuned-libero-spatial-object-goal-10 (model; 902 downloads, 9 likes)
- 🤗 iMihayo/simvla_condition (model)
- 🤗 iMihayo/simvla_twin5 (model)
- 🤗 Nirav-Madhani/vla-adapter-gr00t-g1-bridgeattention (model; 2 downloads)
- 🤗 Dexmal/simpler-db-oft (model; 2 downloads, 1 like)
- 🤗 Dexmal/calvin-db-oft (model; 2 downloads, 1 like)
Taxonomy
Methods: Balanced Selection
