TL;DR
This paper introduces Fast-dVLA, a method that accelerates diffusion-based robot learning models to achieve near real-time performance by decoupling auxiliary training objectives and merging capability vectors.
Contribution
It proposes a novel parameter decoupling approach that enhances model capabilities efficiently, reducing computational costs during finetuning.
Findings
Achieves comparable performance to auxiliary finetuning with less computation.
Effectively improves robot task performance across diverse tasks.
Utilizes a lightweight regularization to enhance model capabilities.
Abstract
This paper addresses a common problem: standard supervised finetuning (SFT) of pretrained vision-language-action (VLA) models often fails to improve performance or reduce adaptation cost. Advanced finetuning methods that add auxiliary training objectives can improve performance and converge in fewer steps, but the additional auxiliary-task losses typically incur significant computational overhead. To combine the enhanced capabilities of auxiliary training with the simplicity of standard SFT, we decouple the two objectives of auxiliary-task training in parameter space: enhancing general capabilities and fitting task-specific action distributions. To achieve this, we only need to train the model to convergence on a small-scale task set using two distinct training strategies. The difference between…
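The abstract is cut off before it says how the difference is used, so the following is only an illustrative sketch of generic parameter-space arithmetic, not the paper's method. It assumes that the "difference" is taken element-wise between the parameters of two models trained on the same small task set (one with auxiliary objectives, one with plain SFT), and that the resulting vector is added to another model's parameters with a scaling factor. The names `capability_vector`, `merge`, `scale`, and the toy parameter dictionaries are all hypothetical.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Params = Dict[str, List[float]]


def capability_vector(aux_params: Params, sft_params: Params) -> Params:
    """Element-wise difference between two checkpoints (assumed definition)."""
    return {
        name: [a - s for a, s in zip(aux_params[name], sft_params[name])]
        for name in aux_params
    }


def merge(base_params: Params, vector: Params, scale: float = 1.0) -> Params:
    """Add a scaled vector to a base checkpoint (assumed merging rule)."""
    return {
        name: [b + scale * v for b, v in zip(base_params[name], vector[name])]
        for name in base_params
    }


# Toy values chosen only so the arithmetic is easy to follow.
pretrained = {"w": [0.0, 1.0], "b": [0.5]}
sft_model = {"w": [0.25, 1.0], "b": [0.5]}   # trained with plain SFT
aux_model = {"w": [0.5, 1.5], "b": [0.75]}   # trained with auxiliary objectives

vec = capability_vector(aux_model, sft_model)
merged = merge(pretrained, vec, scale=1.0)
print(vec)     # {'w': [0.25, 0.5], 'b': [0.25]}
print(merged)  # {'w': [0.25, 1.5], 'b': [0.75]}
```

In this sketch the vector is computed once on the small task set and reused, so finetuning on a new task would need only the standard SFT loss. Whether the paper applies the vector before or after finetuning, and how the lightweight regularization listed under Findings enters, cannot be determined from the truncated abstract.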
