Adaptive Capacity Allocation for Vision Language Action Fine-tuning

Donghoon Kim; Minji Bae; Unghui Nam; Gyeonghun Kim; Suyun Lee; Kyuhong Shim; Byonghyo Shim

arXiv:2603.07404 · cs.RO · March 10, 2026

TL;DR

This paper introduces LoRA-SP, a rank-adaptive fine-tuning method for vision language action (VLA) models that allocates capacity per input and per layer, improving generalization and efficiency in robotic manipulation tasks.

Contribution

LoRA-SP adaptively allocates capacity during fine-tuning through an energy-based selection rule and outperforms fixed-rank methods on robotic vision language action models.
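This page does not spell out the energy-based selection rule. One common reading, assumed here rather than taken from the paper, keeps the smallest number of components whose squared magnitudes retain a fraction `tau` of the total spectral energy. The function name `select_rank` and the threshold `tau` are hypothetical.

```python
# Hypothetical sketch of energy-based rank selection. The energy definition
# (sum of squared magnitudes) and the threshold tau are assumptions, not the
# exact LoRA-SP rule.

def select_rank(scores, tau=0.9):
    """Return the smallest k whose top-k components keep >= tau of the energy.

    scores: nonnegative per-component magnitudes (e.g. singular values or
            router scores).
    tau:    fraction of total energy to retain, in (0, 1].
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be nonnegative")
    # Sort descending so the highest-energy components are counted first.
    energies = [s * s for s in sorted(scores, reverse=True)]
    total = sum(energies)
    if total == 0.0:
        return 0  # nothing carries energy, so no component is needed
    kept = 0.0
    for k, e in enumerate(energies, start=1):
        kept += e
        if kept >= tau * total:
            return k
    # Guard against floating-point round-off when tau is close to 1.
    return len(energies)


if __name__ == "__main__":
    print(select_rank([3.0, 2.0, 1.0, 0.5], tau=0.9))
    print(select_rank([1.0] * 8, tau=0.9))
```

A spectrum dominated by a few large values needs a small rank (`[3.0, 2.0, 1.0, 0.5]` gives 2 at `tau=0.9`), while a flat spectrum of eight equal values needs all 8, which mirrors the intuition that a higher intrinsic rank calls for more capacity.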

Findings

1. LoRA-SP matches or exceeds full fine-tuning performance with fewer parameters.
2. It improves multi-task success rates by up to 31.6%.
3. The method is robust to rank choice and reduces cross-task interference.
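To put "fewer parameters" in perspective, the sketch below uses the standard LoRA count: a rank-$r$ update $BA$ to a $d_{out} \times d_{in}$ weight trains $r(d_{in} + d_{out})$ parameters, versus $d_{in} d_{out}$ for full fine-tuning. The 4096 × 4096 layer size is a hypothetical example, and the count ignores any extra parameters that LoRA-SP's router or SVD-style parameterization may add.

```python
# Standard LoRA parameter count (assumption: plain B @ A factorization;
# LoRA-SP's router and SVD-style factors would add some overhead).

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_in * d_out


def break_even_rank(d_in: int, d_out: int) -> float:
    """Rank at which LoRA trains as many parameters as full fine-tuning."""
    return d_in * d_out / (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size
    full = full_params(d, d)
    for r in (4, 8, 128):
        n = lora_params(d, d, r)
        print(f"rank {r:>3}: {n:>9,} params ({100 * n / full:.2f}% of full)")
    print(f"break-even rank: {break_even_rank(d, d):.0f}")
```

For this square layer, rank 128 trains 6.25% of the weights and the break-even rank is 2048, so even the larger ranks reported for VLAs stay well below the cost of full fine-tuning per layer.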

Abstract

Vision language action models (VLAs) are increasingly used for Physical AI, but deploying a pre-trained VLA model to unseen environments, embodiments, or tasks still requires adaptation. Parameter-efficient fine-tuning (PEFT), especially LoRA, is common for VLA policies, yet the exposed capacity knob, the rank, does not transfer uniformly: robotics transfer exhibits a higher and task-varying intrinsic rank than language fine-tuning. Small ranks suffice for LLMs (e.g., $r \in \{4, 8\}$), while spectral analyses indicate VLAs may require much larger ranks (e.g., $r \approx 128$) or near-full rank, a mismatch that worsens in multi-task settings. We present LoRA-SP (Select-Prune), a rank-adaptive fine-tuning method that replaces fixed-rank updates with input- and layer-wise capacity. LoRA-SP uses an SVD-style parameterization with a small router whose nonnegative scores act as singular…
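The abstract is cut off here, so the exact LoRA-SP update is not given on this page. The following is a minimal sketch, assuming the SVD-style update has the form $\Delta W(x) = U\,\mathrm{diag}(s(x))\,V^\top$ with input-dependent router scores $s(x) = \mathrm{softplus}(Rx) \ge 0$ standing in for singular values. The linear router $R$, the softplus nonlinearity, and the names `sp_forward` and `softplus` are hypothetical choices for illustration; plain Python lists keep it dependency-free.

```python
import math


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softplus(z):
    """Numerically stable softplus; keeps router scores nonnegative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sp_forward(x, w0, u, v_t, router):
    """Hypothetical router-gated SVD-style adapter forward pass.

    Computes y = W0 x + U diag(s(x)) V^T x with s(x) = softplus(R x).
    Shapes: w0 is d_out x d_in (frozen), u is d_out x r,
    v_t is r x d_in (V transposed), router is r x d_in.
    """
    base = matvec(w0, x)                                # frozen path
    proj = matvec(v_t, x)                               # r coordinates
    scores = [softplus(z) for z in matvec(router, x)]   # s(x) >= 0
    gated = [s * p for s, p in zip(scores, proj)]       # diag(s(x)) V^T x
    delta = matvec(u, gated)                            # back to d_out
    return [b + d for b, d in zip(base, delta)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(sp_forward([1.0, 2.0], zero, eye, eye, zero))
```

With a zero router every score equals softplus(0) = ln 2 ≈ 0.693, so all components contribute with equal weight; driving a router logit strongly negative pushes its score toward zero, which is one way input-dependent scores could effectively switch off directions for a given input.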

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning