PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
Xinyu Guo, Bin Xie, Wei Chai, Xianchi Deng, Tiancai Wang, Zhengxing Wu, Xingyu Chen

TL;DR
PriorVLA adapts vision-language-action (VLA) models to downstream robot tasks while preserving their pretrained priors, outperforming full fine-tuning, with the largest gains in out-of-distribution and few-shot scenarios.
Contribution
The paper proposes a framework that keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization, making adaptation both more efficient and more effective than full fine-tuning.
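The abstract reports that PriorVLA updates only 25% of the parameters that full fine-tuning updates. The sketch below shows how such a ratio is computed by counting only the parameters that receive gradient updates. The module names and sizes are hypothetical toy values in arbitrary units, not the paper's configuration, and treating the Prior Expert as a frozen copy of a pretrained action expert is an assumption. In a PyTorch implementation, freezing would typically correspond to calling `requires_grad_(False)` on the frozen modules.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    num_params: int
    trainable: bool


def updated_params(modules: list[Module]) -> int:
    """Count the parameters that receive gradient updates."""
    return sum(m.num_params for m in modules if m.trainable)


def relative_update_ratio(adapted: list[Module], full_ft: list[Module]) -> float:
    """Parameters updated by an adaptation scheme, relative to full fine-tuning."""
    return updated_params(adapted) / updated_params(full_ft)


# Hypothetical toy sizes (arbitrary units), NOT the paper's configuration.
full_fine_tuning = [
    Module("vlm", 800, trainable=True),
    Module("action_expert", 200, trainable=True),
]

prior_preserving = [
    Module("vlm", 800, trainable=False),               # frozen source of scene priors
    Module("prior_expert", 200, trainable=False),      # assumed frozen copy: read-only motor priors
    Module("adaptation_expert", 200, trainable=True),  # trained for the downstream task
    Module("expert_queries", 10, trainable=True),      # hypothetical learnable query tokens
]

print(f"{relative_update_ratio(prior_preserving, full_fine_tuning):.2f}")  # 0.21
```

With these made-up sizes the ratio is 0.21. The point is only that frozen modules drop out of the count, so the fraction is set by the size of the trainable Adaptation Expert and queries relative to the full model.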
Findings
PriorVLA outperforms full fine-tuning and state-of-the-art baselines.
Achieves a 99.1% success rate on the LIBERO benchmark.
Shows significant gains in out-of-distribution and few-shot settings.
Abstract
Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a novel framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, integrating both into the Adaptation Expert to guide adaptation. Overall, PriorVLA updates only 25% of the parameters that full fine-tuning updates. Across RoboTwin 2.0, LIBERO, and real-world tasks,…
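The abstract does not spell out how Expert Queries read the two prior sources. A minimal sketch, assuming plain single-head scaled dot-product cross-attention without learned projections: a hypothetical set of scene queries attends over frozen VLM token features, a second set of motor queries attends over frozen Prior Expert hidden states, and the concatenated read-outs become extra tokens handed to the Adaptation Expert. The names `cross_attend` and `expert_query_readout`, the feature vectors, and the concatenation step are illustrative assumptions, not the paper's implementation.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: list[Vector], memory: list[Vector]) -> list[Vector]:
    """Single-head scaled dot-product attention; memory serves as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        # Weighted average of memory vectors, one output dimension at a time.
        out.append([sum(w * v[i] for w, v in zip(weights, memory)) for i in range(d)])
    return out


def expert_query_readout(
    scene_queries: list[Vector],
    motor_queries: list[Vector],
    vlm_tokens: list[Vector],
    prior_hidden: list[Vector],
) -> list[Vector]:
    """Hypothetical: scene queries read the frozen VLM, motor queries read the frozen Prior Expert."""
    scene = cross_attend(scene_queries, vlm_tokens)    # scene priors (VLM is only read)
    motor = cross_attend(motor_queries, prior_hidden)  # motor priors (Prior Expert is only read)
    return scene + motor  # extra tokens for the Adaptation Expert


# Usage with stand-in 2-D features.
vlm_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prior_hidden = [[0.2, 0.8], [0.9, 0.1]]
tokens = expert_query_readout([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]], vlm_tokens, prior_hidden)
print(len(tokens))  # 3
```

In this sketch the VLM and Prior Expert features are only read, never modified, so the pretrained priors stay intact while the learnable queries and the Adaptation Expert carry the task-specific learning.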
