VLA-GSE: Boosting Parameter-Efficient Fine-Tuning in VLA with Generalized and Specialized Experts

Yuhua Jiang; Junjie Lu; Xinyao Qin; Xiaoyu Chen; Kaixin Wang; Feifei Gao; Li Zhao

arXiv:2605.06175 · cs.RO · May 11, 2026

TL;DR

VLA-GSE is a parameter-efficient fine-tuning framework for vision-language-action models that improves adaptation to robotic control while preserving pre-trained knowledge. It updates only 2.51% of parameters and reaches 81.2% average zero-shot success on LIBERO-Plus.

Contribution

The paper proposes an expert routing method based on spectral decomposition that improves adaptation capacity under a fixed parameter budget in VLA models.

Findings

01 VLA-GSE updates only 2.51% of parameters and outperforms FFT and PEFT baselines.

02 It achieves 81.2% average zero-shot success on LIBERO-Plus.

03 It preserves pre-trained VLM capabilities at a level comparable to LoRA.

Abstract

Vision-language-action (VLA) models inherit rich visual-semantic priors from pre-trained vision-language backbones, but adapting them to robotic control remains challenging. Full fine-tuning (FFT) is prone to overfitting on downstream robotic data and to catastrophic forgetting of pre-trained vision-language capabilities. Parameter-efficient fine-tuning (PEFT) better preserves pre-trained knowledge, yet existing PEFT methods still struggle to adapt effectively to robot control tasks. To address this gap, we propose VLA-GSE, a parameter-efficient VLA fine-tuning framework that improves control adaptation while retaining PEFT's knowledge preservation advantage. Specifically, VLA-GSE (Generalized and Specialized Experts) is initialized by spectrally decomposing the frozen backbone, assigning leading singular components to generalized experts (shared experts) and disjoint residual components to…
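
The initialization described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The abstract is cut off before it says where the residual components go, so the sketch assumes, from the method's name, that they are split among specialized experts. The function name `split_spectral_components`, its parameters, the equal-sized groups, and the square-root scaling of the factors are all hypothetical choices made for illustration.

```python
import numpy as np


def split_spectral_components(weight, n_shared, n_experts, rank_per_expert):
    """Split the singular components of one frozen weight matrix into groups.

    Hypothetical helper: the leading `n_shared` components form a shared
    (generalized) group; the following components are cut into `n_experts`
    disjoint groups of `rank_per_expert` each (assumed to be the specialized
    experts, which the truncated abstract does not confirm).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    needed = n_shared + n_experts * rank_per_expert
    if needed > s.size:
        raise ValueError(f"need {needed} components, matrix has only {s.size}")

    def low_rank_factors(idx):
        # Return (B, A) such that B @ A equals the sum of the chosen components.
        root = np.sqrt(s[idx])
        return u[:, idx] * root, root[:, None] * vt[idx, :]

    shared = low_rank_factors(np.arange(n_shared))
    experts = []
    for e in range(n_experts):
        start = n_shared + e * rank_per_expert
        experts.append(low_rank_factors(np.arange(start, start + rank_per_expert)))
    return shared, experts


# Toy stand-in for a backbone weight; a real layer would be far larger.
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 12))
shared, experts = split_spectral_components(
    weight, n_shared=4, n_experts=4, rank_per_expert=2
)
```

In this toy setting the groups together use all 12 singular components, so summing every `B @ A` product recovers `weight`. Under a real parameter budget only a small slice of the spectrum would be kept, and the routing between experts is not modeled here.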

Code & Models

Repositories

YuhuaJiang2002/VLA-GSE (GitHub)
