SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse
Xuanran Zhai, Zekai Huang, Longyan Wu, Qianyou Zhao, Qiaojun Yu, Jieji Ren, Ce Hao, Harold Soh

TL;DR
SkillVLA is a dual-arm manipulation framework built around skill reuse: it recombines learned single-arm skills into new left-right pairings for complex tasks and achieves higher success rates than existing models.
Contribution
The paper presents SkillVLA, a novel framework that explicitly supports skill reuse in dual-arm manipulation, addressing the challenge of combinatorial diversity in vision-language-action models.
Findings
Success rate increased from 0% to 51%.
Improved skill composition for complex tasks.
Strong performance on cooperative and long-horizon tasks.
Abstract
Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse - the ability to recombine previously learned single-arm skills across novel left-right pairings - thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill…
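To make the combinatorial argument in the abstract concrete, here is a minimal counting sketch. The skill names and the helper functions are hypothetical and chosen only for illustration; they are not taken from the paper or from SkillVLA's implementation. It compares how many left-right pairings exist with how many single-arm skills would need to be learned if skills could be freely recombined.

```python
from itertools import product

# Hypothetical single-arm skill vocabularies (illustrative only, not from the paper).
left_skills = ["pick", "hold", "push"]
right_skills = ["pour", "wipe", "open", "place"]


def count_pairings(left, right):
    """Number of distinct left-right pairings (grows multiplicatively)."""
    return len(list(product(left, right)))


def count_reusable_skills(left, right):
    """Number of single-arm skills to learn if pairings can be recomposed (grows additively)."""
    return len(left) + len(right)


if __name__ == "__main__":
    print("pairings to cover without reuse:", count_pairings(left_skills, right_skills))
    print("single-arm skills with reuse:", count_reusable_skills(left_skills, right_skills))
```

With 3 left-arm and 4 right-arm skills, there are 12 pairings but only 7 single-arm skills. The gap widens as either vocabulary grows, which is the scalability concern the abstract raises about learning every combination separately.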
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
