SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models
Meng Li, Zhen Zhao, Zhengping Che, Fei Liao, Kun Wu, Zhiyuan Xu, Pei Ren, Zhao Jin, Ning Liu, Jian Tang

TL;DR
SwitchVLA introduces an execution-aware framework for vision-language-action models that enables robots to adapt to changing instructions mid-task, improving robustness and natural interaction in dynamic environments.
Contribution
The paper presents a novel behavior modulation approach to task switching in VLA models that conditions on execution state and instruction context and requires neither external planners nor switch-specific data.
Findings
Outperforms prior VLA models in success rate.
Enables fluid and reactive task switching.
Demonstrates robustness in real-world robotic tasks.
Abstract
Robots deployed in dynamic environments must not only follow diverse language instructions but also adapt flexibly when user intent changes mid-execution. While recent Vision-Language-Action (VLA) models have advanced multi-task learning and instruction following, they typically assume static task intent and fail to respond when new instructions arrive during ongoing execution. This limitation hinders natural and robust interaction in dynamic settings, such as retail or household environments, where real-time intent changes are common. We propose SwitchVLA, a unified, execution-aware framework that enables smooth and reactive task switching without external planners or additional switch-specific data. We model task switching as a behavior modulation problem conditioned on execution state and instruction context. Expert demonstrations are segmented into temporally grounded contact…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
