DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

Zhen Fang; Zhuoyang Liu; Jiaming Liu; Hao Chen; Yu Zeng; Shiting Huang; Zehui Chen; Lin Chen; Shanghang Zhang; Feng Zhao

arXiv:2511.22134 · cs.CV · December 1, 2025

TL;DR

DualVLA improves generalizable embodied vision-language-action (VLA) models by partially decoupling reasoning and action: it uses data pruning and adaptive distillation to raise action performance without sacrificing reasoning capability.

Contribution

The paper proposes DualVLA, a post-training method that addresses the action degeneration problem: it improves action performance in VLA models through data pruning and adaptive distillation while maintaining reasoning ability.

Findings

01

Achieves a success rate of 61.0 in SimplerEnv

02

Scores 65.4 on average across eight benchmarks

03

Balances precise action execution with multimodal understanding

Abstract

To build a generalizable Vision-Language-Action (VLA) model with strong reasoning ability, a common strategy is to first train a specialist VLA on robot demonstrations to acquire reliable manipulation skills, and then incorporate mixed annotated robot data together with multimodal data to restore broader reasoning capabilities. However, we observe that the resulting reasoning VLA often suffers from degraded action performance compared to the specialist model before fine-tuning, a phenomenon we refer to as action degeneration. To address this issue, we propose DualVLA, which enhances action performance through carefully designed post-training while still preserving reasoning capability. We first introduce a dual-layer data pruning method that removes redundant embodied reasoning, preventing it from adversely influencing action learning. To further strengthen action generation, we design…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning