VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making
Zuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan, Bin Liu

TL;DR
This paper introduces MIMO-VLA (VLASCD), a unified training framework that lets a multimodal model produce several task outputs at once, such as dialogue and driving decisions. It targets a limitation of multi-input single-output (MISO) architectures, where tasks compete for one shared output channel and performance on concurrent tasks degrades.
Contribution
The paper proposes MIMO-VLA, a unified training framework that supports parallel multi-task outputs, inspired by human cognition, and demonstrates superior performance in MIMO scenarios.
Findings
MIMO-VLA outperforms state-of-the-art MISO models in MIMO tasks.
It enables concurrent dialogue generation and decision-making.
Experiments on the CARLA autonomous driving platform are reported to show performance gains over MISO baselines.
Abstract
Recent large pretrained models such as LLMs (e.g., GPT series) and VLAs (e.g., OpenVLA) have achieved notable progress on multimodal tasks, yet they are built upon a multi-input single-output (MISO) paradigm. We show that this paradigm fundamentally limits performance in multi-input multi-output (MIMO) scenarios, where parallel task execution is required. In MISO architectures, tasks compete for a shared output channel, creating mutual exclusion effects that cause unbalanced optimization and degraded performance. To address this gap, we introduce MIMO-VLA (VLASCD), a unified training framework that enables concurrent multi-task outputs, exemplified by simultaneous dialogue generation and decision-making. Inspired by human cognition, MIMO-VLA eliminates interference between tasks and supports efficient parallel processing. Experiments on the CARLA autonomous driving platform demonstrate…
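The contrast between a shared output channel and parallel outputs can be illustrated with a toy model. The sketch below is not the paper's architecture: the class `ToyMimoModel`, the layer sizes, and the weighted-sum `joint_loss` are all hypothetical and chosen only for illustration. It shows one shared trunk feeding two separate heads, so a single forward pass yields both a token distribution (the dialogue side) and a continuous action (the decision side), instead of forcing both through one output.

```python
import math
import random


def linear(weights, bias, x):
    """Apply a dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Return (weights, bias) with small random weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ToyMimoModel:
    """Hypothetical toy: a shared trunk with two independent output heads."""

    def __init__(self, n_in=4, n_hidden=8, vocab_size=5, action_dim=2, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, n_hidden, n_in)
        self.text_head = init_layer(rng, vocab_size, n_hidden)
        self.action_head = init_layer(rng, action_dim, n_hidden)

    def forward(self, x):
        # Shared representation computed once for both tasks.
        hidden = [math.tanh(h) for h in linear(*self.trunk, x)]
        # Head 1: distribution over a tiny vocabulary (dialogue side).
        token_probs = softmax(linear(*self.text_head, hidden))
        # Head 2: bounded continuous action, e.g. steer and throttle (assumed).
        action = [math.tanh(a) for a in linear(*self.action_head, hidden)]
        return token_probs, action


def joint_loss(token_probs, action, target_token, target_action,
               action_weight=1.0):
    """Assumed objective: cross-entropy on text plus weighted MSE on action."""
    text_loss = -math.log(token_probs[target_token])
    action_loss = sum((a - t) ** 2
                      for a, t in zip(action, target_action)) / len(action)
    return text_loss + action_weight * action_loss


model = ToyMimoModel()
token_probs, action = model.forward([0.1, -0.2, 0.3, 0.05])
loss = joint_loss(token_probs, action, target_token=2, target_action=[0.0, 0.5])
print(len(token_probs), len(action), loss > 0)
```

In a MISO setup the action would instead have to be serialized into the same token stream as the dialogue, which is the competition for a shared output channel that the abstract describes. How MIMO-VLA actually balances the two objectives is not stated in the abstract; the weighted sum here is only a placeholder.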
Taxonomy
Topics: Speech and dialogue systems
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
