StreamVLA: Breaking the Reason-Act Cycle via Completion-State Gating
Tongqing Chen, Hang Wu, Jiasen Wang, Xiaotao Li, Lu Fang

TL;DR
StreamVLA introduces a dual-system architecture with a gating mechanism that reduces reasoning redundancy, improves robustness, and significantly cuts inference latency in long-horizon robotic manipulation tasks.
Contribution
The paper proposes StreamVLA, a novel architecture that unifies high-level planning and low-level control with a completion-state gating mechanism, enhancing efficiency and robustness.
Findings
Achieves a 98.5% success rate on the LIBERO benchmark.
Reduces inference latency by 48% compared to baselines.
Demonstrates robust recovery in real-world interference scenarios.
Abstract
Long-horizon robotic manipulation requires bridging the gap between high-level planning (System 2) and low-level control (System 1). Current Vision-Language-Action (VLA) models often entangle these processes, performing redundant multimodal reasoning at every timestep, which leads to high latency and goal instability. To address this, we present StreamVLA, a dual-system architecture that unifies textual task decomposition, visual goal imagination, and continuous action generation within a single parameter-efficient backbone. We introduce a "Lock-and-Gated" mechanism to intelligently modulate computation: only when a sub-task transition is detected does the model trigger slow thinking, generating a textual instruction and imagining the specific visual completion state rather than generic future frames. Crucially, this completion state serves as a time-invariant goal anchor, making the…
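The gating idea in the abstract can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Goal`, `GatedController`, `plan`, `is_complete`, `act`, and `run_toy_episode` are all hypothetical, the "completion state" is reduced to an integer target rather than an imagined image, and the transition detector is a simple equality check. It shows the slow path (re-planning) running only when a sub-task transition is detected, while the fast path (action generation) runs at every step against the locked goal.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Goal:
    instruction: str       # textual sub-task instruction (slow-path output)
    completion_state: int  # toy stand-in for the imagined visual completion state


class GatedController:
    """Hypothetical gate: re-plan only when a sub-task transition is detected."""

    def __init__(
        self,
        plan: Callable[[int], Goal],
        is_complete: Callable[[int, Goal], bool],
        act: Callable[[int, Goal], int],
    ) -> None:
        self.plan = plan                # slow path: observation -> new goal
        self.is_complete = is_complete  # gate: has the locked goal been reached?
        self.act = act                  # fast path: observation + goal -> action
        self.goal: Optional[Goal] = None
        self.slow_calls = 0             # how often the slow path actually ran

    def step(self, obs: int) -> int:
        # Trigger slow thinking only at the start or on a sub-task transition.
        if self.goal is None or self.is_complete(obs, self.goal):
            self.goal = self.plan(obs)
            self.slow_calls += 1
        # The locked goal stays fixed between transitions and anchors every action.
        return self.act(obs, self.goal)


def run_toy_episode(steps: int) -> Tuple[int, int]:
    """1-D toy task (an assumption for illustration): reach 3, then 6, then 9, ..."""

    def plan(obs: int) -> Goal:
        target = (obs // 3 + 1) * 3
        return Goal(instruction=f"move to {target}", completion_state=target)

    def is_complete(obs: int, goal: Goal) -> bool:
        return obs == goal.completion_state

    def act(obs: int, goal: Goal) -> int:
        return 1 if obs < goal.completion_state else 0

    controller = GatedController(plan, is_complete, act)
    obs = 0
    for _ in range(steps):
        obs += controller.step(obs)
    return obs, controller.slow_calls


if __name__ == "__main__":
    final_obs, slow_calls = run_toy_episode(6)
    print(final_obs, slow_calls)
```

In this toy run, six control steps invoke the slow path twice (once at the start and once when the first target is reached); an ungated loop would invoke it at all six steps. The counts describe this sketch only and say nothing about the paper's measured latency.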
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics
