Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models

Shuanghao Bai; Jing Lyu; Wanqi Zhou; Zhe Li; Dakai Wang; Lei Xing; Xiaoguang Zhao; Pengwei Wang; Zhongyuan Wang; Cheng Chi; Badong Chen; Shanghang Zhang

arXiv:2602.01166·cs.RO·May 11, 2026

TL;DR

LaRA-VLA is a vision-language-action framework that internalizes multi-modal chain-of-thought reasoning into continuous latent representations, reducing inference latency and improving performance on embodied tasks.

Contribution

The paper proposes a unified latent reasoning approach that replaces explicit chain-of-thought generation at inference time, together with a curriculum-based training paradigm aimed at efficient real-time embodied control.

Findings

01 Outperforms state-of-the-art VLA methods on benchmarks

02 Reduces inference latency by up to 90%

03 Effective in long-horizon real-robot manipulation tasks
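To put the latency figure in perspective, the short calculation below converts a relative reduction into a speedup factor. It assumes the reduction is measured against an explicit-CoT baseline; the symbols t_explicit and t_latent are illustrative names introduced here and do not come from the paper.

```latex
% Assumption: r is the relative latency reduction versus an explicit-CoT baseline.
\[
  r = 1 - \frac{t_{\text{latent}}}{t_{\text{explicit}}}
  \quad\Longrightarrow\quad
  \frac{t_{\text{explicit}}}{t_{\text{latent}}} = \frac{1}{1 - r}
\]
% At the reported upper bound r = 0.9:
\[
  \frac{t_{\text{explicit}}}{t_{\text{latent}}} = \frac{1}{1 - 0.9} = 10
\]
```

Under that assumption, a reduction of up to 90% corresponds to a speedup of up to 10x per inference call.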

Abstract

Vision-Language-Action (VLA) models benefit from chain-of-thought (CoT) reasoning, but existing approaches incur high inference overhead and rely on discrete reasoning representations that mismatch continuous perception and control. We propose Latent Reasoning VLA (LaRA-VLA), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action. LaRA-VLA performs unified reasoning and prediction in latent space, eliminating explicit CoT generation at inference time and enabling efficient, action-oriented control. To realize latent embodied reasoning, we introduce a curriculum-based training paradigm that progressively transitions from explicit textual and visual CoT supervision to latent reasoning, and finally adapts latent reasoning dynamics to condition action generation. We construct two structured CoT datasets and evaluate…
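The curriculum described in the abstract can be pictured as a schedule over training objectives. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract is truncated and does not give stage boundaries, loss weights, or a blending rule, so `StageWeights`, `curriculum_weights`, the linear blend, and the choice to train only the action objective in the final stage are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageWeights:
    """Hypothetical per-objective loss weights (not taken from the paper)."""
    explicit_cot: float  # explicit textual and visual CoT supervision
    latent: float        # latent reasoning objective
    action: float        # action generation objective


def curriculum_weights(step: int, stage1_end: int, stage2_end: int) -> StageWeights:
    """Return loss weights for a training step under an assumed 3-stage schedule.

    Stage 1: explicit CoT supervision only.
    Stage 2: linear hand-over from explicit CoT to latent reasoning.
    Stage 3: action objective only, conditioned on the learned latents.
    """
    if step < 0 or not (0 < stage1_end < stage2_end):
        raise ValueError("need step >= 0 and 0 < stage1_end < stage2_end")
    if step < stage1_end:
        return StageWeights(explicit_cot=1.0, latent=0.0, action=0.0)
    if step < stage2_end:
        # Fraction of the transition stage completed, in [0, 1).
        frac = (step - stage1_end) / (stage2_end - stage1_end)
        return StageWeights(explicit_cot=1.0 - frac, latent=frac, action=0.0)
    return StageWeights(explicit_cot=0.0, latent=0.0, action=1.0)


if __name__ == "__main__":
    # Print the weights at a few steps across the assumed stage boundaries.
    for s in (0, 100, 150, 200):
        print(s, curriculum_weights(s, stage1_end=100, stage2_end=200))
```

A training loop would multiply each objective's loss by the matching weight before summing. The linear hand-over is only one plausible reading of "progressively transitions"; a stepwise replacement of CoT tokens by latent tokens would fit the description equally well.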

Code & Models

Repositories

https://loveju1y.github.io/Latent-Reasoning-VLA