Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang

TL;DR
OneVL introduces a unified latent reasoning framework for autonomous driving that internalizes causal dynamics, enabling faster inference and surpassing explicit chain-of-thought methods in accuracy.
Contribution
It presents a novel latent CoT approach with a visual world model decoder, improving speed and accuracy over explicit reasoning methods in autonomous driving tasks.
Findings
OneVL outperforms explicit CoT on four benchmarks.
It achieves answer-only latency comparable to direct prediction.
Latent CoT with world model supervision yields more generalizable representations.
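The latency finding follows from how autoregressive decoding scales. Here is a toy cost model that illustrates it. It assumes that explicit CoT pays one sequential decode step per reasoning token. It also assumes that "one-step" latent reasoning fills its latent tokens inside a single forward pass, so only the answer is decoded token by token, and that the explanation decoders can be skipped at inference. These are assumptions, not statements of OneVL's implementation. The class name `LatencyModel` and all millisecond figures are hypothetical and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Toy latency model for a decoder-only VLA (hypothetical numbers, not measurements)."""
    prefill_ms: float    # one forward pass over the prompt (camera tokens + text)
    per_token_ms: float  # cost of one sequential autoregressive decode step

    def explicit_cot(self, cot_tokens: int, answer_tokens: int) -> float:
        # Every reasoning token and every answer token is decoded sequentially.
        return self.prefill_ms + (cot_tokens + answer_tokens) * self.per_token_ms

    def latent_one_step(self, answer_tokens: int, latent_overhead_ms: float = 0.0) -> float:
        # Assumption: latent tokens are produced within one extra pass (small fixed
        # overhead), and auxiliary explanation decoders are not run at inference.
        return self.prefill_ms + latent_overhead_ms + answer_tokens * self.per_token_ms

    def direct(self, answer_tokens: int) -> float:
        # Baseline with no reasoning: the answer is decoded directly.
        return self.prefill_ms + answer_tokens * self.per_token_ms


model = LatencyModel(prefill_ms=120.0, per_token_ms=20.0)
print(model.explicit_cot(cot_tokens=200, answer_tokens=30))             # 4720.0
print(model.latent_one_step(answer_tokens=30, latent_overhead_ms=5.0))  # 725.0
print(model.direct(answer_tokens=30))                                   # 720.0
```

Under these assumptions, explicit CoT cost grows linearly with the length of the reasoning trace. The latent variant stays within a fixed overhead of direct prediction regardless of how much reasoning the latent tokens encode.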
Abstract
Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, but consistently fall short of their explicit counterparts. We suggest that this is due to purely linguistic latent representations compressing a symbolic abstraction of the world, rather than the causal dynamics that actually govern driving. Thus, we present OneVL (One-step latent reasoning and planning with Vision-Language explanations), a unified VLA and World Model framework that routes reasoning through compact latent tokens supervised by dual auxiliary decoders. Alongside a language decoder that reconstructs text CoT, we introduce a visual world model decoder that…
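The abstract describes latent tokens supervised by a planning objective plus two auxiliary decoders. The excerpt does not give OneVL's actual loss, so the following is only a generic sketch of how such dual auxiliary supervision is commonly composed. It assumes three terms: an L1 trajectory term, a token-level cross-entropy term for the language decoder reconstructing the text CoT, and an MSE term for the visual decoder against some regression target. The visual target is left generic because the abstract is cut off before saying what the world model decoder predicts. The function names, `lambda_lang`, and `lambda_vis` are hypothetical, and the specific loss functions are swapped-in standard choices.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean token-level cross-entropy (stand-in for the language decoder's CoT loss)."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        total += log_z - row[t]
    return total / len(targets)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error (stand-in for the visual world-model decoder's loss)."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error on waypoint coordinates (stand-in for the planning loss)."""
    return sum(abs(p - q) for p, q in zip(pred, target)) / len(target)


def total_loss(traj_pred, traj_gt, cot_logits, cot_ids, visual_pred, visual_target,
               lambda_lang: float = 0.5, lambda_vis: float = 0.5) -> float:
    # Hypothetical weighting: the planning loss plus two auxiliary terms whose
    # gradients flow back into the shared latent tokens during training.
    return (l1(traj_pred, traj_gt)
            + lambda_lang * cross_entropy(cot_logits, cot_ids)
            + lambda_vis * mse(visual_pred, visual_target))


loss = total_loss(
    traj_pred=[1.0, 2.0], traj_gt=[1.5, 2.5],
    cot_logits=[[0.0, 0.0]], cot_ids=[0],
    visual_pred=[0.0, 1.0], visual_target=[0.0, 0.0],
)
print(round(loss, 4))  # 1.0966
```

In this design, the auxiliary decoders only shape what the latent tokens must encode. Dropping them at inference leaves the planning path unchanged.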
Code & Models
- 🤗 xiaomi-research/OneVL_AlpamayoR1 (model) · 74 downloads
- 🤗 xiaomi-research/OneVL_Impromptu (model) · 49 downloads
- 🤗 xiaomi-research/OneVL_NAVSIM (model) · 154 downloads
- 🤗 xiaomi-research/OneVL_ROADWork (model) · 56 downloads
- 🤗 xiaomi-research/OneVL_visual_decoder_pt (model) · 68 downloads
- 🤗 xiaomi-research/OneVL_visual_decoder_pt_ar1 (model) · 57 downloads
- 🤗 Fukangjia/OneVL_AlpamayoR1 (model) · 17 downloads
