Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training
Yi Liu, Sukai Wang, Dafeng Wei, Xiaowei Cai, Linqing Zhong, Jiange Yang, Guanghui Ren, Jinyu Zhang, Maoqing Yao, Chuankang Li, Xindong He, Liliang Chen, Jianlan Luo

TL;DR
This paper introduces ERIQ, a benchmark for embodied reasoning in robotics, and FACT, a discrete action tokenizer, to improve the integration of reasoning and precise control in robotic manipulation.
Contribution
The paper presents ERIQ for systematic evaluation of embodied reasoning and proposes FACT to bridge reasoning and control, with the aim of improving robotic manipulation performance.
Findings
ERIQ reveals a strong positive correlation between embodied reasoning capability and generalization.
FACT improves trajectory fidelity in discrete control sequences.
GenieReasoner outperforms prior methods in real-world robotic tasks.
Abstract
General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for existing Vision-Language-Action (VLA) models. While large Vision-Language Models (VLMs) improve semantic generalization, insufficient embodied reasoning leads to brittle behavior, and conversely, strong reasoning alone is inadequate without precise control. To provide a decoupled and quantitative assessment of this bottleneck, we introduce Embodied Reasoning Intelligence Quotient (ERIQ), a large-scale embodied reasoning benchmark in robotic manipulation, comprising 6K+ question-answer pairs across four reasoning dimensions. By decoupling reasoning from execution, ERIQ enables systematic evaluation and reveals a strong positive correlation between embodied reasoning capability and end-to-end VLA…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
