Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Haoming Song, Delin Qu, Yuanqi Yao, Qizhi Chen, Qi Lv, Yiwen Tang, Modi Shi, Guanghui Ren, Maoqing Yao, Bin Zhao, Dong Wang, Xuelong Li

TL;DR
Hume introduces a dual-system approach combining slow, value-guided reasoning with fast reactive control in a vision-language-action model to enhance dexterous robot manipulation in physical environments.
Contribution
The paper presents Hume, a novel dual-system VLA model incorporating System-2 value-guided thinking and cascaded action denoising for improved robotic dexterity.
Findings
Hume outperforms existing models in simulation benchmarks.
Hume achieves superior real-robot dexterous control.
System-2 thinking enhances decision accuracy in complex tasks.
Abstract
Humans practice slow thinking before performing actual actions when handling complex tasks in the physical world. This thinking paradigm has recently achieved remarkable advances in helping Large Language Models (LLMs) solve complex tasks in digital domains. However, the potential of slow thinking remains largely unexplored for robotic foundation models interacting with the physical world. In this work, we propose Hume: a dual-system Vision-Language-Action (VLA) model with value-guided System-2 thinking and cascaded action denoising, exploring human-like thinking capabilities of Vision-Language-Action models for dexterous robot control. System 2 of Hume implements value-guided thinking by extending a Vision-Language-Action model backbone with a novel value-query head to estimate the state-action value of predicted actions. The value-guided thinking is conducted by repeat…
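The abstract is truncated here, so the exact procedure is not shown on this page. As a rough illustration of what value-guided selection could look like, the sketch below assumes that several candidate actions are proposed and that the one with the highest estimated state-action value is kept. The names `select_by_value`, `toy_policy`, and `toy_value` are hypothetical stand-ins invented for this example; they are not Hume's API, and the toy value function merely replaces a learned value-query head.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]


def select_by_value(
    candidates: Sequence[Action],
    value_fn: Callable[[Action], float],
) -> Tuple[Action, float]:
    """Return the candidate with the highest estimated value and its score."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    scores = [value_fn(a) for a in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_policy(rng: random.Random, dim: int = 2) -> Action:
    """Hypothetical stand-in for an action head: samples a random action."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def toy_value(action: Action) -> float:
    """Hypothetical stand-in for a value head: higher near a fixed target."""
    target = [0.5, -0.2]
    return -sum((a - t) ** 2 for a, t in zip(action, target))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidates = [toy_policy(rng) for _ in range(8)]
best_action, best_value = select_by_value(candidates, toy_value)
print(best_action, best_value)
```

In this toy setup, drawing more candidates trades extra computation for a better chance of finding a high-value action, which is one common reading of "slow thinking" at inference time; whether Hume balances that trade-off the same way is not stated in the text above.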
Taxonomy
Topics: Systems Engineering Methodologies and Applications · Complex Systems and Decision Making
