Humanoid Whole-Body Manipulation via Active Spatial Brain and Generalizable Action Cerebellum
Zhizhao Liang, Yi-Lin Wei, Xuhang Chen, Mu Lin, Yi-Xiang He, Zhexi Luo, Jun-Hui Liu, Kun-Yu Lin, Wei-Shi Zheng

TL;DR
This paper introduces a humanoid manipulation framework that uses large models for active spatial perception and generalizable action generation in complex 3D environments.
Contribution
It presents a framework with two components, Active Spatial Brain and Generalizable Action Cerebellum, that improve spatial understanding and action generalization without extensive real-robot data.
Findings
Strong performance in spatial perception and understanding tasks.
Effective real-robot task execution across diverse environments.
Benchmark results demonstrate robustness and generalization.
Abstract
In this paper, we explore the task of spatial-aware humanoid whole-body manipulation. Compared with tabletop settings, this task poses two key challenges: 1) Spatial understanding is challenging in complex 3D environments with diverse spatial relations. 2) Action generation is difficult to generalize, as limited and costly real-robot data restricts the generalization of data-driven models. To address these challenges, we propose a generalizable humanoid loco-manipulation framework that leverages the spatial perception and action generation capabilities of multi-agent large models. Specifically, our framework includes two components: Active Spatial Brain for active spatial perception and decision-making, and Generalizable Action Cerebellum for executable robot action generation. The first component actively perceives the spatial scene and makes decisions on task planning and subtask decomposition. The…
