Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation
Xuetao Li, Wenke Huang, Mang Ye, Jifeng Xuan, Bo Du, Sheng Liu, and Miao Li

TL;DR
This paper introduces RGMP-S (Recurrent Geometric-prior Multimodal Policy with Spiking features), a framework that combines geometric priors with spiking neural networks to improve scene understanding, reasoning, and data-efficient manipulation in humanoid robots, and reports strong generalization across diverse environments.
Contribution
The paper proposes a recurrent geometric-prior policy with spiking features that supports both high-level skill reasoning and motion synthesis for humanoid robot manipulation, improving generalization and data efficiency.
Findings
Outperforms state-of-the-art methods in simulation and real-world tests.
Achieves robust generalization in unseen environments.
Enhances data efficiency in sparse demonstration scenarios.
Abstract
Humanoid robot manipulation is a crucial research area for executing diverse human-level tasks, involving high-level semantic reasoning and low-level action generation. However, precise scene understanding and sample-efficient learning from human demonstrations remain critical challenges, severely hindering the applicability and generalizability of existing frameworks. This paper presents RGMP-S (Recurrent Geometric-prior Multimodal Policy with Spiking features), a novel framework that facilitates both high-level skill reasoning and data-efficient motion synthesis. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases to enable precise 3D scene understanding within the vision-language model. Specifically, we construct a Long-horizon Geometric Prior Skill Selector that effectively aligns semantic instructions with spatial constraints, ultimately…
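The excerpt names "recurrent spiking features" but does not describe the neuron model. To make the idea concrete, here is a minimal sketch of a generic discrete-time recurrent leaky integrate-and-fire (LIF) layer, assuming a standard LIF neuron with hard reset. This is not the authors' module: the function name `lif_recurrent_layer` and the parameters `decay` and `threshold` are hypothetical, and training details (such as surrogate gradients) are omitted.

```python
from typing import List


def lif_recurrent_layer(
    inputs: List[List[float]],
    w_in: List[List[float]],
    w_rec: List[List[float]],
    decay: float = 0.9,
    threshold: float = 1.0,
) -> List[List[int]]:
    """Hypothetical generic recurrent LIF layer (not the paper's exact model).

    inputs: T x D_in sequence of input vectors
    w_in:   N x D_in feedforward weights
    w_rec:  N x N recurrent weights applied to the previous step's spikes
    Returns a T x N list of binary spike trains.
    """
    n = len(w_in)
    v = [0.0] * n   # membrane potential of each neuron
    s = [0] * n     # spikes emitted at the previous time step
    spikes_out = []
    for x in inputs:
        new_s = []
        for i in range(n):
            # Feedforward current from the current input.
            current = sum(w * xi for w, xi in zip(w_in[i], x))
            # Recurrent current from last step's spikes.
            current += sum(w * sj for w, sj in zip(w_rec[i], s))
            # Leaky integration: old potential decays, new current is added.
            v[i] = decay * v[i] + current
            if v[i] >= threshold:
                new_s.append(1)
                v[i] = 0.0  # hard reset after a spike
            else:
                new_s.append(0)
        s = new_s
        spikes_out.append(new_s)
    return spikes_out
```

For example, a single neuron receiving a constant input of 0.5 with `decay=1.0` integrates for two steps before each spike, so `lif_recurrent_layer([[1.0]] * 4, [[0.5]], [[0.0]], decay=1.0)` returns `[[0], [1], [0], [1]]`. The recurrent weights let spikes from one step drive other neurons at the next step, which is what gives the layer temporal memory beyond the membrane potential alone.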
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Human Pose and Action Recognition
