TL;DR
This paper introduces a multimodal Transformer-based system for humanoid robot manipulation that combines touch, vision, and proprioception, and reports a 90.9% success rate across five contact-rich tasks.
Contribution
It proposes Humanoid Transformer with Touch Dreaming (HTD), a Transformer policy trained via behavioral cloning with tactile prediction, aimed at improving dexterity and contact awareness in humanoid robots.
Findings
HTD achieves a 90.9% success rate across five tasks.
Latent-space tactile prediction outperforms raw tactile prediction.
The system enables versatile, high-dexterity manipulation in real-world scenarios.
Abstract
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, end-effector dexterity, and contact-aware interaction under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based lower-body controller that serves as the stability backbone for whole-body execution during complex manipulation. Built on this controller, we develop a VR-based whole-body humanoid data collection system that integrates dexterous hands and tactile sensing for contact-rich manipulation. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by…
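The Findings contrast latent-space and raw tactile prediction, and the abstract describes behavioral cloning with an added training signal that is cut off in this excerpt. The sketch below only illustrates the general idea of pairing an imitation loss with an auxiliary tactile-prediction loss in either space. It is not the authors' objective: the L1 action loss, the squared-error touch loss, the mean-pooling encoder, the `touch_weight` value, and every function name are assumptions made for illustration, and the excerpt does not say which tactile signal is predicted.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mean_abs_error(pred: Vector, target: Vector) -> float:
    """L1 error averaged over dimensions (assumed imitation loss)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_sq_error(pred: Vector, target: Vector) -> float:
    """Squared error averaged over dimensions (assumed prediction loss)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_pool_encoder(touch: Vector, pool: int = 2) -> List[float]:
    """Hypothetical tactile encoder: average each group of `pool` readings."""
    if len(touch) % pool != 0:
        raise ValueError("touch length must be divisible by pool")
    return [sum(touch[i:i + pool]) / pool for i in range(0, len(touch), pool)]


def raw_touch_loss(pred_touch: Vector, target_touch: Vector) -> float:
    """Raw variant: compare the prediction with sensor readings directly."""
    return mean_sq_error(pred_touch, target_touch)


def latent_touch_loss(
    pred_latent: Vector,
    target_touch: Vector,
    encoder: Callable[[Vector], Vector],
) -> float:
    """Latent variant: compare the prediction with the encoded readings."""
    return mean_sq_error(pred_latent, encoder(target_touch))


def bc_with_touch_loss(
    pred_action: Vector,
    expert_action: Vector,
    touch_loss: float,
    touch_weight: float = 0.1,  # hypothetical weighting, not from the paper
) -> float:
    """Imitation loss plus a weighted auxiliary tactile-prediction term."""
    return mean_abs_error(pred_action, expert_action) + touch_weight * touch_loss


if __name__ == "__main__":
    target_touch = [0.0, 2.0, 4.0, 6.0]
    latent = latent_touch_loss([1.0, 4.0], target_touch, mean_pool_encoder)
    raw = raw_touch_loss([0.0, 2.0, 4.0, 4.0], target_touch)
    print("latent touch loss:", latent)
    print("raw touch loss:", raw)
    print("total (latent):", bc_with_touch_loss([0.5, -0.5], [0.0, 0.0], latent))
```

The two variants differ only in the target: the raw loss asks the model to match every sensor reading, while the latent loss asks it to match a compressed representation of those readings. In a learned system the encoder would be a trained network rather than fixed pooling, and how its target is held stable during training is a design choice this excerpt does not describe.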
