Hyperbolic Multiview Pretraining for Robotic Manipulation
Jin Yang, Ping Wei, Yixin Chen, Nanning Zheng

TL;DR
This paper introduces HyperMVP, a hyperbolic space-based self-supervised pretraining framework for robotic manipulation, demonstrating improved robustness and generalization over Euclidean-based methods across multiple datasets and real-world tasks.
Contribution
The paper proposes HyperMVP, a novel hyperbolic multiview pretraining method with a GeoLink encoder, and introduces the 3D-MOV dataset for enhanced 3D-aware pretraining in robotics.
Findings
HyperMVP outperforms Euclidean baselines on multiple benchmarks.
Hyperbolic embeddings better capture structural relations in 3D data.
Pretraining improves robustness and generalization in manipulation tasks.
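The claim that hyperbolic embeddings capture structural relations better can be made concrete with geodesic distance in the Poincaré ball, one common model of hyperbolic space. This is a minimal sketch under assumptions: the summary does not say which hyperbolic model HyperMVP uses, so the Poincaré ball with curvature -1 and the helper name `poincare_distance` are illustrative choices. Two pairs of points with the same Euclidean gap of 0.1 end up at very different hyperbolic distances depending on how close they sit to the boundary.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball of curvature -1 (illustrative helper)."""
    sq = lambda x: sum(xi * xi for xi in x)
    nu, nv = sq(u), sq(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq([a - b for a, b in zip(u, v)])
    # d(u, v) = arccosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# Same Euclidean gap (0.1) at two different positions in the ball.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
# The near-boundary pair is more than 5x farther apart in hyperbolic distance.
print(f"{near_origin:.3f} {near_boundary:.3f}")
```

Because distance grows without bound toward the unit sphere, volume expands roughly exponentially with radius, which gives tree-like hierarchies room to embed with low distortion; whether HyperMVP's embeddings are organized this way is not stated in the summary.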
Abstract
3D-aware visual pretraining has proven effective in improving the performance of downstream robotic manipulation tasks. However, existing methods are constrained to Euclidean embedding spaces, whose flat geometry limits their ability to model structural relations among embeddings. As a result, they struggle to learn structured embeddings that are essential for robust spatial perception in robotic applications. To this end, we propose HyperMVP, a self-supervised framework for Hyperbolic MultiView Pretraining. Hyperbolic space offers geometric properties well suited for capturing structural relations. Methodologically, we extend the masked autoencoder paradigm and design a GeoLink encoder to learn multiview hyperbolic representations. The pretrained encoder is then finetuned with visuomotor policies on manipulation tasks. In addition, we…
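The abstract names two ingredients: masked-autoencoder pretraining over multiple views, and hyperbolic representations. A minimal sketch of how these commonly fit together, under assumptions the summary does not confirm: patches in each view are masked independently at random (the 0.75 ratio is the default from the original MAE work, not a reported HyperMVP setting), and Euclidean encoder features are mapped into a Poincaré ball with the exponential map at the origin. The helpers `random_view_masks` and `expmap0` and the feature vector are hypothetical, and the GeoLink encoder itself is not modeled.

```python
import math
import random


def random_view_masks(num_views, num_patches, mask_ratio, seed=0):
    """Hypothetical MAE-style random masking, drawn independently per view.

    Returns one (visible_idx, masked_idx) pair of sorted index lists per view.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masks = []
    for _ in range(num_views):
        order = list(range(num_patches))
        rng.shuffle(order)
        masks.append((sorted(order[num_masked:]), sorted(order[:num_masked])))
    return masks


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Maps a Euclidean (tangent) vector to a point inside the ball of radius
    1/sqrt(c): exp_0(v) = tanh(sqrt(c) |v|) * v / (sqrt(c) |v|).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


# Three views, 16 patches each, 75% of patches hidden from the encoder.
masks = random_view_masks(num_views=3, num_patches=16, mask_ratio=0.75)
visible, masked = masks[0]
# Hypothetical encoder output for one visible patch token.
feature = [0.8, -1.5, 2.0]
point = expmap0(feature)
```

In practice, implementations often clip the tangent-vector norm before the exponential map, since `math.tanh` rounds to exactly 1.0 for large inputs and the point would then land on the boundary of the ball.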
Taxonomy
Topics: Robot Manipulation and Learning · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
