Hyperbolic Multiview Pretraining for Robotic Manipulation

Jin Yang; Ping Wei; Yixin Chen; Nanning Zheng

arXiv:2603.04848 · cs.RO · March 13, 2026

Open Access

TL;DR

This paper introduces HyperMVP, a self-supervised pretraining framework for robotic manipulation that learns embeddings in hyperbolic space, and reports improved robustness and generalization over Euclidean methods across multiple datasets and real-world tasks.

Contribution

The paper proposes HyperMVP, a novel hyperbolic multiview pretraining method with a GeoLink encoder, and introduces the 3D-MOV dataset for enhanced 3D-aware pretraining in robotics.

Findings

01. HyperMVP outperforms Euclidean baselines on multiple benchmarks.

02. Hyperbolic embeddings better capture structural relations in 3D data.

03. Pretraining improves robustness and generalization in manipulation tasks.

Abstract

3D-aware visual pretraining has proven effective in improving the performance of downstream robotic manipulation tasks. However, existing methods are constrained to Euclidean embedding spaces, whose flat geometry limits their ability to model structural relations among embeddings. As a result, they struggle to learn structured embeddings that are essential for robust spatial perception in robotic applications. To this end, we propose HyperMVP, a self-supervised framework for Hyperbolic MultiView Pretraining. Hyperbolic space offers geometric properties well suited for capturing structural relations. Methodologically, we extend the masked autoencoder paradigm and design a GeoLink encoder to learn multiview hyperbolic representations. The pretrained encoder is then finetuned with visuomotor policies on manipulation tasks. In addition, we…
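To illustrate the abstract's claim that hyperbolic geometry differs from flat Euclidean geometry, the sketch below compares distances in the Poincaré ball with Euclidean distances. The abstract does not say which model of hyperbolic space HyperMVP uses, so the Poincaré ball (curvature -1) is an assumption made here for illustration, and the function names and sample points are hypothetical rather than taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points in the open unit ball (curvature -1).

    Assumption: the Poincare ball model, chosen for illustration only.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical pairs with the same Euclidean gap of 0.1:
# one pair sits near the origin, the other near the boundary of the ball.
near_origin = ([0.0, 0.0], [0.1, 0.0])
near_boundary = ([0.85, 0.0], [0.95, 0.0])

for name, (p, q) in [("near origin", near_origin), ("near boundary", near_boundary)]:
    print(name, euclidean_distance(p, q), poincare_distance(p, q))
```

The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary. This expansion of volume toward the boundary is the property commonly cited as making hyperbolic space suitable for tree-like or hierarchical structure; how HyperMVP exploits it is described in the paper itself, not in this sketch.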

Taxonomy

Topics: Robot Manipulation and Learning · 3D Shape Modeling and Analysis · Advanced Vision and Imaging