Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation
Di Zhang, Weicheng Duan, Dasen Gu, Hongye Lu, Hai Zhang, Hang Yu, Junqiao Zhao, Guang Chen

TL;DR
This paper introduces a unified framework for robotic manipulation that learns 3D geometric representations from single-view data, enabling better generalization across viewpoints and improving success rates in diverse tasks.
Contribution
It proposes a novel single-view 3D pretraining method combined with policy distillation, addressing limitations of previous multi-view dependent approaches.
Findings
Outperforms previous methods by 12.7% in success rate on RLBench tasks.
Achieves strong zero-shot view generalization with minimal success rate drops.
Demonstrates effective transfer of 3D understanding to manipulation skills.
Abstract
Real-world robotic manipulation demands visuomotor policies capable of robust spatial scene understanding and strong generalization across diverse camera viewpoints. While recent advances in 3D-aware visual representations have shown promise, they still suffer from several key limitations: reliance on multi-view observations during inference, which is impractical in single-view-restricted scenarios; incomplete scene modeling that fails to capture the holistic and fine-grained geometric structures essential for precise manipulation; and a lack of effective policy training strategies to retain and exploit the acquired 3D knowledge. To address these challenges, we present MethodName, a unified representation-policy learning framework for view-generalizable robotic manipulation. MethodName introduces a single-view 3D pretraining paradigm that leverages point cloud reconstruction and…
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
