Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation

Di Zhang; Weicheng Duan; Dasen Gu; Hongye Lu; Hai Zhang; Hang Yu; Junqiao Zhao; Guang Chen

arXiv:2601.22988 · cs.RO · February 2, 2026

TL;DR

This paper introduces a unified framework for robotic manipulation that learns 3D geometric representations from single-view data, enabling better generalization across viewpoints and improving success rates in diverse tasks.

Contribution

The paper proposes a single-view 3D pretraining method combined with policy distillation, addressing the reliance of earlier approaches on multi-view observations.

Findings

01. Outperforms previous methods by 12.7% in success rate on RLBench tasks.

02. Achieves strong zero-shot view generalization with minimal success rate drops.

03. Demonstrates effective transfer of 3D understanding to manipulation skills.

Abstract

Real-world robotic manipulation demands visuomotor policies capable of robust spatial scene understanding and strong generalization across diverse camera viewpoints. While recent advances in 3D-aware visual representations have shown promise, they still suffer from several key limitations: reliance on multi-view observations during inference, which is impractical in single-view-restricted scenarios; incomplete scene modeling that fails to capture the holistic and fine-grained geometric structures essential for precise manipulation; and a lack of effective policy training strategies to retain and exploit the acquired 3D knowledge. To address these challenges, we present MethodName, a unified representation-policy learning framework for view-generalizable robotic manipulation. MethodName introduces a single-view 3D pretraining paradigm that leverages point cloud reconstruction and…

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Vision and Imaging · 3D Shape Modeling and Analysis