Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

Sizhe Yang; Linning Xu; Hao Li; Juncheng Mu; Jia Zeng; Dahua Lin; Jiangmiao Pang

arXiv:2602.10101 · cs.RO · May 5, 2026

TL;DR

Robo3R is a real-time, feed-forward 3D reconstruction model that predicts accurate, metric-scale scene geometry from RGB images and robot states, improving robotic manipulation tasks.

Contribution

Introduces Robo3R, a novel 3D reconstruction approach that combines local geometry inference and camera pose refinement for manipulation-ready scene understanding.

Findings

1. Outperforms state-of-the-art reconstruction methods and depth sensors.
2. Enhances downstream tasks like grasp synthesis and motion planning.
3. Trained on a large synthetic dataset with 4 million frames.

Abstract

3D spatial perception is fundamental to generalizable robotic manipulation, yet obtaining reliable, high-quality 3D geometry remains challenging. Depth sensors suffer from noise and material sensitivity, while existing reconstruction models lack the precision and metric consistency required for physical interaction. We introduce Robo3R, a feed-forward, manipulation-ready 3D reconstruction model that predicts accurate, metric-scale scene geometry directly from RGB images and robot states in real time. Robo3R jointly infers scale-invariant local geometry and relative camera poses, which are unified into the scene representation in the canonical robot frame via a learned global similarity transformation. To meet the precision demands of manipulation, Robo3R employs a masked point head for sharp, fine-grained point clouds, and a keypoint-based Perspective-n-Point (PnP) formulation to refine…
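
To illustrate the similarity-transformation step the abstract describes, the sketch below applies a scale, a rotation, and a translation to scale-invariant local points so that they become metric points in a robot frame. It is a minimal sketch under stated assumptions, not the paper's implementation: in Robo3R the transformation is learned, whereas here every numeric value and the helper names `rotation_z` and `apply_similarity` are hypothetical.

```python
import math


def rotation_z(theta):
    """Return a 3x3 rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_similarity(points, scale, rotation, translation):
    """Map each 3D point p to scale * (R @ p) + t.

    points:      iterable of (x, y, z) tuples in the local, scale-invariant frame
    scale:       positive scalar that restores metric scale (hypothetical value)
    rotation:    3x3 rotation matrix as nested lists
    translation: (tx, ty, tz) offset into the target frame
    """
    transformed = []
    for p in points:
        rotated = [sum(rotation[i][j] * p[j] for j in range(3)) for i in range(3)]
        transformed.append(
            tuple(scale * rotated[i] + translation[i] for i in range(3))
        )
    return transformed


# Hypothetical inputs: two unit-length local points, a 0.5x metric scale,
# a 90-degree rotation about z, and a small offset to the robot base.
local_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
robot_points = apply_similarity(
    local_points,
    scale=0.5,
    rotation=rotation_z(math.pi / 2),
    translation=(0.1, 0.2, 0.3),
)
```

With these inputs the first point lands at approximately (0.1, 0.7, 0.3) and the second at approximately (-0.4, 0.2, 0.3). A similarity transformation has seven degrees of freedom (one for scale, three for rotation, three for translation), which is what lets a single global transform both fix the metric scale and place the geometry in the robot frame.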
