UniPR: Unified Object-level Real-to-Sim Perception and Reconstruction from a Single Stereo Pair
Chuanrui Zhang, Yingshuang Zou, ZhengXian Wu, Yonggen Ling, Yuxiao Yang, Ziwei Wang

TL;DR
UniPR is an end-to-end framework that reconstructs and perceives objects from a single stereo image pair, improving efficiency and accuracy in real-to-sim robotic perception tasks by leveraging geometric constraints and novel shape representations.
Contribution
The paper introduces UniPR, presented as the first unified end-to-end method for object-level perception and reconstruction from a single stereo pair, and creates LVS6D, a large-scale stereo dataset.
Findings
Reconstructs all scene objects in a single forward pass
Achieves significant efficiency improvements over modular pipelines
Preserves true physical proportions across diverse objects
Abstract
Perceiving and reconstructing objects from images are critical for real-to-sim transfer tasks, which are widely used in the robotics community. Existing methods rely on multiple submodules such as detection, segmentation, shape reconstruction, and pose estimation to complete the pipeline. However, such modular pipelines suffer from inefficiency and cumulative error, as each stage operates on only partial or locally refined information while discarding global context. To address these limitations, we propose UniPR, the first end-to-end object-level real-to-sim perception and reconstruction framework. Operating directly on a single stereo image pair, UniPR leverages geometric constraints to resolve the scale ambiguity. We introduce Pose-Aware Shape Representation to eliminate the need for per-category canonical definitions and to bridge the gap between reconstruction and pose estimation…
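The abstract says a stereo pair supplies geometric constraints that resolve scale ambiguity. The sketch below illustrates why, assuming an ideal rectified pinhole stereo rig. It uses the textbook disparity-to-depth relation, not UniPR's network, and every function name and numeric value in it is hypothetical.

```python
# Illustrative only: textbook rectified-stereo geometry, not the UniPR model.
# Assumes an ideal pinhole camera pair with a known baseline and focal length.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Metric depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px


def metric_extent(pixel_extent: float, depth_m: float, focal_px: float) -> float:
    """Physical size of a fronto-parallel extent seen at a known depth."""
    return pixel_extent * depth_m / focal_px


# Hypothetical rig: 800 px focal length, 0.1 m baseline.
FOCAL_PX = 800.0
BASELINE_M = 0.1

# An object whose matched points are 40 px apart between the two views.
depth_m = depth_from_disparity(FOCAL_PX, BASELINE_M, 40.0)

# The same object spans 120 px in the left image.
width_m = metric_extent(120.0, depth_m, FOCAL_PX)

print(f"depth = {depth_m:.2f} m, width = {width_m:.2f} m")
```

With a single image, the 120 px extent could be a small nearby object or a large distant one. The known baseline fixes the depth and therefore the metric size, which is the property the abstract relies on.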
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Robot Manipulation and Learning
