OCH3R: Object-Centric Holistic 3D Reconstruction
Yi Du, Yang You, Xiang Wan, Leonidas Guibas

TL;DR
OCH3R is a unified, fast transformer-based framework that performs holistic 3D reconstruction and pose estimation of multiple objects from a single RGB image, outperforming traditional multi-stage methods.
Contribution
The paper introduces OCH3R, a novel transformer architecture that predicts object attributes and reconstructions in one pass, enabling scalable, accurate scene understanding from monocular images.
Findings
Achieves state-of-the-art results on indoor benchmarks.
Provides high-fidelity, editable 3D reconstructions.
Runs orders of magnitude faster than multi-stage pipelines.
Abstract
Object-centric scene understanding is a fundamental challenge in computer vision. Existing approaches often rely on multi-stage pipelines that first apply pre-trained segmentors to extract individual objects, followed by per-object 3D reconstruction. Such methods are computationally expensive, fragile to segmentation errors, and scale poorly with scene complexity. We introduce OCH3R, a unified framework for Object-Centric Holistic 3D Reconstruction from a single RGB image. OCH3R performs one forward pass to simultaneously predict all object instances with their 6D poses and detailed 3D reconstructions. The key idea is a transformer architecture that predicts per-pixel attributes, including CLIP-based category embeddings, metric depth, normalized object coordinates (NOCS), and a fixed number of 3D Gaussians representing each object. To supervise these Gaussian reconstructions, we…
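The abstract lists per-pixel metric depth and NOCS among the predicted attributes but, in the portion available here, does not say how a 6D pose is obtained from them. One common way to relate the two is a least-squares similarity alignment (Umeyama) between an object's NOCS points and its back-projected camera-space points. The sketch below illustrates that general technique under stated assumptions; it is not confirmed to be the procedure used in OCH3R. The function names `backproject` and `umeyama_similarity`, the pinhole intrinsics matrix `K`, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) metric depth map to (H, W, 3) camera-space points.

    Assumes a pinhole camera with intrinsics K (3x3); this is an
    illustrative assumption, not something stated in the source.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row grids
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src: (N, 3) points in the canonical (NOCS) frame.
    dst: (N, 3) corresponding points in the camera frame.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In use, one would select the pixels belonging to a single object instance, pass their NOCS values as `src` and their back-projected points as `dst`, and read the pose from `R` and `t` and the metric object size from `s`. With noisy predictions, a robust wrapper such as RANSAC around this solver is a typical addition.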
