OCH3R: Object-Centric Holistic 3D Reconstruction

Yi Du; Yang You; Xiang Wan; Leonidas Guibas

arXiv:2605.13018 · cs.CV · May 14, 2026

TL;DR

OCH3R is a unified, fast transformer-based framework that performs holistic 3D reconstruction and pose estimation of multiple objects from a single RGB image, outperforming traditional multi-stage methods.

Contribution

The paper introduces OCH3R, a transformer architecture that predicts every object instance's attributes (category, 6D pose) and 3D reconstruction in a single forward pass, enabling scalable, accurate object-centric scene understanding from one monocular image.
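To make the "one pass" idea concrete, here is a minimal sketch of how a single per-pixel output tensor could be split into the attribute maps the abstract lists (a CLIP-based category embedding, metric depth, NOCS, and a fixed number of 3D Gaussians), followed by a category lookup that scores each pixel's embedding against text embeddings by cosine similarity. Everything specific here is an assumption, not a detail from the paper: the channel sizes (`EMBED_DIM`, `NUM_GAUSSIANS`), the 14-parameter Gaussian layout borrowed from common 3D Gaussian Splatting practice, the exp/sigmoid activations, and the hypothetical helpers `split_per_pixel_head` and `classify_embeddings`.

```python
import numpy as np

# Illustrative assumptions only; these are not OCH3R's published values.
EMBED_DIM = 512          # assumed width of the CLIP-based category embedding
NUM_GAUSSIANS = 4        # assumed: each pixel predicts this many Gaussians for its object
GAUSSIAN_PARAMS = 14     # mean (3) + scale (3) + quaternion (4) + opacity (1) + RGB (3)

def split_per_pixel_head(pred):
    """Hypothetical helper: split one (H, W, C) prediction into per-pixel attribute maps."""
    h, w, c = pred.shape
    sizes = [EMBED_DIM, 1, 3, NUM_GAUSSIANS * GAUSSIAN_PARAMS]
    if c != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} channels, got {c}")
    # Cut the channel axis at the cumulative boundaries of each attribute block.
    embed, raw_depth, raw_nocs, raw_gauss = np.split(pred, np.cumsum(sizes)[:-1], axis=-1)
    depth = np.exp(raw_depth[..., 0])            # exp keeps metric depth strictly positive
    nocs = 1.0 / (1.0 + np.exp(-raw_nocs))       # sigmoid maps NOCS into (0, 1)^3
    gaussians = raw_gauss.reshape(h, w, NUM_GAUSSIANS, GAUSSIAN_PARAMS)
    return embed, depth, nocs, gaussians

def classify_embeddings(embed, text_embeds):
    """Hypothetical helper: label each pixel with the most cosine-similar text embedding."""
    e = embed / np.linalg.norm(embed, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return np.argmax(e @ t.T, axis=-1)           # (H, W) category indices
```

Scoring embeddings against text embeddings of category names, rather than emitting fixed class logits, is one common reason to predict a CLIP-style embedding: the category list can be supplied at inference time. Whether OCH3R uses its embeddings this way is not stated in the visible text.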

Findings

1. Achieves state-of-the-art results on indoor benchmarks.
2. Provides high-fidelity, editable 3D reconstructions.
3. Runs orders of magnitude faster than multi-stage pipelines.

Abstract

Object-centric scene understanding is a fundamental challenge in computer vision. Existing approaches often rely on multi-stage pipelines that first apply pre-trained segmentors to extract individual objects, followed by per-object 3D reconstruction. Such methods are computationally expensive, fragile to segmentation errors, and scale poorly with scene complexity. We introduce OCH3R, a unified framework for Object-Centric Holistic 3D Reconstruction from a single RGB image. OCH3R performs one forward pass to simultaneously predict all object instances with their 6D poses and detailed 3D reconstructions. The key idea is a transformer architecture that predicts per-pixel attributes, including CLIP-based category embeddings, metric depth, normalized object coordinates (NOCS), and a fixed number of 3D Gaussians representing each object. To supervise these Gaussian reconstructions, we…
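The abstract pairs per-pixel metric depth with NOCS predictions. A standard way to turn such a pair into a 6D pose plus object size, used in the original NOCS work, is to back-project one object's depth pixels into camera space and fit a similarity transform from its NOCS coordinates to those points with the Umeyama algorithm. The truncated abstract does not say whether OCH3R recovers poses this way, so treat the sketch below as an assumption; the pinhole intrinsics `K`, the 0.5 NOCS recentring convention, and the helper names (`backproject`, `umeyama`, `estimate_pose`) are illustrative.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) metric depth map to camera-frame XYZ points with pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u = column index, v = row index
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)         # (H, W, 3)

def umeyama(src, dst):
    """Least-squares similarity transform so that dst ~= s * R @ src + t (Umeyama, 1991)."""
    n = len(src)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n                             # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                              # force a proper rotation, not a reflection
    R = U @ D @ Vt
    s = np.sum(S * np.diag(D)) / (np.sum(xs ** 2) / n)
    t = mu_d - s * R @ mu_s
    return s, R, t

def estimate_pose(depth, nocs, mask, K):
    """Hypothetical helper: scale, rotation, translation of one object from its masked pixels."""
    cam_pts = backproject(depth, K)[mask]           # (N, 3) observed points, camera frame
    obj_pts = nocs[mask] - 0.5                      # assumed convention: NOCS cube centred at 0.5
    return umeyama(obj_pts, cam_pts)
```

Because NOCS fixes an object's canonical orientation and normalized scale, aligning it to back-projected depth recovers rotation, translation, and metric size together. With noisy predictions, a robust wrapper such as RANSAC is usually placed around `umeyama` to tolerate outlier pixels.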