SAM 3D: 3Dfy Anything in Images

SAM 3D Team; Xingyu Chen; Fu-Jen Chu; Pierre Gleize; Kevin J Liang; Alexander Sax; Hao Tang; Weiyao Wang; Michelle Guo; Thibaut Hardin; Xiang Li; Aohan Lin; Jiawei Liu; Ziqi Ma; Anushka Sagar; Bowen Song; Xiaodong Wang; Jianing Yang; Bowen Zhang; Piotr Dollár; Georgia Gkioxari; Matt Feiszli; Jitendra Malik

arXiv:2511.16624·cs.CV·November 21, 2025

Open Access · 4 Models

TL;DR

SAM 3D introduces a scalable, multi-stage framework for 3D object reconstruction from a single image, using human- and model-in-the-loop annotation together with synthetic pretraining to outperform recent methods.

Contribution

The paper presents SAM 3D, a generative model that combines human- and model-in-the-loop annotation with a multi-stage training process to improve 3D reconstruction from images.

Findings

01. Achieves at least a 5:1 win rate over recent methods in human preference tests on real-world objects and scenes.

02. Provides a new large-scale dataset for in-the-wild 3D reconstruction.

03. Outperforms prior work in natural, cluttered scenes.

Abstract

We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object…
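To make the reported "5:1 win rate" concrete, the short sketch below converts a wins:losses ratio into the share of decided comparisons won. It assumes ties are excluded and that the ratio is a simple count of pairwise preferences; the abstract does not state how the preference tests were scored, and the helper name `win_ratio_to_share` is hypothetical.

```python
from fractions import Fraction


def win_ratio_to_share(wins: int, losses: int) -> Fraction:
    """Convert a wins:losses ratio into the fraction of decided comparisons won.

    Assumption: ties are ignored, so every comparison is either a win or a loss.
    """
    if wins < 0 or losses < 0 or wins + losses == 0:
        raise ValueError("need non-negative counts and at least one comparison")
    return Fraction(wins, wins + losses)


# A 5:1 ratio means 5 wins for every 1 loss, i.e. 5 out of 6 comparisons.
share = win_ratio_to_share(5, 1)
print(f"{float(share):.1%}")  # 83.3%
```

Under these assumptions, "at least 5:1" corresponds to being preferred in at least roughly 83% of decided comparisons.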

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face Recognition and Analysis