Anything-3D: Towards Single-view Anything Reconstruction in the Wild

Qiuhong Shen; Xingyi Yang; Xinchao Wang

arXiv:2304.10261 · cs.CV · April 21, 2023 · 34 cites

TL;DR

Anything-3D introduces a novel framework combining visual-language models and segmentation techniques to enable accurate single-view 3D object reconstruction in diverse real-world scenarios.

Contribution

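The paper presents a method that integrates a BLIP captioning model, the Segment-Anything segmentation model, and a text-to-image diffusion model for single-view 3D reconstruction of objects in unconstrained real-world images, addressing limitations of previous approaches.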

Findings

01. Produces detailed 3D reconstructions for various objects

02. Demonstrates robustness across diverse datasets

03. Outperforms existing methods in accuracy

Abstract

3D reconstruction from a single RGB image in unconstrained real-world scenarios presents numerous challenges due to the inherent diversity and complexity of objects and environments. In this paper, we introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model to elevate objects to 3D, yielding a reliable and versatile system for the single-view conditioned 3D reconstruction task. Our approach employs a BLIP model to generate textual descriptions, uses the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field. Demonstrating its ability to produce accurate and detailed 3D reconstructions for a wide array of objects, Anything-3D shows promise in addressing…
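
The three stages named in the abstract (segmentation, captioning, and lifting to a neural radiance field) can be pictured as a simple composition of functions. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the stage order (segment first, then caption the extracted object) is assumed, every function and type name is hypothetical, and the three `toy_*` stages are trivial stand-ins for Segment-Anything, BLIP, and the diffusion-guided radiance field optimization.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy grayscale image as nested lists; a real system would use RGB arrays.
Image = List[List[float]]
Mask = List[List[bool]]


@dataclass
class Reconstruction:
    caption: str         # text that would condition the diffusion prior
    object_pixels: int   # number of foreground pixels kept by the mask
    representation: str  # placeholder for an optimized neural radiance field


def apply_mask(image: Image, mask: Mask) -> Image:
    """Zero out every pixel that lies outside the segmented object."""
    return [[px if keep else 0.0 for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def reconstruct(
    image: Image,
    segmenter: Callable[[Image], Mask],
    captioner: Callable[[Image], str],
    lifter: Callable[[Image, Mask, str], Reconstruction],
) -> Reconstruction:
    """Chain the stages: segment, caption the extracted object, lift to 3D.

    The order used here is an assumption made for illustration.
    """
    mask = segmenter(image)
    extracted = apply_mask(image, mask)
    caption = captioner(extracted)
    return lifter(extracted, mask, caption)


# Hypothetical stand-ins; none of these call a real model.
def toy_segmenter(image: Image) -> Mask:
    # Stand-in for Segment-Anything: treat bright pixels as the object.
    return [[px > 0.5 for px in row] for row in image]


def toy_captioner(extracted: Image) -> str:
    # Stand-in for BLIP: return a fixed description.
    return "a bright object on a dark background"


def toy_lifter(extracted: Image, mask: Mask, caption: str) -> Reconstruction:
    # Stand-in for diffusion-guided radiance field optimization.
    foreground = sum(keep for row in mask for keep in row)
    return Reconstruction(caption, foreground, "nerf-placeholder")


image = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.6],
]
result = reconstruct(image, toy_segmenter, toy_captioner, toy_lifter)
print(result)
```

Passing the stages in as callables keeps the composition independent of any particular model, so a real segmenter, captioner, or 3D lifting routine could be swapped in without changing `reconstruct`.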

Code & Models

Repositories

anything-of-anything/anything-3d
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Diffusion · BLIP: Bootstrapping Language-Image Pre-training