MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion

Yu Liu; Ruowei Wang; Jiaqi Li; Zixiang Xu; Qijun Zhao

arXiv:2411.12197 · cs.CV · November 20, 2024

TL;DR

MTFusion introduces a novel method combining multi-word textual inversion and image data to reconstruct detailed 3D models from a single image, surpassing existing techniques in fidelity and speed.

Contribution

The paper presents a new multi-word textual inversion technique and an enhanced 3D generation pipeline using FlexiCubes, improving detail capture and training efficiency.

Findings

1. Outperforms existing methods on synthetic and real images
2. Achieves higher fidelity in surface and texture details
3. Faster training due to improved decoder network

Abstract

Reconstructing 3D models from single-view images is a long-standing problem in computer vision. The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models. However, existing methods focus on capturing a single key attribute of the image (e.g., object type, artistic style) and fail to consider the multi-perspective information required for accurate 3D reconstruction, such as object shape and material properties. Besides, the reliance on Neural Radiance Fields hinders their ability to reconstruct intricate surfaces and texture details. In this work, we propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction. Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text…
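
The abstract is cut off before it describes the technique, so the following is only a toy illustration of the general idea behind textual inversion with several placeholder tokens: the model stays frozen and only the embeddings of the new tokens are optimized to match a target. The linear "encoder" `W`, the mean pooling over tokens, the target vector, and every function name here are assumptions made for illustration; none of them come from the paper, and this is not MTFusion's actual pipeline.

```python
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mean_tokens(E):
    """Pool several token embeddings into one vector by averaging."""
    k = len(E)
    return [sum(tok[d] for tok in E) / k for d in range(len(E[0]))]

def loss_and_grad(W, E, target):
    """Squared error between the frozen encoder output and the target,
    plus its gradient with respect to each token embedding."""
    m = mean_tokens(E)
    residual = [p - t for p, t in zip(matvec(W, m), target)]
    loss = sum(r * r for r in residual)
    # d(loss)/d(m) = 2 * W^T * residual
    grad_m = [2.0 * sum(W[i][d] * residual[i] for i in range(len(W)))
              for d in range(len(m))]
    # Each token contributes 1/k to the mean, so it receives grad_m / k.
    k = len(E)
    grad_E = [[g / k for g in grad_m] for _ in E]
    return loss, grad_E

def invert(W, target, num_tokens=3, steps=200, lr=0.1, seed=0):
    """Learn `num_tokens` placeholder embeddings; W is never updated."""
    rng = random.Random(seed)
    dim = len(W[0])
    E = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
         for _ in range(num_tokens)]
    history = []
    for _ in range(steps):
        loss, grad_E = loss_and_grad(W, E, target)
        history.append(loss)
        for tok, g in zip(E, grad_E):
            for d in range(dim):
                tok[d] -= lr * g[d]
    return E, history

# Hypothetical frozen encoder weights and a target feature for one "image".
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.5, 0.0, 1.0, 0.0]]
target = [1.0, -0.5, 0.25]

E, history = invert(W, target)
print(f"loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

In a real system the frozen model would be a text-conditioned diffusion model and the loss its denoising objective; the sketch keeps only the structural point that several new token embeddings are trained jointly while everything else is held fixed.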

Taxonomy

Methods: ADaptive gradient method with the OPTimal convergence rate · Focus