Chasing Consistency in Text-to-3D Generation from a Single Image

Yichen Ouyang; Wenhao Chai; Jiayi Ye; Dapeng Tao; Yibing Zhan; Gaoang Wang

arXiv:2309.03599 · cs.CV · September 8, 2023 · 2 cites

TL;DR

Consist3D is a three-stage framework that improves the consistency and realism of text-to-3D generation from a single image by addressing semantic, geometric, and saturation issues.

Contribution

It introduces a novel three-stage process with consistency tokens for semantic, geometric, and saturation control, enhancing 3D generation quality from a single view.

Findings

1. Produces more consistent and photo-realistic 3D assets
2. Reduces overfitting and saturation issues
3. Enables background and object editing via text prompts

Abstract

Text-to-3D generation from a single-view image is a popular but challenging task in 3D vision. Although numerous methods have been proposed, existing works still suffer from inconsistency issues, including 1) semantic inconsistency, 2) geometric inconsistency, and 3) saturation inconsistency, resulting in distorted, overfitted, and over-saturated generations. In light of these issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization. Specifically, the semantic encoding stage learns a token independent of views and estimations, promoting semantic consistency and robustness. Meanwhile, the geometric encoding stage learns another token with comprehensive geometry and…

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Surveying and Cultural Heritage