TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

Tianyu Huang; Yihan Zeng; Bowen Dong; Hang Xu; Songcen Xu; Rynson W.H. Lau; Wangmeng Zuo

arXiv:2309.17175 · cs.CV · March 15, 2024 · 1 citation

Open Access

TL;DR

TextField3D introduces a novel approach for open-vocabulary 3D generation by injecting dynamic noise into text prompts' latent space, enabling broader vocabulary and improved text consistency in 3D models.

Contribution

The paper proposes Noisy Text Fields (NTFs) and associated modules to expand vocabulary and enhance open-vocabulary 3D generation from limited data.

Findings

01. Achieves open-vocabulary 3D generation capability

02. Improves text consistency in 3D generation

03. Supports image-conditional 3D generation

Abstract

Recent works learn 3D representation explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotype concept for certain text prompts, thus losing open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rather than using the text prompts as input directly, we suggest injecting dynamic noise into the latent space of given text prompts, i.e., Noisy Text Fields (NTFs). In this way, limited 3D data can be mapped to the appropriate range of textual latent space that is expanded by NTFs. To this end, an NTFGen module is proposed to model general text latent code in noisy fields. Meanwhile, an NTFBind module is proposed to align view-invariant image latent code to noisy fields, further supporting…
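
To make the idea of injecting dynamic noise into a text latent code more concrete, here is a minimal sketch. It is not the paper's NTFGen or NTFBind module. The function names `noisy_text_latent` and `cosine`, the `max_sigma` parameter, the uniform sampling of the noise scale, and the re-normalization step are all assumptions chosen for illustration, and the input is a plain list of floats standing in for an embedding from some text encoder.

```python
import math
import random


def noisy_text_latent(text_latent, max_sigma=0.1, rng=None):
    """Return a perturbed, unit-length copy of a text latent vector.

    Hypothetical illustration only: the noise scale is redrawn on every
    call (one simple reading of "dynamic" noise), Gaussian noise is added
    per dimension, and the result is re-normalized to unit length.
    """
    if rng is None:
        rng = random.Random()
    # Assumption: sample a fresh noise level in [0, max_sigma] per call.
    sigma = rng.uniform(0.0, max_sigma)
    noisy = [x + rng.gauss(0.0, sigma) for x in text_latent]
    norm = math.sqrt(sum(v * v for v in noisy))
    if norm == 0.0:
        # Degenerate case: fall back to the unperturbed latent.
        return list(text_latent)
    return [v / norm for v in noisy]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    rng = random.Random(0)
    base = [1.0, 0.0, 0.0, 0.0]  # stand-in for a normalized text embedding
    for _ in range(3):
        sample = noisy_text_latent(base, max_sigma=0.1, rng=rng)
        print(round(cosine(base, sample), 4))
```

Each call yields a slightly different latent near the original prompt embedding, so one prompt covers a small neighborhood of the latent space rather than a single point. How the paper actually chooses and models the noise is not stated in the abstract shown here.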

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques

Methods: ALIGN