GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space

Souhaib Attaiki; Paul Guerrero; Duygu Ceylan; Niloy J. Mitra; Maks Ovsjanikov

arXiv:2412.16717·cs.CV·December 24, 2024

TL;DR

GANFusion combines a GAN and a diffusion model into a feed-forward text-to-3D generator trained solely on single-view 2D data, enabling text-conditioned generation of high-quality 3D human characters without 3D supervision.

Contribution

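The paper introduces a method that uses a GAN to generate 3D features and a diffusion model to condition that generation on text. Both are trained only on 2D data, so the approach pairs the 3D quality that GANs achieve under 2D supervision with the text conditioning that diffusion models handle well.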

Findings

01. Achieves high-quality 3D human character generation from text.

02. Operates effectively with only 2D supervision, reducing data requirements.

03. Enables direct feed-forward text-to-3D generation without test-time optimization.

Abstract

We train a feed-forward text-to-3D diffusion generator for human characters using only single-view 2D data for supervision. Existing 3D generative models cannot yet match the fidelity of image or video generative models. State-of-the-art 3D generators are trained with explicit 3D supervision and are thus limited by the volume and diversity of existing 3D data. Meanwhile, generators that can be trained with only 2D data as supervision typically produce coarser results, cannot be text-conditioned, or must revert to test-time optimization. We observe that GAN- and diffusion-based generators have complementary qualities: GANs can be trained efficiently with 2D supervision to produce high-quality 3D objects but are hard to condition on text. In contrast, denoising diffusion models can be conditioned efficiently but tend to be hard to train with only 2D supervision. We introduce…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Computer Graphics and Visualization Techniques

Methods: Diffusion