Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim, Changjae Oh, Hoseok Do, Soohyun Kim, Kwanghoon Sohn

TL;DR
This paper introduces a novel multi-modal face image generation approach that combines GANs and diffusion models to produce realistic, multi-view, and stylized face images from text prompts and visual inputs, outperforming existing methods.
Contribution
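The paper proposes a method that links a diffusion model to a pre-trained GAN for multi-modal face image generation. It consists of a simple mapping network and a style modulation network that convert diffusion features into GAN latent codes, plus a multi-step training strategy that improves how well the generated image reflects the text and structural inputs.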
Findings
Produces realistic 2D and 3D-aware face images
Outperforms existing face generation methods
Effective multi-modal input integration
Abstract
We present a new multi-modal face image generation method that converts a text prompt and a visual input, such as a semantic mask or scribble map, into a photo-realistic face image. To do this, we combine the strengths of generative adversarial networks (GANs) and diffusion models (DMs) by mapping the multi-modal features of the DM into the latent space of the pre-trained GANs. We present a simple mapping network and a style modulation network that link the two models and convert meaningful representations in feature maps and attention maps into latent codes. With GAN inversion, the estimated latent codes can be used to generate 2D or 3D-aware facial images. We further present a multi-step training strategy that reflects textual and structural representations in the generated image. Our proposed network produces realistic 2D, multi-view, and stylized face images that align well with the inputs. We…
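The idea of turning diffusion feature maps and attention maps into per-layer GAN latent codes can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the function name `map_to_latents`, the attention-weighted pooling, the single linear layers standing in for the mapping and style modulation networks, and all dimensions are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def map_to_latents(feature_map, attention_map, w_map, w_scale, w_shift, num_layers):
    """Hypothetical sketch: diffusion features -> per-layer GAN latent codes.

    feature_map:   (C, H, W) features taken from a diffusion model
    attention_map: (H, W) non-negative attention weights
    w_map:         (D, C) linear stand-in for a mapping network
    w_scale/shift: (num_layers * D, C) linear stand-ins for style modulation
    returns:       (num_layers, D) latent codes, one row per generator layer
    """
    C = feature_map.shape[0]
    # Normalize the attention map into a spatial weighting that sums to ~1.
    attn = attention_map.reshape(-1)
    attn = attn / (attn.sum() + 1e-8)
    # Attention-weighted pooling of the feature map: (C, H*W) @ (H*W,) -> (C,)
    pooled = feature_map.reshape(C, -1) @ attn
    # Mapping step: project pooled features to a base latent code of size D.
    base = w_map @ pooled
    # Style modulation step: per-layer scale and shift applied to the base code.
    scale = (w_scale @ pooled).reshape(num_layers, -1)
    shift = (w_shift @ pooled).reshape(num_layers, -1)
    return (1.0 + scale) * base[None, :] + shift

# Usage with random stand-in data (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
C, H, W, D, L = 8, 16, 16, 32, 4
latents = map_to_latents(
    rng.normal(size=(C, H, W)),
    rng.random(size=(H, W)),
    rng.normal(size=(D, C)),
    rng.normal(size=(L * D, C)) * 0.01,
    rng.normal(size=(L * D, C)) * 0.01,
    num_layers=L,
)
print(latents.shape)
```

In a real system the resulting codes would be fed to a pre-trained GAN generator, and the linear maps would be replaced by learned networks trained with the multi-step strategy the abstract describes.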
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Image Processing Techniques and Applications
Methods: ALIGN · Diffusion
