MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance
Debin Meng, Christos Tzelepis, Ioannis Patras, and Georgios Tzimiropoulos

TL;DR
MM2Latent is a practical multimodal framework that enhances controllability and editing capabilities in human portrait generation using StyleGAN2, with improved efficiency and real image editing features.
Contribution
It introduces a hyperparameter-free, fast-inference framework for multimodal generation and editing that supports real image editing and outperforms existing methods.
Findings
Superior multimodal image generation performance
Effective multimodal image editing capabilities
Faster inference compared to GAN- and diffusion-based methods
Abstract
Generating human portraits is a hot topic in the image generation area, e.g. mask-to-face generation and text-to-face generation. However, these unimodal generation methods lack controllability in image generation. Controllability can be enhanced by exploring the advantages and complementarities of various modalities. For instance, we can utilize the advantages of text in controlling diverse attributes and masks in controlling spatial locations. Current state-of-the-art methods in multimodal generation face limitations due to their reliance on extensive hyperparameters, manual operations during the inference stage, substantial computational demands during training and inference, or inability to edit real images. In this paper, we propose a practical framework - MM2Latent - for multimodal image generation and editing. We use StyleGAN2 as our image generator, FaRL for text encoding, and…
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Path Length Regularization · Weight Demodulation · StyleGAN2 · Adaptive Instance Normalization · Dense Connections · Feedforward Network · R1 Regularization · Convolution · StyleGAN
