MM2Latent: Text-to-facial image generation and editing in GANs with multimodal assistance

Debin Meng; Christos Tzelepis; Ioannis Patras; Georgios Tzimiropoulos

arXiv:2409.11010·cs.CV·September 18, 2024

TL;DR

MM2Latent is a practical multimodal framework that enhances controllability and editing capabilities in human portrait generation using StyleGAN2, with improved efficiency and real image editing features.

Contribution

It introduces a hyperparameter-free multimodal generation and editing framework with fast inference that supports real image editing and outperforms existing methods.

Findings

01 Superior multimodal image generation performance

02 Effective multimodal image editing capabilities

03 Faster inference compared to GAN- and diffusion-based methods

Abstract

Generating human portraits is a hot topic in the image generation area, e.g. mask-to-face generation and text-to-face generation. However, these unimodal generation methods lack controllability in image generation. Controllability can be enhanced by exploring the advantages and complementarities of various modalities. For instance, we can utilize the advantages of text in controlling diverse attributes and masks in controlling spatial locations. Current state-of-the-art methods in multimodal generation face limitations due to their reliance on extensive hyperparameters, manual operations during the inference stage, substantial computational demands during training and inference, or inability to edit real images. In this paper, we propose a practical framework - MM2Latent - for multimodal image generation and editing. We use StyleGAN2 as our image generator, FaRL for text encoding, and…

Code & Models

Repositories

open-debin/mm2latent
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Path Length Regularization · Weight Demodulation · StyleGAN2 · Adaptive Instance Normalization · Dense Connections · Feedforward Network · R1 Regularization · Convolution · StyleGAN