AnyFace: Free-style Text-to-Face Synthesis and Manipulation

Jianxin Sun; Qiyao Deng; Qi Li; Muyi Sun; Min Ren; Zhenan Sun

arXiv:2203.15334 · cs.CV · March 30, 2022 · 5 cites

TL;DR

AnyFace is a free-style text-to-face method that synthesizes and manipulates high-quality, diverse face images from arbitrary text descriptions, targeting applications such as the metaverse, social media, and forensics.

Contribution

The paper presents what it describes as the first free-style text-to-face synthesis approach: a two-stream framework with CLIP-based feature extraction, a Cross Modal Distillation module, and a Diverse Triplet Loss.

Findings

01. Outperforms state-of-the-art methods on CelebA-HQ and CelebA-Text-HQ datasets.

02. Achieves high-resolution, diverse, and constraint-free face synthesis.

03. Demonstrates broad applicability in various real-world scenarios.

Abstract

Existing text-to-image synthesis methods are generally applicable only to words that appear in the training dataset. However, human faces are too variable to be described with a limited vocabulary. This paper therefore proposes AnyFace, the first free-style text-to-face method, enabling much wider open-world applications such as the metaverse, social media, cosmetics, and forensics. AnyFace has a novel two-stream framework for face image synthesis and manipulation given arbitrary descriptions of the human face. Specifically, one stream performs text-to-face generation and the other conducts face image reconstruction. Facial text and image features are extracted using the CLIP (Contrastive Language-Image Pre-training) encoders, and a collaborative Cross Modal Distillation (CMD) module is designed to align the linguistic and visual features across these two streams. Furthermore, a Diverse Triplet Loss (DT loss)…
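
The abstract is truncated before it defines the Diverse Triplet Loss, so the paper's actual formulation is not reproduced here. As a rough illustration of the general idea behind triplet-style objectives on text and image features, the sketch below implements a standard triplet margin loss over cosine distance. It is not the paper's DT loss. The function names, the margin value, and the three-dimensional toy vectors are all assumptions made for illustration; real CLIP features are much higher-dimensional.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Distance in [0, 2]; 0 means the vectors point the same way."""
    return 1.0 - cosine_similarity(a, b)


def triplet_margin_loss(anchor: Vector, positive: Vector,
                        negative: Vector, margin: float = 0.2) -> float:
    """Standard triplet margin loss (NOT the paper's Diverse Triplet Loss).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin` (an assumed hyperparameter).
    """
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical toy features standing in for encoder outputs.
text_feature = [1.0, 0.0, 0.0]       # anchor: a text description
matching_image = [0.9, 0.1, 0.0]     # positive: image matching the text
mismatched_image = [0.0, 1.0, 0.0]   # negative: unrelated image

loss_good = triplet_margin_loss(text_feature, matching_image, mismatched_image)
loss_bad = triplet_margin_loss(text_feature, mismatched_image, matching_image)
print(loss_good, loss_bad)
```

With these toy vectors, the correctly ordered triplet incurs no loss, while swapping the positive and negative yields a large penalty. That is the behavior a triplet objective relies on to pull matched text and image features together.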

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications

Methods: ALIGN · Contrastive Language-Image Pre-training · Triplet Loss