GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data
Yudong Li, Hao Li, Xianxu Hou, Linlin Shen

TL;DR
GPTFace introduces a large-scale, web-based pre-training approach for facial knowledge learning using self-supervised tasks, enabling effective facial understanding and editing without extensive manual annotation.
Contribution
The paper proposes a generative pre-training model for faces that learns from web-crawled text-image data through self-supervised tasks, improving scalability and versatility over methods that rely on manually assembled and annotated face datasets.
Findings
Achieves performance comparable to state-of-the-art models on facial tasks.
Effective for face attribute editing and expression manipulation.
Utilizes web data for scalable facial pre-training.
Abstract
Compared to the prosperity of pre-training models in natural image understanding, the research on large-scale pre-training models for facial knowledge learning is still limited. Current approaches mainly rely on manually assembled and annotated face datasets for training, but labeling such datasets is labor-intensive and the trained models have limited scalability beyond the training data. To address these limitations, we present a generative pre-training model for facial knowledge learning that leverages large-scale web-built data for training. We use texts and images containing human faces crawled from the internet and conduct pre-training on self-supervised tasks, including masked image/language modeling (MILM) and image-text matching (ITM). During the generation stage, we further utilize the image-text matching loss to pull the generation distribution towards the control signal for…
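The title and abstract mention span masking as part of masked image/language modeling, but the exact scheme is not given in this excerpt. The following is only a minimal sketch of the general idea: hide contiguous runs of tokens and keep the originals as reconstruction targets. The function name `span_mask`, the `[MASK]` placeholder, and the `mask_ratio` and `max_span` values are all illustrative assumptions, not the authors' settings.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def span_mask(tokens, mask_ratio=0.3, max_span=3, seed=None):
    """Hide contiguous spans until about mask_ratio of the tokens are masked.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would have to reconstruct.
    The ratio and span length are assumed values for illustration only.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Number of positions to hide; at least one for a non-empty input.
    budget = min(n, max(1, round(n * mask_ratio))) if n else 0
    masked = list(tokens)
    targets = {}
    while len(targets) < budget:
        span = rng.randint(1, max_span)   # length of this span
        start = rng.randrange(n)          # where the span begins
        for i in range(start, min(n, start + span)):
            if len(targets) >= budget:
                break
            if i not in targets:
                targets[i] = tokens[i]
                masked[i] = MASK
    return masked, targets


if __name__ == "__main__":
    caption = "a woman with long hair smiling at the camera .".split()
    masked, targets = span_mask(caption, seed=0)
    print(masked)
    print(targets)
```

The same routine could in principle be applied to a sequence of discrete image tokens instead of words, since it only assumes a list of items; whether GPTFace masks image and text tokens identically is not stated in this excerpt.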
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception
