GPTFace: Generative Pre-training of Facial-Linguistic Transformer by Span Masking and Weakly Correlated Text-image Data

Yudong Li; Hao Li; Xianxu Hou; Linlin Shen

arXiv:2510.18345 · cs.CV · October 22, 2025

TL;DR

GPTFace introduces a large-scale, web-based pre-training approach for facial knowledge learning. It trains on self-supervised tasks (masked image/language modeling and image-text matching), enabling facial understanding and editing without extensive manual annotation.

Contribution

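The paper proposes a generative pre-training model for faces that learns from crawled web text-image data through self-supervised objectives, avoiding the labor-intensive annotation of manually assembled face datasets and improving scalability and versatility over methods trained on them.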

Findings

01

Achieves performance comparable to state-of-the-art models on facial tasks.

02

Effective for face attribute editing and expression manipulation.

03

Utilizes web data for scalable facial pre-training.

Abstract

Compared to the prosperity of pre-training models in natural image understanding, the research on large-scale pre-training models for facial knowledge learning is still limited. Current approaches mainly rely on manually assembled and annotated face datasets for training, but labeling such datasets is labor-intensive and the trained models have limited scalability beyond the training data. To address these limitations, we present a generative pre-training model for facial knowledge learning that leverages large-scale web-built data for training. We use texts and images containing human faces crawled from the internet and conduct pre-training on self-supervised tasks, including masked image/language modeling (MILM) and image-text matching (ITM). During the generation stage, we further utilize the image-text matching loss to pull the generation distribution towards the control signal for…
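The two self-supervised objectives named above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: it assumes a pre-tokenized caption, a hypothetical `[MASK]` token, span lengths drawn uniformly up to a hypothetical `max_span`, and an N x N matrix of image-text match logits in which the diagonal holds the crawled (matched) pairs and every off-diagonal entry is treated as an in-batch negative. The function names `span_mask` and `itm_loss`, the masking ratio, and the span distribution are all illustrative assumptions; the values GPTFace actually uses are not stated here, and the generation-stage ITM guidance is not covered.

```python
import math
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def span_mask(tokens: Sequence[str], mask_ratio: float = 0.15,
              max_span: int = 3, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Mask contiguous spans of tokens until about mask_ratio of them are hidden.

    Assumption: span lengths are uniform in [1, max_span]; the paper's exact
    span-sampling scheme is not given in the abstract.
    """
    n = len(tokens)
    if n == 0:
        return [], []
    rng = random.Random(seed)
    budget = max(1, round(n * mask_ratio))  # number of tokens to mask
    masked = set()
    attempts = 0
    while len(masked) < budget and attempts < 10 * n:  # cap guarantees termination
        attempts += 1
        span = rng.randint(1, max_span)
        start = rng.randrange(n)
        for i in range(start, min(start + span, n)):
            if len(masked) >= budget:
                break
            masked.add(i)
    out = [MASK if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)


def bce_with_logits(x: float, y: float) -> float:
    """Numerically stable binary cross-entropy for a single logit x and label y."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def itm_loss(scores: List[List[float]]) -> float:
    """Image-text matching as binary classification over an N x N logit matrix.

    scores[i][j] is the match logit for image i and text j. Diagonal entries
    are labeled 1 (matched pair), off-diagonal entries 0 (mismatched pair).
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += bce_with_logits(scores[i][j], 1.0 if i == j else 0.0)
    return total / (n * n)
```

For example, `span_mask("a young woman with short blond hair".split(), mask_ratio=0.3)` hides two tokens, usually as one contiguous span, so the model must reconstruct a multi-word phrase rather than isolated words; `itm_loss` falls toward zero as matched pairs score high and mismatched pairs score low.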

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception