Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation

Dawei Dai, Yinxiu Zhou, Chenghang Li, Guolai Jiang, and Chengfang Zhang

arXiv:2510.21775 · cs.CV · December 2, 2025

TL;DR

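Face-MakeUpV2 is a facial image generation model that aims to keep generated faces consistent with a reference image in both identity and physical characteristics. It targets two weaknesses of current text-to-image models: facial attribute leakage and poor physical consistency when following local semantic instructions.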

Contribution

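It introduces FaceCaptionMask-1M, a dataset of roughly one million image-text-mask pairs, together with two complementary facial information injection channels (a 3D facial rendering channel and a global facial feature channel) and two optimization objectives, all aimed at improving facial consistency in text-to-image generation.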

Findings

1. Achieves superior face ID preservation
2. Maintains physical consistency with reference images
3. Outperforms existing models in facial editing tasks

Abstract

In facial image generation, current text-to-image models often suffer from facial attribute leakage and insufficient physical consistency when responding to local semantic instructions. In this study, we propose Face-MakeUpV2, a facial image generation model that aims to maintain the consistency of face ID and physical characteristics with the reference image. First, we constructed a large-scale dataset, FaceCaptionMask-1M, comprising approximately one million image-text-mask pairs that provide precise spatial supervision for the local semantic instructions. Second, we employed a general text-to-image pretrained model as the backbone and introduced two complementary facial information injection channels: a 3D facial rendering channel that incorporates the physical characteristics of the image, and a global facial feature channel. Third, we formulated two optimization objectives for the…
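The abstract names two facial injection channels but, in the text available here, does not say how they enter the pretrained backbone. The following is a minimal NumPy sketch, assuming a common adapter-style design in which each channel is projected into extra conditioning tokens that are appended to the text tokens and read by the denoiser through cross-attention. All sizes, projections, and function names (`global_face_tokens`, `render_tokens`, `cross_attention`) are hypothetical, the weights are random stand-ins, and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
D = 64                       # shared conditioning width
N_TEXT = 77                  # text-encoder tokens
D_FACE = 512                 # global face embedding size (assumed)
N_FACE = 4                   # tokens the global face vector is expanded into (assumed)
RENDER_HW, RENDER_C = 8, 32  # spatial grid and channels of 3D-render features (assumed)

rng = np.random.default_rng(0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_face_tokens(face_emb: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Global-feature channel (assumed form): expand one identity vector into N_FACE tokens."""
    return (face_emb @ proj).reshape(N_FACE, D)


def render_tokens(render_feat: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """3D-rendering channel (assumed form): one token per spatial cell of a
    feature map computed from the rendered face, projected to width D."""
    h, w, c = render_feat.shape
    return render_feat.reshape(h * w, c) @ proj


def cross_attention(latents: np.ndarray, context: np.ndarray,
                    wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head cross-attention: latent tokens attend over all conditioning tokens."""
    q, k, v = latents @ wq, context @ wk, context @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v


# Random stand-ins for encoder outputs and learned weights.
text = rng.standard_normal((N_TEXT, D))
face_emb = rng.standard_normal(D_FACE)
render_feat = rng.standard_normal((RENDER_HW, RENDER_HW, RENDER_C))
proj_face = rng.standard_normal((D_FACE, N_FACE * D)) / np.sqrt(D_FACE)
proj_render = rng.standard_normal((RENDER_C, D)) / np.sqrt(RENDER_C)
wq, wk, wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

# Inject both facial channels by appending their tokens to the text tokens.
context = np.concatenate(
    [text,
     global_face_tokens(face_emb, proj_face),
     render_tokens(render_feat, proj_render)],
    axis=0,
)
latents = rng.standard_normal((16 * 16, D))  # flattened 16x16 denoiser latent
out = cross_attention(latents, context, wq, wk, wv)
print(context.shape, out.shape)  # (145, 64) (256, 64)
```

Concatenating tokens is the simplest way to let a frozen backbone see both channels at once; another common option is a decoupled design that runs a separate cross-attention per channel and sums the outputs, which this sketch does not show.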