FFHQ-Makeup: Paired Synthetic Makeup Dataset with Facial Consistency Across Multiple Styles
Xingchao Yang, Shiori Ueda, Yuantian Huang, Tomoya Akiyama, Takafumi Taketomi

TL;DR
This paper introduces FFHQ-Makeup, a high-quality synthetic dataset of paired bare and makeup facial images across multiple styles, ensuring facial consistency and realism for beauty-related AI tasks.
Contribution
The work presents a novel pipeline for creating a large-scale, high-quality paired makeup dataset with consistent identity and expression, filling a significant gap in available resources.
Findings
Created 90K high-quality paired images across 18K identities.
Achieved realistic makeup transfer preserving facial identity and expression.
First dataset specifically focused on paired makeup images for research.
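The reported scale implies a simple paired layout: 90K pairs over 18K identities works out to 5 makeup styles per identity, if styles are spread evenly (an assumption; this summary does not state the per-identity count). The sketch below illustrates such an index. The `MakeupPair` record, the helper names, and the file-path scheme are all hypothetical and are not taken from the dataset release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeupPair:
    """One bare/makeup image pair (hypothetical record layout)."""
    identity_id: int
    style_id: int
    bare_path: str
    makeup_path: str


def infer_styles_per_identity(total_pairs: int, num_identities: int) -> int:
    """Styles per identity, assuming pairs are spread evenly over identities."""
    if num_identities <= 0:
        raise ValueError("num_identities must be positive")
    if total_pairs % num_identities != 0:
        raise ValueError("pairs are not evenly divisible across identities")
    return total_pairs // num_identities


def build_pair_index(num_identities: int, styles_per_identity: int) -> list[MakeupPair]:
    """Enumerate every (identity, style) pair; paths follow a made-up naming scheme."""
    pairs = []
    for identity_id in range(num_identities):
        # The same bare image is shared by all styles of one identity.
        bare_path = f"bare/{identity_id:05d}.png"
        for style_id in range(styles_per_identity):
            makeup_path = f"makeup/{identity_id:05d}_{style_id}.png"
            pairs.append(MakeupPair(identity_id, style_id, bare_path, makeup_path))
    return pairs


if __name__ == "__main__":
    styles = infer_styles_per_identity(90_000, 18_000)
    index = build_pair_index(num_identities=18_000, styles_per_identity=styles)
    print(styles, len(index))
```

Sharing one bare image across all styles of an identity is what makes the pairs usable for supervised training: any difference between the two images in a pair should come from the makeup alone.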
Abstract
Paired bare-makeup facial images are essential for a wide range of beauty-related tasks, such as virtual try-on, facial privacy protection, and facial aesthetics analysis. However, collecting high-quality paired makeup datasets remains a significant challenge. Real-world data acquisition is constrained by the difficulty of collecting large-scale paired images, while existing synthetic approaches often suffer from limited realism or inconsistencies between bare and makeup images. Current synthetic methods typically fall into two categories: warping-based transformations, which often distort facial geometry and compromise the precision of makeup; and text-to-image generation, which tends to alter facial identity and expression, undermining consistency. In this work, we present FFHQ-Makeup, a high-quality synthetic makeup dataset that pairs each identity with multiple makeup styles while…
Peer Reviews
Decision: Submitted to ICLR 2026
- Scale and structure: reasonably large, paired, multi‑style dataset; pairs are useful for supervised training and controlled evaluation.
- Clear construction pipeline with pragmatic engineering (3DMM‑based residual, re‑rendering augmentation, background blending) and documented manual cleaning.
- The paper is clearly written and acknowledges several remaining limitations (e.g., bias toward daily styles, 3DMM/segmentation artifacts).
- Utility not convincingly demonstrated. A dataset paper should show that training models on the new data substantially improves downstream tasks (e.g., makeup transfer, virtual try‑on, recognition under makeup) against strong baselines and across public test sets. The paper lacks such end‑task training/evaluation; results are mostly pairwise similarity and small‑scale preference checks, which do not establish practical value.
- No human evaluation. All “preference” judgments use VLMs on ~50 gro
- The dataset construction pipeline is well-structured and combines multiple techniques to improve facial consistency.
- The paper provides thorough ablation studies and qualitative comparisons against existing synthetic datasets, showing clearer visual fidelity and identity preservation.
- The public release of such a large paired dataset could be beneficial for downstream research in makeup transfer and facial analysis.
- Limited novelty. The work primarily extends existing diffusion-based makeup transfer pipelines with 3DMM-based residual computation. While this combination is technically reasonable, it appears more as an incremental improvement rather than a conceptual breakthrough. The paper could better clarify what is fundamentally novel about the method compared to previous synthetic data generation approaches.
- In addition, insufficient validation on downstream tasks. The dataset is evaluated mainly on
Dataset contribution. This work constructs a large-scale, high-quality, multi-style paired makeup dataset, which would benefit a wide range of future makeup-related research and applications.
1. Limited technical novelty. The pipeline mainly relies on the existing model Stable-Makeup. The data construction pipeline appears to merely process existing data using off-the-shelf models, without addressing any substantive technical challenges.
2. Insufficient motivation and lack of interpretability. The ablation study focuses on two variants of feature extraction: makeup residual, and sampling and re-rendering augmentation. This appears to be only a minor modification of the module, which see
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Facial Rejuvenation and Surgery Techniques
