FFHQ-Makeup: Paired Synthetic Makeup Dataset with Facial Consistency Across Multiple Styles

Xingchao Yang; Shiori Ueda; Yuantian Huang; Tomoya Akiyama; Takafumi Taketomi

arXiv:2508.03241·cs.CV·August 7, 2025

TL;DR

This paper introduces FFHQ-Makeup, a high-quality synthetic dataset of paired bare and makeup facial images across multiple styles, ensuring facial consistency and realism for beauty-related AI tasks.

Contribution

The work presents a novel pipeline for creating a large-scale, high-quality paired makeup dataset with consistent identity and expression, filling a significant gap in available resources.

Findings

01

Created 90K high-quality paired images across 18K identities.

02

Achieved realistic makeup transfer preserving facial identity and expression.

03

First dataset specifically focused on paired makeup images for research.
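As a quick sanity check on the scale reported in the findings above, the two headline figures imply roughly five pairs per identity on average. This is a minimal arithmetic sketch; the figures are the rounded "K" values quoted on this page, and the assumption that pairs are spread evenly across identities is ours.

```python
# Rounded figures taken from the Findings list (90K pairs, 18K identities).
total_pairs = 90_000
identities = 18_000

# Average number of bare/makeup pairs per identity, assuming an even spread.
pairs_per_identity = total_pairs / identities
print(f"about {pairs_per_identity:.0f} pairs per identity on average")
```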

Abstract

Paired bare-makeup facial images are essential for a wide range of beauty-related tasks, such as virtual try-on, facial privacy protection, and facial aesthetics analysis. However, collecting high-quality paired makeup datasets remains a significant challenge. Real-world data acquisition is constrained by the difficulty of collecting large-scale paired images, while existing synthetic approaches often suffer from limited realism or inconsistencies between bare and makeup images. Current synthetic methods typically fall into two categories: warping-based transformations, which often distort facial geometry and compromise the precision of makeup; and text-to-image generation, which tends to alter facial identity and expression, undermining consistency. In this work, we present FFHQ-Makeup, a high-quality synthetic makeup dataset that pairs each identity with multiple makeup styles while…
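Identity consistency between a bare image and its makeup counterpart is commonly checked by comparing face embeddings with cosine similarity. The sketch below illustrates that idea only; it is not necessarily the metric used in the paper. The embedding vectors are made-up toy values, and in practice they would come from a face-recognition model, which is not specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (illustrative values, not real model output).
bare_embedding = [0.20, 0.50, 0.10, 0.80]
makeup_embedding = [0.25, 0.45, 0.15, 0.75]   # same identity, with makeup
other_embedding = [0.90, 0.10, 0.70, 0.05]    # a different identity

same_id_score = cosine_similarity(bare_embedding, makeup_embedding)
diff_id_score = cosine_similarity(bare_embedding, other_embedding)

# A consistent pair should score higher than a mismatched one.
print(f"same identity: {same_id_score:.3f}, different identity: {diff_id_score:.3f}")
```

A higher score for the bare/makeup pair than for the mismatched pair is what a consistent dataset would be expected to show; any acceptance threshold would depend on the embedding model chosen.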

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 4

Strengths

- Scale and structure: reasonably large, paired, multi‑style dataset; pairs are useful for supervised training and controlled evaluation.
- Clear construction pipeline with pragmatic engineering (3DMM‑based residual, re‑rendering augmentation, background blending) and documented manual cleaning.
- The paper is clearly written and acknowledges several remaining limitations (e.g., bias toward daily styles, 3DMM/segmentation artifacts).

Weaknesses

- Utility not convincingly demonstrated. A dataset paper should show that training models on the new data substantially improves downstream tasks (e.g., makeup transfer, virtual try‑on, recognition under makeup) against strong baselines and across public test sets. The paper lacks such end‑task training/evaluation; results are mostly pairwise similarity and small‑scale preference checks, which do not establish practical value.
- No human evaluation. All “preference” judgments use VLMs on ~50 gro

Reviewer 02·Rating 2·Confidence 4

Strengths

- The dataset construction pipeline is well-structured and combines multiple techniques to improve facial consistency.
- The paper provides thorough ablation studies and qualitative comparisons against existing synthetic datasets, showing clearer visual fidelity and identity preservation.
- The public release of such a large paired dataset could be beneficial for downstream research in makeup transfer and facial analysis.

Weaknesses

- Limited novelty. The work primarily extends existing diffusion-based makeup transfer pipelines with 3DMM-based residual computation. While this combination is technically reasonable, it appears to be an incremental improvement rather than a conceptual breakthrough. The paper could better clarify what is fundamentally novel about the method compared to previous synthetic data generation approaches.
- Insufficient validation on downstream tasks. The dataset is evaluated mainly on

Reviewer 03·Rating 4·Confidence 3

Strengths

Dataset contribution. This work constructs a large-scale, high-quality, multi-style paired makeup dataset, which would benefit a wide range of future makeup-related research and applications.

Weaknesses

1. Limited technical novelty. The pipeline mainly relies on the existing model Stable-Makeup. The data construction pipeline appears to merely process existing data using off-the-shelf models, without addressing any substantive technical challenges.
2. Insufficient motivation and lack of interpretability. The ablation study focuses on two variants of feature extraction: makeup residual, and sampling and re-rendering augmentation. This appears to be only a minor modification of the module, which see

Code & Models

Datasets

cyberagent/FFHQ-Makeup
dataset· 86 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Facial Rejuvenation and Surgery Techniques