Generative Pre-training for Subjective Tasks: A Diffusion Transformer-Based Framework for Facial Beauty Prediction
Djamel Eddine Boukhari, Ali Chemsa

TL;DR
This paper introduces a diffusion transformer-based framework for facial beauty prediction, leveraging generative pre-training on facial data to improve aesthetic assessment accuracy beyond traditional methods.
Contribution
The paper presents a novel two-stage framework that uses generative pre-training of a Diffusion Transformer for domain-specific feature extraction in facial beauty prediction.
Findings
Achieved a Pearson Correlation Coefficient of 0.932 on the FBP5500 benchmark.
Outperformed prior methods based on general-purpose pre-training.
Validated the effectiveness of generative pre-training through extensive ablation studies.
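The headline metric above, the Pearson Correlation Coefficient, measures how linearly the predicted beauty scores track the human-rated scores. The following is a minimal sketch of the standard formula in plain Python. The function name `pearson_correlation` and the toy scores are illustrative assumptions, not the authors' evaluation code.

```python
import math


def pearson_correlation(predicted, target):
    """Return the Pearson correlation between two equal-length score lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n

    # Sum of cross-deviations and sums of squared deviations from each mean.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)

    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant inputs")

    return cov / math.sqrt(var_p * var_t)


# Toy example with made-up scores on a 1-5 scale.
model_scores = [2.1, 3.4, 4.0, 1.8, 3.9]
human_scores = [2.0, 3.5, 4.2, 1.5, 3.6]
print(round(pearson_correlation(model_scores, human_scores), 3))
```

A value near 1 means the model ranks and spaces faces much as the human raters do. The coefficient is insensitive to a constant offset or rescaling of the predictions, which is why papers often report error metrics alongside it.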
Abstract
Facial Beauty Prediction (FBP) is a challenging computer vision task due to its subjective nature and the subtle, holistic features that influence human perception. Prevailing methods, often based on deep convolutional networks or standard Vision Transformers pre-trained on generic object classification (e.g., ImageNet), struggle to learn feature representations that are truly aligned with high-level aesthetic assessment. In this paper, we propose a novel two-stage framework that leverages the power of generative models to create a superior, domain-specific feature extractor. In the first stage, we pre-train a Diffusion Transformer on a large-scale, unlabeled facial dataset (FFHQ) through a self-supervised denoising task. This process forces the model to learn the fundamental data distribution of human faces, capturing nuanced details and structural priors essential for aesthetic…
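The abstract describes the first stage only as a self-supervised denoising task. One common way to set up such a task is DDPM-style noise prediction: corrupt a clean sample with Gaussian noise at a random timestep, then train the network to predict that noise. The sketch below assumes this formulation and a linear noise schedule; the paper's exact objective and schedule are not stated in the text above, and every name here (`linear_alpha_bar`, `add_noise`, `denoising_loss`) is hypothetical. The Diffusion Transformer itself is omitted.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar[t])
    noise_scale = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise."""
    n = len(true_noise)
    return sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise)) / n


# Toy usage: a flattened "image" of 8 pixel values, no real network involved.
rng = random.Random(0)
schedule = linear_alpha_bar(num_steps=1000)
clean = [0.5, -0.2, 0.1, 0.9, -0.7, 0.0, 0.3, -0.4]
t = rng.randrange(len(schedule))
noisy, noise = add_noise(clean, t, schedule, rng)

# A real model would map (noisy, t) to a noise estimate; here a zero
# prediction stands in for an untrained network.
loss = denoising_loss([0.0] * len(noise), noise)
```

No labels are needed for this objective, since the noise added to each image is its own training target. That is what makes pre-training on an unlabeled dataset such as FFHQ possible.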
