RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation

Ganlin Feng; Yuxi Long; Hafsa Ali; Erin Lou; Fahad Butt; Qian Liu; Yang Wang; Pingzhao Hu

arXiv:2604.03454 · cs.CV · April 7, 2026

TL;DR

RDFace is a benchmark dataset of 456 facial images of children with rare genetic conditions. It is designed to support AI diagnosis under extreme data scarcity and to evaluate synthetic data augmentation methods.

Contribution

The paper introduces RDFace, a curated dataset with standardized metadata, and shows that synthetic augmentation can improve AI diagnostic accuracy in low-data settings.

Findings

1. Synthetic augmentation with DreamBooth and FastGAN improves diagnostic accuracy by up to 13.7%.
2. Generated images maintain phenotype fidelity through landmark similarity filtering.
3. Phenotype descriptions from real and synthetic images achieve a report similarity score of 0.84.
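The summary does not say which facial landmarks, alignment step, similarity measure, or cutoff the authors use for landmark similarity filtering. As a rough illustration only, the sketch below assumes each image has a list of 2D landmark points. It normalizes each set for position and size, scores a generated image by its best cosine similarity to the real images of the same condition, and keeps it if that score clears a hypothetical threshold (0.95). The function names and the threshold are illustrative assumptions, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(landmarks: Sequence[Point]) -> List[float]:
    """Center landmarks on their centroid, flatten to [x1, y1, x2, y2, ...], scale to unit length."""
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    flat: List[float] = []
    for x, y in landmarks:
        flat.extend((x - cx, y - cy))
    norm = math.sqrt(sum(v * v for v in flat))
    if norm == 0:
        raise ValueError("degenerate landmark set: all points coincide")
    return [v / norm for v in flat]


def landmark_similarity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Cosine similarity of two landmark sets (invariant to translation and scale, not rotation)."""
    if len(a) != len(b):
        raise ValueError("landmark sets must have the same number of points")
    return sum(u * v for u, v in zip(normalize(a), normalize(b)))


def filter_generated(
    generated: Sequence[Sequence[Point]],
    real: Sequence[Sequence[Point]],
    threshold: float = 0.95,  # hypothetical cutoff, not taken from the paper
) -> List[Sequence[Point]]:
    """Keep generated landmark sets whose best match among the real sets meets the threshold."""
    kept = []
    for g in generated:
        best = max(landmark_similarity(g, r) for r in real)
        if best >= threshold:
            kept.append(g)
    return kept


if __name__ == "__main__":
    real = [[(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]]
    generated = [
        [(10.0, 10.0), (14.0, 10.0), (12.0, 14.0)],  # same shape, shifted and scaled
        [(0.0, 0.0), (2.0, 0.0), (1.0, -2.0)],       # vertically mirrored shape
    ]
    print(len(filter_generated(generated, real)))
```

Cosine similarity of centered, unit-length vectors ignores where the face sits in the frame and how large it is, but not head rotation. A full Procrustes alignment would also remove rotation at the cost of a little more code.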

Abstract

Rare diseases often manifest with distinctive facial phenotypes in children, offering valuable diagnostic cues for clinicians and AI-assisted screening systems. However, progress in this field is severely limited by the scarcity of curated, ethically sourced facial data and the high similarity among phenotypes across different conditions. To address these challenges, we introduce RDFace, a curated benchmark dataset comprising 456 pediatric facial images spanning 103 rare genetic conditions (average 4.4 samples per condition). Each ethically verified image is paired with standardized metadata. RDFace enables the development and evaluation of data-efficient AI models for rare disease diagnosis under real-world low-data constraints. We benchmark multiple pretrained vision backbones using cross-validation and explore synthetic augmentation with DreamBooth and FastGAN. Generated images are…
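With 456 images over 103 conditions (456 / 103 ≈ 4.4 per condition), how cross-validation folds are built matters a great deal. The summary does not give the authors' fold count or stratification scheme. The hypothetical sketch below deals each condition's images round-robin across k folds, which is one simple form of stratified k-fold. It then reports, for each fold, any condition whose every image ends up in that fold's test split, so the model would never see it during training. This always happens for conditions with a single image. The function names and the toy labels are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Assign sample indices to k folds, spreading each condition's samples round-robin."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_label):
        for j, idx in enumerate(by_label[label]):
            folds[(offset + j) % k].append(idx)
        # Shift the starting fold so small conditions do not all land in fold 0.
        offset += len(by_label[label])
    return folds


def unseen_in_training(labels: Sequence[str], folds: List[List[int]]) -> List[List[str]]:
    """For each fold, list conditions whose every sample falls in that fold's test split."""
    totals = Counter(labels)
    result = []
    for fold in folds:
        in_test = Counter(labels[i] for i in fold)
        result.append(sorted(c for c, n in in_test.items() if n == totals[c]))
    return result


if __name__ == "__main__":
    print(round(456 / 103, 1))  # average images per condition in RDFace
    toy_labels = ["A"] * 5 + ["B"] * 4 + ["C"]  # hypothetical condition labels
    toy_folds = stratified_folds(toy_labels, k=5)
    print(toy_folds)
    print(unseen_in_training(toy_labels, toy_folds))
```

In the toy run, condition "C" has one image, so in the fold where it is tested the classifier has no training example of it. At RDFace's scale, the number of folds, or whether leave-one-out is used, directly limits how many conditions can be evaluated fairly.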
