Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2
Ali Borji

TL;DR
This paper provides a quantitative comparison of Stable Diffusion, Midjourney, and DALL-E 2 in generating photorealistic faces, introducing a new dataset and highlighting Stable Diffusion's superior performance based on FID scores.
Contribution
The study introduces GFW, a new dataset of generated faces, and offers a systematic evaluation of three popular image synthesis models on face realism.
Findings
Stable Diffusion generates better faces than Midjourney and DALL-E 2, as measured by FID score.
Introduces the GFW dataset of 15,076 generated faces for benchmarking.
Results aim to guide future improvements in generative face models.
Abstract
The field of image synthesis has made great strides in the last couple of years, and recent models can generate images of astonishing quality. However, fine-grained evaluation of these models on specific categories of interest, such as faces, is still missing. Here, we conduct a quantitative comparison of three popular systems, Stable Diffusion, Midjourney, and DALL-E 2, in their ability to generate photorealistic faces in the wild. According to the FID score, Stable Diffusion generates better faces than the other two systems. We also introduce GFW, a dataset of generated faces in the wild containing a total of 15,076 faces. We hope that our study spurs follow-up research on assessing generative models and improving them. Data and code are publicly available.
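The comparison rests on the Fréchet Inception Distance (FID), which fits a Gaussian to image features from each set and computes ||mu_a - mu_b||^2 + Tr(Sigma_a + Sigma_b - 2 (Sigma_a Sigma_b)^(1/2)). Lower is better. The snippet below is a minimal sketch of that formula, assuming the feature vectors have already been extracted (standard FID uses Inception-v3 pooled features, but this page does not describe the paper's exact pipeline). The helper names `frechet_distance` and `_sqrt_psd` are hypothetical. Common implementations use `scipy.linalg.sqrtm`; this sketch instead obtains the trace of the matrix square root through symmetric eigendecompositions, which gives the same value for covariance matrices.

```python
import numpy as np


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style Frechet distance between two feature sets of shape (n, d).

    Assumes features were already extracted by some fixed network
    (e.g. Inception-v3 for standard FID); this is an illustrative sketch.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)

    # Tr((Sigma_a Sigma_b)^(1/2)) equals Tr((S Sigma_b S)^(1/2)) with
    # S = Sigma_a^(1/2); the inner matrix is symmetric PSD, so eigvalsh applies.
    s = _sqrt_psd(sigma_a)
    inner = s @ sigma_b @ s
    inner = (inner + inner.T) / 2.0  # enforce exact symmetry
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * tr_covmean)
```

Two identical feature sets give a distance of (numerically) zero, and shifting every feature by a constant vector c leaves the covariances unchanged, so the distance reduces to ||c||^2.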
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Aesthetic Perception and Analysis
Methods: Diffusion
