Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
Farzan Farnia, Mohammad Jalali, Azim Ospanov

TL;DR
This paper investigates the systematic diversity bias in deep generative models, revealing that they tend to underestimate data diversity due to finite-sample effects, and proposes strategies to mitigate this issue.
Contribution
It identifies the origin of diversity bias in generative models and introduces diversity-aware regularization methods based on entropy scores.
Findings
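Across multiple benchmark datasets, test data attains substantially higher Vendi and RKE diversity scores than generated samples.
Entropy-based diversity scores estimated from a finite sample increase with the sample size, so a model fit to a finite training set tends to underestimate the diversity of the underlying distribution.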
Proposed regularization strategies show potential to improve diversity.
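The sample-size effect in the findings above can be illustrated with a minimal sketch. It uses the standard definitions of the two scores on a kernel matrix K with unit diagonal: Vendi is the exponential of the Shannon entropy of the eigenvalues of K/n, and RKE is the inverse of the squared Frobenius norm of K/n (the order-2 Rényi counterpart). The toy Gaussian data, the Gaussian kernel, the bandwidth sigma = 2, and the helper names are assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np


def gaussian_kernel(x, sigma):
    """Gaussian (RBF) kernel matrix; diagonal entries equal 1."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def vendi_score(K):
    """exp(Shannon entropy) of the eigenvalues of K / n."""
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(np.exp(-np.sum(lam * np.log(lam))))


def rke_score(K):
    """exp(order-2 Renyi entropy) = 1 / ||K / n||_F^2."""
    n = K.shape[0]
    return float(1.0 / np.sum((K / n) ** 2))


# Toy stand-in for a data distribution (assumption): 16-d standard normal.
rng = np.random.default_rng(0)
data = rng.normal(size=(800, 16))

# Score nested subsamples of increasing size with the same kernel.
results = []
for n in (50, 200, 800):
    K = gaussian_kernel(data[:n], sigma=2.0)
    results.append((n, vendi_score(K), rke_score(K)))

for n, vendi, rke in results:
    print(f"n={n:4d}  Vendi={vendi:8.2f}  RKE={rke:8.2f}")
```

In this toy setting both scores grow as more samples from the same distribution are included, which is the kind of finite-sample behavior the findings describe. A score computed from a small sample therefore understates the value that a larger sample would give.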
Abstract
Deep generative models have achieved great success in producing high-quality samples, making them a central tool across machine learning applications. Beyond sample quality, an important yet less systematically studied question is whether trained generative models faithfully capture the diversity of the underlying data distribution. In this work, we address this question by directly comparing the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution, using recently proposed reference-free entropy-based diversity scores, Vendi and RKE. Across multiple benchmark datasets, we find that test data consistently attains substantially higher Vendi and RKE diversity scores than the generated samples, suggesting a systematic downward diversity bias in modern generative models. To understand the origin of this bias, we analyze…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Computational and Text Analysis Methods
