Grade Inflation in Generative Models
Phuc Nguyen, Miao Li, Alexandra Morgan, Rima Arnaout, and Ramy Arnaout

TL;DR
This paper identifies grade inflation in common quality scores for generative models, introduces the Eden score as a remedy, and shows that Eden avoids grade inflation and agrees better with human perception.
Contribution
The paper introduces the Eden score, which the authors describe as, to their knowledge, the first equidensity score. It avoids grade inflation and matches human perception better than equipoint scores when evaluating generative models.
Findings
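Many commonly used quality scores (correlation, Jaccard, earth-mover's, and Kullback-Leibler) suffer from grade inflation.
The Eden score avoids grade inflation and agrees better with human perception of goodness-of-fit than those scores.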
Equidensity scores relate to Rényi entropy.
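For readers unfamiliar with the quantity named in the last finding, the sketch below computes the standard Rényi entropy of order alpha for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), with the Shannon entropy as the alpha -> 1 limit. This summary does not say how equidensity scores relate to it, so the code illustrates only the textbook definition; the function name `renyi_entropy` is a hypothetical helper, not something from the paper.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Standard Renyi entropy of order alpha, in nats, for a discrete distribution.

    Hypothetical helper for illustration; not taken from the paper.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()      # normalize so the weights sum to 1
    p = p[p > 0]         # zero-probability outcomes do not contribute
    if np.isclose(alpha, 1.0):
        # alpha -> 1 limit is the Shannon entropy
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))
```

For a uniform distribution the result is the log of the number of outcomes at every order, and for any fixed distribution the value does not increase as alpha grows.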
Abstract
Generative models hold great potential, but only if one can trust the evaluation of the data they generate. We show that many commonly used quality scores for comparing two-dimensional distributions of synthetic vs. ground-truth data give better results than they should, a phenomenon we call the "grade inflation problem." We show that the correlation score, Jaccard score, earth-mover's score, and Kullback-Leibler (relative-entropy) score all suffer grade inflation. We propose that any score that values all datapoints equally, as these do, will also exhibit grade inflation; we refer to such scores as "equipoint" scores. We introduce the concept of "equidensity" scores, and present the Eden score, to our knowledge the first example of such a score. We found that Eden avoids grade inflation and agrees better with human perception of goodness-of-fit than the equipoint scores above. We…
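To make the inputs to such scores concrete, here is a minimal sketch of how two of the named comparisons might be computed between binned two-dimensional samples of ground-truth and synthetic data. The binning step, the helper names (`binned_density`, `correlation_score`, `kl_divergence`), and the smoothing constant `eps` are assumptions for illustration; the paper's exact score definitions and any rescaling it applies are not given in this summary.

```python
import numpy as np

def binned_density(points, bins, extent):
    """Bin an (n, 2) array of points into a normalized 2D histogram.

    extent is [[xmin, xmax], [ymin, ymax]]; points outside it are dropped.
    """
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins, range=extent)
    return hist / hist.sum()

def correlation_score(p, q):
    """Pearson correlation between the bin values of two histograms (assumed form)."""
    return float(np.corrcoef(p.ravel(), q.ravel())[0, 1])

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in nats; eps is an assumed smoothing term for empty bins."""
    p = p.ravel() + eps
    q = q.ravel() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

A typical call would bin both datasets over the same `extent` and `bins`, then pass the two histograms to either function. A correlation near 1 or a divergence near 0 indicates close agreement under that score.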
