Position: All Current Generative Fidelity and Diversity Metrics are Flawed
Ossi Räisä, Boris van Breugel, Mihaela van der Schaar

TL;DR
This paper critically evaluates existing generative data metrics, demonstrating their flaws through sanity checks, and advocates for the development of more reliable metrics to improve synthetic data evaluation.
Contribution
It introduces a set of desiderata and sanity checks for generative metrics, concluding that all current metrics are flawed and highlighting the need for better evaluation methods.
Findings
Current metrics lack robustness and clear bounds.
Sanity checks reveal widespread failures in existing metrics.
The paper provides guidelines for proper metric usage.
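To make the idea of a sanity check concrete, here is a minimal sketch of one possible outlier-robustness check. It is an illustration only and is not taken from the paper's own suite: `toy_precision` is a hypothetical, simplified k-nearest-neighbour precision score written for this example, and the data, the value of `k`, and the outlier location are all assumptions. The check asks whether a single outlier in the real data can make clearly bad synthetic data look good.

```python
import numpy as np

def knn_radii(points, k):
    # Distance from each point to its k-th nearest neighbour among the other points.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting, column 0 is the zero self-distance, so column k is the k-th neighbour.
    return np.sort(d, axis=1)[:, k]

def toy_precision(real, synth, k=3):
    # Hypothetical toy metric: fraction of synthetic points that fall inside
    # the k-NN ball of at least one real point.
    radii = knn_radii(real, k)
    d = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))        # real data around the origin
synth_bad = rng.normal(20.0, 1.0, size=(200, 2))  # synthetic data far from the real data

# Without outliers the bad synthetic data should score low.
clean = toy_precision(real, synth_bad)

# Add one real outlier near the synthetic cluster; its k-NN ball is huge.
real_outlier = np.vstack([real, [[20.0, 20.0]]])
contaminated = toy_precision(real_outlier, synth_bad)

print(f"precision without outlier: {clean:.2f}")
print(f"precision with one outlier: {contaminated:.2f}")
```

If the score jumps sharply after adding a single real outlier, the toy metric fails this check, because one point has changed the verdict on an otherwise unchanged synthetic dataset.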
Abstract
Any method's development and practical application are limited by our ability to measure its reliability. The popularity of generative modeling emphasizes the importance of good synthetic data metrics. Unfortunately, previous works have found many failure cases in current metrics, for example a lack of outlier robustness and unclear lower and upper bounds. We propose a list of desiderata for synthetic data metrics, and a suite of sanity checks: carefully chosen simple experiments that aim to detect specific and known generative modeling failure modes. Based on these desiderata and the results of our checks, we arrive at our position: all current generative fidelity and diversity metrics are flawed. This significantly hinders practical use of synthetic data. Our aim is to convince the research community to spend more effort in developing metrics, instead of models. Additionally, through…
Taxonomy
Topics: Machine Learning and Data Classification · Data Quality and Management · Software System Performance and Reliability
