Towards GAN Benchmarks Which Require Generalization
Ishaan Gulrajani, Colin Raffel, Luke Metz

TL;DR
This paper proposes evaluating generative models with neural network divergences. Because a model cannot score well on these metrics by memorizing its training data, benchmarks built on them reward generalization.
Contribution
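It introduces an evaluation metric based on neural network divergences that cannot be gamed by memorization. This addresses a limitation of many existing benchmarks, on which trivially memorizing the training set scores better than state-of-the-art models.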
Findings
The proposed metric correlates well with perceptual quality.
It effectively measures diversity and generalization.
The metric is computable from samples only.
Abstract
For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. In search of such a metric, we turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions. The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples. We survey past work on using NNDs for evaluation and implement an example black-box metric based on these ideas. Through experimental validation we show that it can effectively measure diversity,…
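As a rough illustration of a divergence "defined in terms of a neural network trained to distinguish between distributions", the sketch below trains a critic on one split of data and model samples, then scores it on a held-out split. Everything in it is an assumption made for illustration and is not the paper's implementation: the critic is plain logistic regression rather than a neural network, the data are toy 2-D Gaussians rather than images, the score is a Jensen-Shannon-style lower bound, and the names train_critic, divergence_estimate, and sample are hypothetical.

```python
import numpy as np

def train_critic(real, fake, steps=500, lr=0.1):
    """Fit a logistic-regression critic: label 1 for data, 0 for model samples."""
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(sample is data)
        grad = p - y                            # gradient of cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def divergence_estimate(w, b, real, fake):
    """Held-out score: log 2 + 0.5 E_real[log D] + 0.5 E_fake[log(1 - D)]."""
    log_d_real = -np.logaddexp(0.0, -(real @ w + b))      # log sigmoid(z)
    log_one_minus_d_fake = -np.logaddexp(0.0, fake @ w + b)  # log(1 - sigmoid(z))
    return np.log(2.0) + 0.5 * log_d_real.mean() + 0.5 * log_one_minus_d_fake.mean()

rng = np.random.default_rng(0)

def sample(mean, n=2000):
    """Toy stand-in for a dataset or a generative model (assumption)."""
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))

# Critic is trained on one split and scored on a fresh, held-out split.
w, b = train_critic(sample(0.0), sample(0.0))
good = divergence_estimate(w, b, sample(0.0), sample(0.0))  # model matches the data

w, b = train_critic(sample(0.0), sample(2.0))
bad = divergence_estimate(w, b, sample(0.0), sample(2.0))   # model is shifted

print(f"matching model: {good:.3f}, shifted model: {bad:.3f}")
```

A model whose samples match the data should score near zero and a mismatched one noticeably higher. A linear critic like this one cannot detect memorization; that property depends on a higher-capacity critic being trained against a large sample from the model, as the abstract describes.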
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Formal Methods in Verification · Advanced Software Engineering Methodologies
