A Theoretical Framework for Statistical Evaluability of Generative Models
Shashaank Aiyer, Yishay Mansour, Shay Moran, Han Shao

TL;DR
This paper develops a theoretical framework for evaluating generative models. It analyzes whether metrics such as IPMs and Rényi divergences can be estimated from finite samples, and discusses the limitations of perplexity.
Contribution
It introduces a formal framework for assessing the evaluability of generative model metrics, establishing conditions under which they can be reliably estimated from finite data.
Findings
IPMs with bounded test classes are evaluable from finite samples.
When the test class has finite fat-shattering dimension, IPMs can be evaluated to arbitrary precision.
Rényi and KL divergences are not evaluable from finite samples.
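
To make the first finding concrete, here is a minimal sketch of a plug-in IPM estimate, max over f of |mean f(x) - mean f(y)|, computed from two finite samples. It is an illustration under stated assumptions, not the paper's construction: the threshold-indicator test class, the Gaussian "truth" and "model" samples, and the function name `empirical_ipm` are all hypothetical choices made for this example.

```python
import numpy as np

def empirical_ipm(x, y, test_fns):
    """Plug-in IPM estimate: largest gap in sample means over the test class."""
    return max(abs(f(x).mean() - f(y).mean()) for f in test_fns)

# Hypothetical bounded test class: threshold indicators with values in {0, 1}.
thresholds = np.linspace(-3.0, 3.0, 61)
test_fns = [lambda z, t=t: (z <= t).astype(float) for t in thresholds]

# Hypothetical data: held-out samples from the "truth" and from two "models".
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=5000)
model_same = rng.normal(0.0, 1.0, size=5000)     # matches the truth
model_shifted = rng.normal(1.0, 1.0, size=5000)  # mean shifted by 1

ipm_same = empirical_ipm(truth, model_same, test_fns)
ipm_shifted = empirical_ipm(truth, model_shifted, test_fns)
print(f"same: {ipm_same:.3f}  shifted: {ipm_shifted:.3f}")
```

Because every test function here is bounded in [0, 1], each sample mean concentrates around its expectation, so the estimate for the matching model should be small and the estimate for the shifted model noticeably larger.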
Abstract
Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets. In contrast, evaluation is more challenging for generative models due to their open-ended nature: it is unclear which metrics are appropriate and whether such metrics can be reliably evaluated from finite samples. In this work, we introduce a theoretical framework for evaluating generative models and establish evaluability results for commonly used metrics. We study two categories of metrics: test-based metrics, including integral probability metrics (IPMs), and Rényi divergences. We show that IPMs with respect to any bounded…
