A note on the evaluation of generative models

Lucas Theis, Aäron van den Oord, Matthias Bethge

arXiv:1511.01844 · stat.ML · April 26, 2016 · ICLR · 435 cites

TL;DR

This paper critically examines common evaluation criteria for generative image models, shows that they are largely independent of one another, and emphasizes the importance of evaluating models with respect to their intended application.

Contribution

It clarifies the independence of evaluation criteria and advises against using Parzen window estimates, promoting direct, application-oriented evaluation of generative models.

Findings

01. Average log-likelihood, Parzen window estimates, and visual fidelity of samples are largely independent of each other.

02. Good performance on one criterion does not imply good performance on the others.

03. Parzen window estimates should generally be avoided.
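For readers unfamiliar with the criterion named above: a Parzen window estimate fits a kernel density estimator to samples drawn from the model and reports the average log-density it assigns to held-out data. The following is a minimal sketch, assuming an isotropic Gaussian kernel and a hand-picked bandwidth `sigma`; the function name and its parameters are illustrative and are not taken from the paper or the linked repository.

```python
import numpy as np


def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-density of test_points under a Gaussian Parzen window
    estimate built from samples (hypothetical helper for illustration).

    samples:     array of shape (n, d), draws from the model
    test_points: array of shape (m, d), held-out data
    sigma:       kernel bandwidth (assumed isotropic)
    """
    samples = np.asarray(samples, dtype=float)
    test_points = np.asarray(test_points, dtype=float)
    n, d = samples.shape

    # Squared Euclidean distance between every test point and every sample: (m, n).
    diffs = test_points[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Log of the unnormalized Gaussian kernel values.
    exponents = -sq_dists / (2.0 * sigma ** 2)

    # Numerically stable log-sum-exp over the samples.
    row_max = exponents.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(exponents - row_max).sum(axis=1))

    # Normalization: average over n kernels, each a d-dimensional Gaussian.
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)

    return float(np.mean(log_sum - log_norm))
```

The result depends strongly on `sigma` and on how many samples are used, which is one reason such numbers are hard to compare across models when the data is high-dimensional.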

Abstract

Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often difficult. This article reviews mostly known but often underappreciated properties relating to the evaluation and interpretation of generative models with a focus on image models. In particular, we show that three of the currently most commonly used criteria (average log-likelihood, Parzen window estimates, and visual fidelity of samples) are largely independent of each other when the data is high-dimensional. Good performance with respect to one criterion therefore need not imply good…

Code & Models

Repositories

kpandey008/DCGANS (PyTorch)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Video Analysis and Summarization