A Note on the Inception Score
Shane Barratt, Rishi Sharma

TL;DR
This paper critically examines the Inception Score, a widely used evaluation metric for generative models, shows that it fails to provide useful guidance when comparing models, and calls for more systematic evaluation in the field.
Contribution
The paper analyzes both the suboptimalities of the Inception Score itself and the issues with how it is applied, and advocates more systematic evaluation practices in generative model research.
Findings
The Inception Score can be misleading when used to compare models.
Both the metric itself and the way it is applied have shortcomings that undermine its reliability.
Careful evaluation is crucial for meaningful progress in generative modeling.
Abstract
Deep generative models are powerful tools that have produced impressive results in recent years. These advances have been for the most part empirically driven, making it essential that we use high quality evaluation metrics. In this paper, we provide new insights into the Inception Score, a recently proposed and widely used evaluation metric for generative models, and demonstrate that it fails to provide useful guidance when comparing models. We discuss both suboptimalities of the metric itself and issues with its application. Finally, we call for researchers to be more systematic and careful when evaluating and comparing generative models, as the advancement of the field depends upon it.
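For readers unfamiliar with the metric under discussion: this page does not spell out the formula, but the Inception Score is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's label distribution for a generated image x and p(y) is the average of those distributions over all generated images. The following is a minimal sketch under that definition, not the authors' code. The function name `inception_score` and the list-of-lists input format are assumptions made for illustration, and the probabilities here are hand-written toy values rather than outputs of a pretrained Inception network.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class-probability vectors.

    probs[i][k] is assumed to hold p(y = k | x_i), with each row summing to 1.
    (Hypothetical helper for illustration; real pipelines obtain these rows
    from a pretrained Inception classifier.)
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal label distribution p(y), estimated as the mean of p(y | x_i).
    marginal = [sum(row[k] for row in probs) / n for k in range(num_classes)]

    # Sum KL(p(y | x_i) || p(y)) over images; terms with p = 0 contribute 0.
    # Whenever p > 0 the matching marginal entry is also > 0, so log is safe.
    total_kl = 0.0
    for row in probs:
        for p, m in zip(row, marginal):
            if p > 0.0:
                total_kl += p * (math.log(p) - math.log(m))

    # Exponentiate the average KL divergence.
    return math.exp(total_kl / n)


# Toy inputs with two classes (hand-written, not real model outputs).
confident_and_diverse = [[1.0, 0.0], [0.0, 1.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]

score_diverse = inception_score(confident_and_diverse)
score_uncertain = inception_score(uncertain)
score_collapsed = inception_score(collapsed)
```

Under this definition the score ranges from 1 up to the number of classes. In the toy inputs, the confident and diverse sample reaches the two-class maximum, while the uncertain and collapsed samples both sit at the minimum. Because the score depends only on classifier outputs, it sees generated images only through that classifier's predictions.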
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics · Anomaly Detection Techniques and Applications
