Unifying and extending Precision Recall metrics for assessing generative models
Benjamin Sykes, Loic Simon, Julien Rabin

TL;DR
This paper unifies various precision-recall metrics for evaluating generative models, providing a comprehensive framework that reveals their relationships, limitations, and behaviors through theoretical and experimental analysis.
Contribution
It introduces a unified approach to precision-recall metrics for generative models, exposes their connections and limitations, and provides new consistency results.
Findings
Unified precision-recall framework for generative models
Revealed sources of pitfalls in existing metrics
Analyzed behaviors of precision-recall curves experimentally
Abstract
With the recent success of generative models on images and text, their evaluation has attracted considerable attention. Whereas most generative models are compared through scalar values such as the Fréchet Inception Distance (FID) or the Inception Score (IS), Sajjadi et al. (2018) proposed a definition of a precision-recall curve to characterize the closeness of two distributions. Since then, various approaches to precision and recall have emerged (Kynkäänniemi et al., 2019; Naeem et al., 2020; Park & Kim, 2023). They focus on the extreme values of precision and recall, but beyond this, the relationships between them remain elusive. In this paper, we unify most of these approaches under the same umbrella, building on the work of Simon et al. (2019). In doing so, we are able not only to recover entire curves, but also to expose the sources of the accounted…
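For readers unfamiliar with the curve of Sajjadi et al. (2018), the following is a minimal sketch for two discrete distributions, assuming samples have already been quantized into histograms over a shared set of bins (the original work clusters embeddings to obtain such bins). For a slope λ > 0, precision is α(λ) = Σ_ω min(λ P(ω), Q(ω)) and recall is β(λ) = Σ_ω min(P(ω), Q(ω)/λ) = α(λ)/λ, with P the reference distribution and Q the model. The function names `prd_curve` and `extreme_values` are hypothetical, and this is not the unified framework of the paper. The limits λ → ∞ and λ → 0 give the extreme values Q(supp P) and P(supp Q), which support-based metrics such as those of Kynkäänniemi et al. (2019) estimate from samples.

```python
import numpy as np


def prd_curve(p, q, num_angles=1001, eps=1e-10):
    """Sketch of the precision-recall curve of Sajjadi et al. (2018).

    p: histogram of the reference (real) distribution P over shared bins.
    q: histogram of the model distribution Q over the same bins.
    Returns (precision, recall) arrays evaluated on a grid of slopes lambda.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to probability vectors
    q = q / q.sum()
    # Parameterize lambda = tan(theta), theta in (0, pi/2), avoiding 0 and infinity.
    angles = np.linspace(eps, np.pi / 2 - eps, num_angles)
    lambdas = np.tan(angles)
    # alpha(lambda) = sum_w min(lambda * P(w), Q(w))  (precision)
    precision = np.minimum(lambdas[:, None] * p[None, :], q[None, :]).sum(axis=1)
    # beta(lambda) = sum_w min(P(w), Q(w) / lambda) = alpha(lambda) / lambda  (recall)
    recall = precision / lambdas
    return precision, recall


def extreme_values(p, q):
    """Limits of the curve: max precision Q(supp P), max recall P(supp Q)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    max_precision = q[p > 0].sum()  # mass of the model on the reference support
    max_recall = p[q > 0].sum()     # mass of the reference on the model support
    return max_precision, max_recall
```

For example, with `p = [0.5, 0.5, 0.0]` and `q = [0.0, 0.5, 0.5]`, the two distributions share only one bin, so both extreme values equal 0.5 and the whole curve stays inside the square [0, 0.5] × [0, 0.5].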
