Enhanced Generative Model Evaluation with Clipped Density and Coverage

Nicolas Salvy; Hugues Talbot; Bertrand Thirion

arXiv:2507.01761 · cs.LG · February 18, 2026


TL;DR

This paper introduces Clipped Density and Clipped Coverage, two new metrics for evaluating generative models. The metrics are robust and interpretable, and they effectively distinguish high-quality samples from poor ones.

Contribution

The paper proposes novel evaluation metrics for generative models that improve robustness and interpretability over existing methods.

Findings

01. Clipped metrics outperform existing methods in robustness.

02. The metrics degrade linearly as the proportion of bad samples increases.

03. Evaluation is reliable across synthetic and real datasets.

Abstract

Although generative models have made remarkable progress in recent years, their use in critical applications has been hindered by an inability to reliably evaluate the quality of their generated samples. Quality refers to at least two complementary concepts: fidelity and coverage. Current quality metrics often lack reliable, interpretable values due to an absence of calibration or insufficient robustness to outliers. To address these shortcomings, we introduce two novel metrics: Clipped Density and Clipped Coverage. By clipping individual sample contributions, as well as the radii of nearest neighbor balls for fidelity, our metrics prevent out-of-distribution samples from biasing the aggregated values. Through analytical and empirical calibration, these metrics demonstrate linear score degradation as the proportion of bad samples increases. Thus, they can be straightforwardly…
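To make the clipping idea concrete, here is a minimal NumPy sketch of k-NN Density- and Coverage-style scores in which each per-sample contribution is capped at 1. It is not the authors' implementation. Capping the fidelity radii at the median real radius, using a held-out real split for the normalizing reference score, and all function names (`knn_radii`, `raw_clipped_density`, `raw_clipped_coverage`, `clipped_density`) are assumptions made for illustration.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d), shape (n, m)."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def knn_radii(real, k):
    """Distance from each real point to its k-th nearest *other* real point."""
    d = pairwise_dist(real, real)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbourhood
    return np.sort(d, axis=1)[:, k - 1]


def raw_clipped_density(real, synth, k, radii):
    """Mean per-synthetic-sample contribution, each capped at 1."""
    # Hypothetical radius clipping: cap every ball at the median real radius so an
    # isolated real outlier cannot own an oversized ball.
    r = np.minimum(radii, np.median(radii))
    inside = pairwise_dist(synth, real) <= r[None, :]  # (m, n): synth j in ball of real i
    return float(np.minimum(inside.sum(axis=1) / k, 1.0).mean())


def raw_clipped_coverage(real, synth, k, radii):
    """Mean per-real-sample contribution, each capped at 1 (radii left unclipped)."""
    inside = pairwise_dist(real, synth) <= radii[:, None]  # (n, m): synth j in ball of real i
    return float(np.minimum(inside.sum(axis=1) / k, 1.0).mean())


def clipped_density(real, real_holdout, synth, k):
    """Normalize by the score of held-out real data (assumed reference), then clip to [0, 1]."""
    radii = knn_radii(real, k)
    ref = raw_clipped_density(real, real_holdout, k, radii)
    if ref == 0.0:
        raise ValueError("reference score on held-out real data is zero")
    return min(1.0, raw_clipped_density(real, synth, k, radii) / ref)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real, holdout = rng.normal(size=(300, 2)), rng.normal(size=(300, 2))
    good = rng.normal(size=(150, 2))
    bad = rng.normal(size=(150, 2)) + 100.0  # far out-of-distribution samples
    print(clipped_density(real, holdout, good, k=5))
    print(clipped_density(real, holdout, np.vstack([good, bad]), k=5))
```

A far-away synthetic sample contributes exactly 0, and no sample can contribute more than 1. Replacing a fraction of good samples with such outliers therefore lowers the raw fidelity score in proportion to that fraction, which is the linear behaviour the metrics aim for.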

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper addresses an important and recognized problem: the lack of reliable, robust, and interpretable metrics for generative models.
- The analysis of the failure modes of one class of metrics, k-NN density-based metrics (such as Precision, Density, and Coverage), is clear and well illustrated. The figures effectively illustrate the specific problems of outlier sensitivity and non-linear response that the paper aims to solve.
- The "clipping" mechanisms are simple and i…

Weaknesses

1. Overstated Claims and Missing Key Baselines: The paper's premise that "all existing... metrics are flawed" and that "no metric offers this property" (absolute interpretability) is an overstatement. The paper's analysis is confined almost entirely to kNN density-based metrics and ignores a relevant body of work on sampling-based evaluation: arXiv:2402.04355 and arXiv:2302.03026 similarly use distance metrics to probe the underlying density but do not rely on kNN density estimation; there is…

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The paper is nicely written and easy to read, with clear visualizations.
- The study is well motivated.
- Designing the proposed metrics to satisfy robustness, linearity, and interpretability is novel.
- The analyses and experiments are comprehensive.

Weaknesses

- The modifications applied to meet the proposed desiderata appear ad hoc and lack theoretical justification. This raises the question of whether they might hurt performance in other respects, such as distinguishability.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Calibration for Interpretability: The metrics are calibrated so that their expected value decays linearly as the fraction of bad synthetic samples increases. For Clipped Density, the unnormalized score is divided by the fidelity score computed on the real data and then clipped to [0, 1]. For Clipped Coverage, the authors derive the expected value under an i.i.d. assumption using Beta functions and numerically invert it to map unnormalized scores onto a linear decay. This calibration allows scores t…
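The Clipped Coverage calibration can be pictured with a sketch under a simplified i.i.d. model of our own, not the paper's derivation. In this model, the probability mass of a real point's k-NN ball follows Beta(k, N - k), good synthetic samples fall into it binomially, and bad samples never do. The expected raw score then has a closed form through Beta functions (a beta-binomial distribution). Tabulating it for each fraction of good samples lets an observed score be inverted numerically by linear interpolation. `expected_clipped_coverage` and `calibrate_coverage` are hypothetical names, and the per-real-point contribution min(1, C / k) is an assumed form of the clipped score.

```python
import math

import numpy as np


def log_beta(a, b):
    """log B(a, b) computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def expected_clipped_coverage(n_good, n_real, k):
    """Expected mean of min(1, C / k) under a simplified i.i.d. model (assumption).

    Model: a real point's k-NN ball has mass u ~ Beta(k, n_real - k), and the
    number C of good synthetic samples inside it is Binomial(n_good, u), so C is
    beta-binomial. Bad samples are assumed never to land inside any ball.
    """
    a, b = k, n_real - k
    # E[min(1, C/k)] = 1 - sum_{c < k} (1 - c/k) * P(C = c)
    deficit = 0.0
    for c in range(min(k, n_good + 1)):
        log_p = (
            math.lgamma(n_good + 1) - math.lgamma(c + 1) - math.lgamma(n_good - c + 1)
            + log_beta(c + a, n_good - c + b) - log_beta(a, b)
        )
        deficit += (1.0 - c / k) * math.exp(log_p)
    return 1.0 - deficit


def calibrate_coverage(raw_score, n_synth, n_real, k):
    """Map a raw score to the fraction of good samples that would produce it in expectation."""
    fractions = np.arange(n_synth + 1) / n_synth
    curve = np.array([expected_clipped_coverage(n, n_real, k) for n in range(n_synth + 1)])
    # The curve increases with the number of good samples, so np.interp inverts it;
    # scores outside the curve's range are clamped to 0 or 1.
    return float(np.interp(raw_score, curve, fractions))


if __name__ == "__main__":
    raw = expected_clipped_coverage(60, 200, 5)
    print(calibrate_coverage(raw, 100, 200, 5))
```

Tabulating on the grid of achievable good-sample counts avoids root finding. Because the expected score is monotone in that count, interpolation is enough to invert it.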

Weaknesses

- Incremental Novelty: While clipping sample contributions and radii is a sensible modification, the metrics essentially adapt existing Density/Coverage measures. The idea of capping contributions is intuitive; no fundamentally new notion of fidelity or coverage is introduced. The theoretical calibration for Clipped Coverage relies on i.i.d. assumptions and numerical inversion, which is mathematically involved but conceptually just rescales the metric to match a linear decay.
- Scope Limited to…

