Learning to Validate Generative Models: a Goodness-of-Fit Approach
Pietro Cappelli, Gaia Grosso, Marco Letizia, Humberto Reyes-González, Marco Zanetti

TL;DR
This paper introduces NPLM, a learning-based goodness-of-fit test for validating high-dimensional generative models, demonstrating its effectiveness in scientific data scenarios and its ability to identify model deficiencies.
Contribution
The paper presents NPLM, a novel, scalable, and interpretable validation method for high-dimensional generative models inspired by Neyman-Pearson theory.
Findings
NPLM outperforms traditional validation methods in high-dimensional settings.
NPLM effectively diagnoses regions where models underperform.
NPLM is validated on generative models trained on Gaussian mixtures of increasing dimensionality and on high-energy physics data.
Abstract
Generative models are increasingly central to scientific workflows, yet their systematic use and interpretation require a proper understanding of their limitations through rigorous validation. Classic approaches struggle with scalability, statistical power, or interpretability when applied to high-dimensional data, making it difficult to certify the reliability of these models in realistic, high-dimensional scientific settings. Here, we propose the use of the New Physics Learning Machine (NPLM), a learning-based approach to goodness-of-fit testing inspired by the Neyman-Pearson construction, to test generative networks trained on high-dimensional scientific data. We demonstrate the performance of NPLM for validation in two benchmark cases: generative models trained on mixtures of Gaussian models with increasing dimensionality, and a public end-to-end model, known as FlowSim, developed…
Taxonomy
Topics: Machine Learning in Materials Science · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference
