Post-Selection Distributional Model Evaluation
Amirmohammad Farzaneh, Osvaldo Simeone

TL;DR
This paper introduces PS-DME, a framework for valid distributional model evaluation after data-dependent model pre-selection, enabling reliable performance-reliability trade-off analysis.
Contribution
It develops a statistically valid, e-value-based method for estimating model KPI distributions after selection. The method controls the false coverage rate and, under certain conditions, is more sample efficient than sample splitting.
Findings
PS-DME controls the post-selection false coverage rate of its distributional estimates.
It is more sample efficient than sample splitting under certain conditions.
Experiments demonstrate PS-DME's effectiveness across synthetic, language, and telecom data.
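To make the selection problem and the sample-splitting baseline concrete, here is a minimal simulation sketch. It is not the PS-DME procedure and does not use e-values. It simplifies the setting by tracking only a mean KPI rather than a full KPI distribution, and every name and parameter in it (`simulate`, `num_models`, `n`, `trials`, the Bernoulli KPI with true mean 0.5) is an assumption made for illustration. All candidate models are identical by construction, so any apparent advantage of the selected model is selection bias.

```python
import random
import statistics


def simulate(num_models=20, n=50, trials=1000, seed=0):
    """Compare two naive ways of evaluating a model picked from several candidates.

    Hypothetical setup: every model has the same true mean KPI of 0.5.
    Returns the average estimate for the selected model under
    (a) data reuse and (b) sample splitting.
    """
    rng = random.Random(seed)
    reuse_estimates, split_estimates = [], []
    half = n // 2
    for _ in range(trials):
        # One row of n binary KPI outcomes per candidate model.
        data = [[1 if rng.random() < 0.5 else 0 for _ in range(n)]
                for _ in range(num_models)]

        # (a) Data reuse: select the best model and report its KPI on the same data.
        reuse_estimates.append(max(sum(row) / n for row in data))

        # (b) Sample splitting: select on the first half, estimate on the second half.
        best = max(range(num_models), key=lambda k: sum(data[k][:half]))
        split_estimates.append(sum(data[best][half:]) / (n - half))

    return statistics.mean(reuse_estimates), statistics.mean(split_estimates)


if __name__ == "__main__":
    reuse, split = simulate()
    print(f"data reuse:      {reuse:.3f}")
    print(f"sample splitting: {split:.3f}")
```

In this toy setting, the data-reuse estimate sits noticeably above the true value of 0.5, while the split estimate stays close to it but uses only half of the samples for estimation. That loss of samples is the cost a post-selection method aims to reduce.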
Abstract
Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-offs between performance and reliability achievable at test time by the models. This task, requiring the reliable estimate of the test-time KPI distributions, is made more complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, causing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on…
