Post-Selection Distributional Model Evaluation

Amirmohammad Farzaneh; Osvaldo Simeone

arXiv:2603.23055 · stat.ML · May 8, 2026

TL;DR

This paper introduces PS-DME, a framework for valid distributional model evaluation after data-dependent model pre-selection, enabling reliable performance-reliability trade-off analysis.

Contribution

The paper develops a statistically valid, e-value-based method for estimating the KPI distributions of models after selection. The method controls the false coverage rate (FCR) and, under certain conditions, is more sample-efficient than sample splitting.

Findings

01. PS-DME controls the post-selection false coverage rate (FCR) of its distributional estimates.

02. PS-DME is more sample-efficient than sample splitting under certain conditions.

03. Experiments demonstrate the effectiveness of PS-DME on synthetic, language, and telecom data.

Abstract

Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-offs between performance and reliability achievable at test time by the models. This task, requiring the reliable estimate of the test-time KPI distributions, is made more complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, causing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on…
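The post-selection bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not PS-DME and uses no e-values. It only shows why reusing the same data for selection and evaluation is optimistic, and how the sample-splitting baseline avoids this at the cost of evaluating on fewer samples. For simplicity it compares mean KPIs rather than full KPI distributions. The function name `simulate`, its parameters, and the Bernoulli KPI model are assumptions made for illustration.

```python
import random
import statistics


def simulate(num_models=20, n=50, trials=2000, seed=0):
    """Return (reuse_bias, split_bias) for the selected model's mean KPI.

    Hypothetical setup: every candidate model has the same true mean KPI,
    so any apparent "best" model is an artifact of sampling noise.
    """
    rng = random.Random(seed)
    true_mean = 0.5
    half = n // 2
    reuse_estimates, split_estimates = [], []

    for _ in range(trials):
        # n i.i.d. Bernoulli(true_mean) KPI samples for each candidate model.
        data = [
            [1.0 if rng.random() < true_mean else 0.0 for _ in range(n)]
            for _ in range(num_models)
        ]

        # Data reuse: select the best model on all samples and report the
        # same empirical mean that was used to select it.
        full_means = [sum(d) / n for d in data]
        reuse_estimates.append(max(full_means))

        # Sample splitting: select on the first half, evaluate the selected
        # model on the held-out second half.
        select_means = [sum(d[:half]) / half for d in data]
        best = max(range(num_models), key=lambda k: select_means[k])
        split_estimates.append(sum(data[best][half:]) / (n - half))

    reuse_bias = statistics.mean(reuse_estimates) - true_mean
    split_bias = statistics.mean(split_estimates) - true_mean
    return reuse_bias, split_bias


if __name__ == "__main__":
    reuse_bias, split_bias = simulate()
    print(f"bias with data reuse:      {reuse_bias:+.3f}")
    print(f"bias with sample splitting: {split_bias:+.3f}")
```

In this toy setting the reused-data estimate is biased upward, while the held-out estimate is approximately unbiased but relies on only half of the samples. That loss of evaluation data is the inefficiency of sample splitting that the findings compare against.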
