A Selection Premium Decomposition for the Expected Maximum of Random Walks
Victor H. de la Pena, Fangyuan Lin, Victor K. de la Pena

TL;DR
This paper analyzes the bias in selecting the best model based on validation data, decomposing the expected maximum score into components that clarify the selection bias and its properties.
Contribution
It introduces a novel decomposition of the selection bias in model evaluation, extending classical results and analyzing properties of the bias function under various conditions.
Findings
Derived a formula for the expected maximum score bias.
Extended the decomposition to stopping times, recovering Wald's equation.
Established a bias concentration law showing bias growth with data fraction.
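For reference, the classical form of Wald's equation that the stopping-time extension is said to recover can be written as below. The notation is an assumption made here for illustration, not taken from the paper: X_1, X_2, ... are i.i.d. integrable increments and T is a stopping time with finite expectation.

```latex
% Wald's equation (classical statement; notation assumed, not the paper's):
% X_1, X_2, \dots i.i.d. with E|X_1| < \infty, T a stopping time with E[T] < \infty.
\mathbb{E}\left[\sum_{i=1}^{T} X_i\right] = \mathbb{E}[T]\,\mathbb{E}[X_1]
```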
Abstract
When several models are evaluated on the same validation set, the selected winner's apparent performance is biased upward. Suppose the models are evaluated on a shared sequence of i.i.d. observations, where each model's response has its own mean. Written in terms of each model's centered increments and centered cumulative score, the expected maximum satisfies an identity expressed through the selection premium function. This formula corresponds to the null hypothesis case (all models are equal in the sense that they have the same mean), which clarifies that the bias arises from selection. While this…
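The upward bias described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's decomposition: it only estimates the expected maximum of centered cumulative scores under the null case. The function name, the standard normal increments, and the independence across models are all assumptions made for illustration (the paper's shared observations may well induce dependence between models).

```python
import numpy as np


def expected_max_centered_score(n_models, n_obs, n_trials=2000, seed=0):
    """Monte Carlo estimate of the expected maximum centered cumulative score.

    Hypothetical helper for illustration only. Assumes the null case (all
    models share the same mean) with standard normal centered increments
    that are independent across models.
    """
    rng = np.random.default_rng(seed)
    # Shape: (trial, model, observation); each entry is a centered increment.
    increments = rng.standard_normal((n_trials, n_models, n_obs))
    # Centered cumulative score of every model in every trial.
    scores = increments.sum(axis=2)
    # Score of the selected winner in each trial, averaged over trials.
    return scores.max(axis=1).mean()


if __name__ == "__main__":
    for k in (1, 2, 10, 50):
        estimate = expected_max_centered_score(k, n_obs=50)
        print(f"models={k:3d}  estimated E[max score]={estimate:.3f}")
```

With a single model the estimate should stay near zero, since there is nothing to select. As more models compete, the winner's centered score grows even though no model is truly better, which is the selection effect the abstract refers to.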
Taxonomy
Topics: Statistical Distribution Estimation and Applications · Financial Risk and Volatility Modeling · Risk and Portfolio Optimization
