A Selection Premium Decomposition for the Expected Maximum of Random Walks

Victor H. de la Pena; Fangyuan Lin; Victor K. de la Pena

arXiv:2602.19481·math.ST·February 24, 2026

Open Access

TL;DR

This paper analyzes the upward bias that arises when the best of several models is selected on shared validation data. It decomposes the expected maximum score into a sum of selection premium terms, which isolates the selection bias and makes its properties explicit.

Contribution

The paper introduces a decomposition of the selection bias in model evaluation, extends classical results, and analyzes properties of the bias function under various conditions.

Findings

01 Derived a formula for the bias of the expected maximum score.

02 Extended the decomposition to stopping times, recovering Wald's equation.

03 Established a bias concentration law showing bias growth with data fraction.

Abstract

When $K$ models are evaluated on the same validation set of size $n$, the selected winner's apparent performance is biased upward. Suppose $K$ models are evaluated on a shared sequence of i.i.d. observations $X_1, \dots, X_n$, where model $k$ achieves response $f_k(X_i)$ with mean $\mu_k = E[f_k(X)]$. Writing $Y_{i,k} = f_k(X_i) - \mu_k$ for the centered increment and $S_{n,k} = \sum_{i=1}^{n} Y_{i,k}$ for the centered cumulative score, the expected maximum satisfies $0 \leq E[\max_k S_{n,k}] = \sum_{i=1}^{n} E[\varphi_K(S_{i-1})]$, where $\varphi_K(u) = E[\max_k (u_k + Y_k)] - \max_k u_k$, $u \in \mathbb{R}^K$, is the selection premium function. This formula corresponds to the null hypothesis case (all models are equal in the sense that they have the same mean), which clarifies that the bias arises from selection. While this…
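The identity in the abstract can be read as a telescoping sum: the step from $\max_k S_{i-1,k}$ to $\max_k S_{i,k}$ has conditional expectation $\varphi_K(S_{i-1})$ given $S_{i-1}$, because the new increment is independent of the past. The sketch below checks this numerically by exact enumeration. It is only an illustration under assumptions that are not from the paper: a toy setup with $K = 2$ models, $X$ uniform on three outcomes, and made-up response tables. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup (not from the paper): X is uniform on {0, 1, 2}
# and K = 2 models have these response tables f_k(x).
RESPONSES = [
    [Fraction(1), Fraction(0), Fraction(2)],  # f_1(x) for x = 0, 1, 2
    [Fraction(0), Fraction(3), Fraction(0)],  # f_2(x) for x = 0, 1, 2
]
N_OUTCOMES = 3
K = len(RESPONSES)


def centered_increments():
    """Return Y[x][k] = f_k(x) - mu_k for each outcome x and model k."""
    means = [sum(row) / N_OUTCOMES for row in RESPONSES]
    return [
        tuple(RESPONSES[k][x] - means[k] for k in range(K))
        for x in range(N_OUTCOMES)
    ]


Y = centered_increments()


def selection_premium(u):
    """phi_K(u) = E[max_k (u_k + Y_k)] - max_k u_k, by exact enumeration."""
    expected_next_max = sum(
        max(u[k] + y[k] for k in range(K)) for y in Y
    ) / N_OUTCOMES
    return expected_next_max - max(u)


def partial_sums(path):
    """Centered cumulative scores S_{m,k} after the outcomes in `path`."""
    return tuple(sum(Y[x][k] for x in path) for k in range(K))


def expected_max(n):
    """Left side: E[max_k S_{n,k}], averaged over all equally likely paths."""
    paths = list(product(range(N_OUTCOMES), repeat=n))
    return sum(max(partial_sums(p)) for p in paths) / len(paths)


def premium_sum(n):
    """Right side: sum_{i=1}^n E[phi_K(S_{i-1})], with S_0 = 0."""
    total = Fraction(0)
    for i in range(1, n + 1):
        prefixes = list(product(range(N_OUTCOMES), repeat=i - 1))
        total += sum(
            selection_premium(partial_sums(p)) for p in prefixes
        ) / len(prefixes)
    return total
```

Because the arithmetic uses `Fraction`, a check such as `expected_max(4) == premium_sum(4)` compares the two sides exactly rather than up to Monte Carlo error. Both models have the same mean in this toy setup, matching the null hypothesis case described in the abstract, so any positive value of `expected_max(n)` is attributable to selection alone.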

Taxonomy

Topics: Statistical Distribution Estimation and Applications · Financial Risk and Volatility Modeling · Risk and Portfolio Optimization