Time-uniform confidence bands for the CDF under nonstationarity
Paul Mineiro, Steven R. Howard

TL;DR
This paper develops valid, time-uniform confidence bands for the cumulative distribution function (CDF) of a random variable in nonstationary settings, extending to importance-weighted cases for counterfactual analysis.
Contribution
It introduces the first computationally feasible, always valid confidence bounds on the CDF under nonstationarity, with convergence guarantees and applicability to importance-weighted data.
Findings
Provides time-uniform confidence bands valid under nonstationarity.
Extends bounds to importance-weighted estimations for counterfactual distributions.
Guarantees convergence in arbitrary data-dependent environments.
Abstract
Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF of the running averaged conditional distribution of a real-valued random variable which are always valid and sometimes trivial, along with an instance-dependent convergence guarantee. The importance-weighted extension is appropriate for estimating complete counterfactual distributions of rewards given controlled experimentation data exhaust, e.g., from an A/B test or a contextual bandit.
| Method | Width | Time (sec) |
|---|---|---|
| DDRM | 0.09 | 24.8 |
| Emp. Bern | 0.10 | 1.0 |
| DDRM | 0.052 | 59.4 |
| Emp. Bern | 0.125 | 2.4 |
Paul Mineiro (Microsoft Research)
Steve Howard
1 Introduction
What would have happened if I had acted differently? Although this question is as old as time itself, successful companies have recently embraced this question via counterfactual estimation of outcomes from the exhaust of their controlled experimentation platforms, e.g., based upon A/B testing or contextual bandits. These experiments are run in the real (digital) world, which is rich enough to demand statistical techniques that are non-asymptotic, non-parametric, and non-stationary. Although recent advances admit characterizing counterfactual average outcomes in this general setting, counterfactually estimating a complete distribution of outcomes is heretofore only possible with additional assumptions. Nonetheless, the practical importance of this problem has motivated multiple solutions: see Section 1 for a summary, and Section 5 for complete discussion.
Intriguingly, this problem is provably impossible in the data dependent setting without additional assumptions (Rakhlin et al., 2015). Consequently, our bounds always achieve non-asymptotic coverage, but may converge to zero width slowly or not at all, depending on the hardness of the instance. We call this design principle AVAST (Always Valid And Sometimes Trivial).
In pursuit of our ultimate goal, we derive factual distribution estimators which are useful for estimating the complete distribution of outcomes from direct experience.
Contributions
1. In Section 3.1 we provide a time and value uniform upper bound on the CDF of the averaged historical conditional distribution of a discrete-time real-valued random process. Consistent with the lack of sequential uniform convergence of linear threshold functions (Rakhlin et al., 2015), the bounds are always valid and sometimes trivial, but with an instance-dependent guarantee: when the data generating process is smooth qua Block et al. (2022) with respect to the uniform distribution on the unit interval, the bound width adapts to the unknown smoothness parameter.
2. In Section 3.2 we extend the previous technique to distributions with support over the entire real line, and further to distributions with a known countably infinite or unknown nowhere dense set of discrete jumps, with analogous instance-dependent guarantees.
3. In Section 3.3 we extend the previous techniques to importance-weighted random variables, achieving our ultimate goal of estimating a complete counterfactual distribution of outcomes.
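To make the target of these bounds concrete, the following is a sketch of the quantity being bounded and of the kind of guarantee described above. The notation is an assumption introduced here for illustration (the filtration $\mathcal{F}_t$, the symbols $\overline{F}_t$, $L_t$, $U_t$, and the error level $\alpha$ do not appear in this excerpt), and the paper's own statements may differ in constants and form.

```latex
% Averaged historical conditional CDF of a process (X_s) adapted to (F_s):
\overline{F}_t(v) \;=\; \frac{1}{t} \sum_{s=1}^{t} \Pr\left( X_s \le v \,\middle|\, \mathcal{F}_{s-1} \right)

% Time-uniform and value-uniform coverage at (assumed) level alpha:
\Pr\left( \forall t \ge 1,\ \forall v \in \mathbb{R}:\;
          L_t(v) \le \overline{F}_t(v) \le U_t(v) \right) \;\ge\; 1 - \alpha
```

In the i.i.d. case every conditional distribution equals the common distribution, so $\overline{F}_t$ reduces to the ordinary CDF. "Sometimes trivial" means that the width $U_t(v) - L_t(v)$ need not shrink to zero on hard instances, while the coverage statement continues to hold.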
We exhibit our techniques in various simulations in Section 4. Computationally our procedures have comparable cost to point estimation of the empirical CDF, as the empirical CDF is a sufficient statistic.
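As a rough illustration of the point estimate that this cost is compared against, here is a minimal Python sketch of a running empirical CDF with optional importance weights. It is not the paper's bound construction: the class name `RunningECDF`, its methods, and the unnormalized weighting (dividing by the number of observations rather than the sum of weights) are assumptions made for this example.

```python
import bisect


class RunningECDF:
    """Running (optionally importance-weighted) empirical CDF.

    Hypothetical helper for illustration only; it computes a point
    estimate, not a confidence band.
    """

    def __init__(self):
        self._values = []   # observed values, kept sorted
        self._weights = []  # importance weights, aligned with _values
        self._count = 0     # number of observations t

    def update(self, x, w=1.0):
        """Record observation x with importance weight w (w=1 is the factual case)."""
        i = bisect.bisect_right(self._values, x)
        self._values.insert(i, x)
        self._weights.insert(i, w)
        self._count += 1

    def cdf(self, v):
        """Return (1/t) * sum_s w_s * 1{x_s <= v}; 0.0 before any data arrives."""
        if self._count == 0:
            return 0.0
        i = bisect.bisect_right(self._values, v)
        return sum(self._weights[:i]) / self._count


# Factual (unweighted) usage.
factual = RunningECDF()
for x in [0.2, 0.5, 0.9, 0.5]:
    factual.update(x)

# Counterfactual usage with made-up importance weights.
weighted = RunningECDF()
weighted.update(0.2, w=2.0)
weighted.update(0.8, w=0.0)
```

With unit weights, `factual.cdf(0.5)` is the fraction of observations at or below 0.5. With importance weights the estimate is unnormalized, so it is unbiased when the weights have conditional mean one but can exceed 1 on a given sample. The linear-time `insert` and `sum` keep the sketch short; a balanced tree or cumulative-weight structure would be the natural choice at scale.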
References
- Block et al. [2022] Adam Block, Yuval Dagan, Noah Golowich, and Alexander Rakhlin. Smoothed online learning is as easy as statistical learning. arXiv preprint arXiv:2202.04690, 2022.
- Cantelli [1933] Francesco Paolo Cantelli. Sulla determinazione empirica delle leggi di probabilita. Giorn. Ist. Ital. Attuari, 4:421–424, 1933.
- Chandak et al. [2021] Yash Chandak, Scott Niekum, Bruno da Silva, Erik Learned-Miller, Emma Brunskill, and Philip S Thomas. Universal off-policy evaluation. Advances in Neural Information Processing Systems, 34:27475–27490, 2021.
- Chatzigeorgiou [2013] Ioannis Chatzigeorgiou. Bounds on the Lambert function and their application to the outage analysis of user cooperation. IEEE Communications Letters, 17(8):1505–1508, 2013.
- Dvoretzky et al. [1956] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669, 1956.
- Fan et al. [2015] Xiequan Fan, Ion Grama, and Quansheng Liu. Exponential inequalities for martingales with applications. Electronic Journal of Probability, 20:1–22, 2015.
- Feller [1958] William Feller. An introduction to probability theory and its applications, 3rd edition. Wiley series in probability and mathematical statistics, 1958.
- Glivenko [1933] Valery Glivenko. Sulla determinazione empirica delle leggi di probabilita. Giorn. Ist. Ital. Attuari, 4:92–99, 1933.
