Early Stopping in Contextual Bandits and Inferences
Zihan Cui (University of Michigan)

TL;DR
This paper introduces early stopping rules for linear contextual bandits to reduce sampling costs and improve decision-making, while enabling reliable post-experiment inferences based on online estimators.
Contribution
It develops new stopping rules based on the Opportunity Cost and Threshold Methods, combining variance-based regret bounds with the asymptotic distribution of a batched online estimator to support stable, adaptive decision processes.
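The summary does not spell out the paper's rules. As a rough illustration only, the sketch below shows how a variance-based regret bound can drive a threshold rule and an opportunity-cost rule. Everything here is a hypothetical stand-in, not the author's method: `ArmEstimate`, `regret_upper_bound`, `threshold_stop` and `opportunity_cost_stop` are illustrative names. The bound is a simple gap between the best arm's lower confidence limit and the strongest competitor's upper limit at a fixed context. The opportunity-cost projection assumes standard errors shrink like 1/sqrt(n) while the means stay fixed.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmEstimate:
    # Hypothetical container: online estimate for one arm at a fixed context.
    mean: float  # estimated expected reward
    se: float    # estimated standard error (from the estimator's variance)
    n: int       # samples used for this arm so far


def regret_upper_bound(arms, z=1.96):
    """Confidence-based upper bound on the regret of committing to the
    empirically best arm (assumed form, for illustration only)."""
    best = max(range(len(arms)), key=lambda i: arms[i].mean)
    lower_best = arms[best].mean - z * arms[best].se
    bound = 0.0
    for i, a in enumerate(arms):
        if i != best:
            # How much a competitor could plausibly beat the chosen arm.
            bound = max(bound, (a.mean + z * a.se) - lower_best)
    return bound


def threshold_stop(arms, epsilon, z=1.96):
    """Threshold-style rule: stop once the regret bound is at most epsilon."""
    return regret_upper_bound(arms, z) <= epsilon


def opportunity_cost_stop(arms, batch_size, cost_per_sample, horizon, z=1.96):
    """Opportunity-cost-style rule: stop when the projected shrinkage of the
    regret bound from one more batch, scaled by the deployment horizon, no
    longer exceeds the cost of sampling that batch.

    Assumes every arm receives batch_size more samples, its standard error
    scales like 1/sqrt(n), and its mean estimate does not change.
    """
    current = regret_upper_bound(arms, z)
    projected = [
        ArmEstimate(a.mean, a.se * math.sqrt(a.n / (a.n + batch_size)), a.n + batch_size)
        for a in arms
    ]
    gain = (current - regret_upper_bound(projected, z)) * horizon
    cost = cost_per_sample * batch_size * len(arms)
    return gain <= cost
```

For example, two arms with means 1.0 and 0.9 and standard errors of 0.2 give a bound of about 0.684, so `threshold_stop(arms, epsilon=0.1)` keeps sampling; whether the opportunity-cost rule stops depends on how large `horizon` is relative to the sampling cost.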
Findings
The proposed stopping rules minimize in-experiment regret while accounting for sampling costs.
The method supports reliable online statistical inference after stopping.
Batched estimators improve stability and support the asymptotic analysis.
Abstract
Bandit algorithms sequentially accumulate data using adaptive sampling policies, offering flexibility for real-world applications. However, excessive sampling can be costly, motivating the development of early stopping methods and reliable post-experiment conditional inference. This paper studies early stopping methods in linear contextual bandits, including both pre-determined and online stopping rules, to minimize in-experiment regret while accounting for sampling costs. We propose stopping rules based on the Opportunity Cost and Threshold Methods, using the variances of unbiased or consistent online estimators to quantify upper bounds on the regret of the learned optimal policy. The study focuses on batched settings for stability, selecting a weighted combination of batched estimators as the online estimator and deriving its asymptotic distribution. Online statistical inferences are…
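The abstract does not state which weights the paper uses. As one common choice, the sketch below combines per-batch estimates with inverse-variance weights and builds a normal-approximation (Wald) interval from the combined variance. This is an assumption for illustration, not the paper's estimator: `combine_batches` and `wald_interval` are hypothetical names, and the sketch treats batch estimators as independent, unbiased for the same target, and approximately normal.

```python
import math
from statistics import NormalDist


def combine_batches(estimates, variances):
    """Inverse-variance weighted combination of per-batch estimates.

    Assumes the batch estimators are independent and unbiased for the same
    target; returns (combined_estimate, combined_variance).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per batch estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # noisier batches get less weight
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total  # variance of the weighted mean


def wald_interval(estimate, variance, level=0.95):
    """Two-sided interval under an assumed asymptotic normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width
```

For instance, `combine_batches([1.0, 3.0], [1.0, 3.0])` returns an estimate of 1.5 with variance 0.75, which `wald_interval` turns into a 95% interval. Inverse-variance weights minimize the variance of a linear combination of independent unbiased estimators, which is why they are a natural default; adaptively collected batches may call for different weights.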
Taxonomy
Topics: Decision-Making and Behavioral Economics
Methods: Early Stopping
