Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals
Emilie Kaufmann (Scool, CNRS), Wouter Koolen (CWI)

TL;DR
This paper develops new uniform-in-time deviation inequalities using mixture martingales for adaptive sampling in multi-armed bandits, enabling improved sequential testing and confidence interval construction.
Contribution
It constructs, for each arm, a mixture martingale based on a hierarchical prior, then multiplies these martingales to obtain deviation inequalities that cover several arms at once and remain valid under adaptive sampling.
Findings
Derived deviation inequalities valid uniformly over time
Enabled analysis of stopping rules based on generalized likelihood ratios
Constructed tight confidence intervals for some functions of the arm means
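To make the idea of a stopping rule based on a generalized likelihood ratio concrete, here is a minimal sketch for two unit-variance Gaussian arms. The Gaussian setting, the function names `glr_statistic` and `should_stop`, and the caller-supplied `threshold` are assumptions made for illustration; the summary does not state the paper's thresholds, so none is hard-coded here.

```python
def glr_statistic(mean_a, n_a, mean_b, n_b):
    """GLR statistic for the claim "arm a has the larger mean".

    Assumes unit-variance Gaussian arms, where KL(x, y) = (x - y)^2 / 2.
    The statistic is the infimum over alternatives with lam_a <= lam_b of
    n_a * KL(mean_a, lam_a) + n_b * KL(mean_b, lam_b), which has the
    closed form below (the minimiser is the weighted average of the means).
    """
    if mean_a <= mean_b:
        # The data do not favour arm a, so the infimum is attained at zero.
        return 0.0
    return n_a * n_b / (n_a + n_b) * (mean_a - mean_b) ** 2 / 2.0


def should_stop(mean_a, n_a, mean_b, n_b, threshold):
    """Stop once the GLR statistic exceeds a caller-supplied threshold.

    `threshold` is a hypothetical input: a valid choice must come from a
    time-uniform deviation inequality of the kind the paper derives.
    """
    return glr_statistic(mean_a, n_a, mean_b, n_b) > threshold


if __name__ == "__main__":
    print(glr_statistic(1.0, 10, 0.0, 10))
    print(should_stop(1.0, 10, 0.0, 10, threshold=2.0))
```

The statistic grows with both the empirical gap and the effective sample size `n_a * n_b / (n_a + n_b)`, which is why a deviation inequality involving several arms at once is the natural tool for calibrating the threshold.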
Abstract
This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
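As a rough illustration of how a mixture martingale yields a bound that holds uniformly in time, the sketch below uses the classical single-arm Gaussian mixture with a N(0, 1/rho) prior, not the hierarchical prior of the paper. The unit-variance Gaussian model, the parameter `rho`, and all function names are assumptions chosen for the example. For a centered sum `s` of `t` observations, the mixture has the closed form sqrt(rho / (t + rho)) * exp(s^2 / (2 (t + rho))), and Ville's inequality bounds the probability that it ever exceeds 1/delta by delta.

```python
import math
import random


def log_mixture_martingale(s, t, rho=1.0):
    """Log of the integral of exp(lam*s - lam^2*t/2) over lam ~ N(0, 1/rho).

    s is the centered sum of t unit-variance Gaussian observations.
    """
    return 0.5 * math.log(rho / (t + rho)) + s * s / (2.0 * (t + rho))


def mixture_radius(t, delta, rho=1.0):
    """Half-width of the time-uniform interval for the mean after t >= 1 samples.

    Obtained by solving log_mixture_martingale(t * r, t, rho) = log(1 / delta) for r.
    """
    threshold = math.log(1.0 / delta) + 0.5 * math.log((t + rho) / rho)
    return math.sqrt(2.0 * (t + rho) * threshold) / t


def violation_rate(n_paths=500, horizon=200, delta=0.05, rho=1.0, seed=0):
    """Fraction of simulated paths whose true mean (0) ever leaves the interval."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_paths):
        s = 0.0
        for t in range(1, horizon + 1):
            s += rng.gauss(0.0, 1.0)
            if abs(s / t) > mixture_radius(t, delta, rho):
                violations += 1
                break
    return violations / n_paths


if __name__ == "__main__":
    print("radius at t=100:", mixture_radius(100, 0.05))
    print("empirical violation rate:", violation_rate())
```

In this Gaussian case the exponent s^2 / (2t) equals t times the Kullback-Leibler divergence between the empirical and true means, which hints at why the paper states its deviations in terms of that divergence. The simulated violation rate should stay below `delta`, since the guarantee covers all time steps simultaneously.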
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Statistical Methods and Inference
