A unified framework for bandit multiple testing
Ziyu Xu, Ruodu Wang, Aaditya Ramdas

TL;DR
This paper introduces a flexible, martingale-based framework for controlling the false discovery rate in bandit multiple hypothesis testing, accommodating complex dependencies, multiple queries, and various exploration strategies.
Contribution
It presents a unified, modular approach that guarantees FDR control in diverse bandit settings, extending beyond independent, sub-Gaussian assumptions.
Findings
Framework guarantees FDR control under broad conditions
Recovers sample complexity guarantees in classical settings
Performs comparably or better in practical experiments
Abstract
In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify a large set of interesting arms (true discoveries), while only mistakenly identifying a few uninteresting ones (false discoveries). One common metric in non-bandit multiple testing is the false discovery rate (FDR). We propose a unified, modular framework for bandit FDR control that emphasizes the decoupling of exploration and summarization of evidence. We utilize the powerful martingale-based concept of "e-processes" to ensure FDR control for arbitrary composite nulls, exploration rules and stopping times in generic problem settings. In particular, valid FDR control holds even if the reward distributions of the arms could be dependent, multiple arms may be queried simultaneously, and multiple (cooperating or…
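To make the idea of summarizing evidence with e-processes concrete, here is a minimal Python sketch under stated assumptions. It is not taken from the paper. The function `betting_e_value` is a hypothetical helper that builds a simple betting-style e-process for rewards in [0, 1] under the null "mean at most `mu0`", with a fixed bet size `lam`. The function `e_bh` applies the e-BH procedure, a standard way to turn e-values into an FDR-controlled rejection set; the truncated abstract above does not state which procedure the authors use, so treat this choice as an assumption.

```python
def betting_e_value(rewards, mu0, lam):
    """Wealth of a fixed-fraction bet against the null 'mean <= mu0'.

    Assumes every reward lies in [0, 1] and 0 <= lam <= 1/mu0, so each
    factor is nonnegative and has expectation at most 1 under the null.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < mu0 < 1.0:
        raise ValueError("mu0 must lie strictly between 0 and 1")
    if not 0.0 <= lam <= 1.0 / mu0:
        raise ValueError("lam must lie in [0, 1/mu0]")
    wealth = 1.0
    for x in rewards:
        wealth *= 1.0 + lam * (x - mu0)
    return wealth


def e_bh(e_values, alpha):
    """e-BH: reject the k* largest e-values, where k* is the largest k
    such that the k-th largest e-value is at least K / (alpha * k)."""
    K = len(e_values)
    # Arm indices ordered from largest to smallest e-value.
    order = sorted(range(K), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        if e_values[idx] >= K / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


# Illustrative e-values for four arms (made-up numbers, not results).
rejected = e_bh([50.0, 1.2, 30.0, 0.5], alpha=0.1)
print(rejected)
```

In this sketch the exploration rule (which arm to sample next) never appears: it only affects which rewards are fed into `betting_e_value`, while `e_bh` looks only at the current e-values. That separation mirrors the decoupling of exploration and evidence summarization described in the abstract.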
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Statistical Methods in Clinical Trials
