Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits
Brian Cho, Dominik Meier, Kyra Gan, Nathan Kallus

TL;DR
This paper introduces a minimax optimal method for identifying arms with means above a threshold in nonparametric multi-armed bandits, balancing reward maximization and pure exploration efficiently.
Contribution
It develops a reward-maximizing sampling algorithm combined with a novel nonparametric sequential test that achieves minimax optimal stopping times under error constraints.
Findings
Maintains error control under nonparametric assumptions and asymptotically achieves the minimax optimal e-power.
Reduces sample complexity by at least 50% in experiments.
Validates approach on synthetic and real-world data.
Abstract
In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible. We show that GAI can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel nonparametric anytime-valid sequential test for labeling arm means. We first establish that our sequential test maintains error control under highly nonparametric assumptions and asymptotically achieves the minimax optimal e-power, a notion of power for anytime-valid tests. Next, by pairing regret-minimizing sampling schemes with our sequential test, we provide an approach…
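To make the pairing described in the abstract concrete, here is a minimal sketch of good arm identification that combines a regret-minimizing sampler with an anytime-valid sequential test. It is not the paper's method. As stand-ins it uses UCB1 for sampling and a fixed-bet test martingale (rejecting via Ville's inequality, with a union bound over arms) for labeling. Rewards bounded in [0, 1], stopping at the first arm labeled good, and the names `identify_good_arm`, `bet`, and `max_pulls` are all assumptions made for illustration.

```python
import math
import random


def identify_good_arm(arms, threshold, alpha=0.05, max_pulls=20000, seed=0):
    """Return (arm_index, pulls_used), or (None, max_pulls) if no arm is labeled.

    arms: list of callables taking a random.Random and returning a reward in [0, 1]
          (bounded rewards are an assumption of this sketch).
    threshold: an arm is "good" if its mean exceeds this value, with 0 < threshold < 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    log_wealth = [0.0] * k  # log of each arm's test martingale

    # Fixed nonnegative bet (an illustrative choice). Each wealth factor
    # 1 + bet * (x - threshold) is then at least 0.5, and has expectation
    # at most 1 whenever the arm's mean is <= threshold.
    bet = 0.5 / threshold
    # Ville's inequality with a union bound over the k arms.
    reject_at = math.log(k / alpha)

    for t in range(1, max_pulls + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        x = arms[arm](rng)
        counts[arm] += 1
        sums[arm] += x
        log_wealth[arm] += math.log(1.0 + bet * (x - threshold))
        if log_wealth[arm] >= reject_at:
            return arm, t  # evidence that this arm's mean exceeds the threshold
    return None, max_pulls
```

For example, calling `identify_good_arm([lambda r: float(r.random() < p) for p in (0.3, 0.4, 0.9)], threshold=0.5)` runs the sketch on three Bernoulli arms. Because the sampler concentrates pulls on high-mean arms, evidence accumulates fastest on exactly the arms most likely to clear the threshold, which is the intuition behind pairing reward maximization with a sequential test.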
Taxonomy
Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications · Advanced Statistical Process Monitoring
