No Regret Bound for Extreme Bandits
Robert Nishihara, David Lopez-Paz, Léon Bottou

TL;DR
This paper studies the extreme bandit problem, in which a learner sequentially chooses which distribution to sample so as to optimize the single best outcome observed rather than the average, and proves that no policy can asymptotically achieve zero extreme regret.
Contribution
The paper defines a notion of extreme regret for the extreme bandit setting and proves a fundamental limitation: no policy can asymptotically achieve zero extreme regret.
Findings
No policy can asymptotically achieve zero extreme regret.
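The extreme bandit setting differs fundamentally from standard bandits: distributions are no longer characterized primarily by their means, and there may be no well-defined "best" distribution.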
Multiple sensible oracle models exist for the extreme bandit problem.
Abstract
Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with sequentially choosing which distribution from a collection to sample in order to minimize (maximize) the single best cost (reward). Whereas the distributions in the standard bandit setting are primarily characterized by their means, a number of subtleties arise when we care about the minimal cost as opposed to the average cost. For example, there may not be a well-defined "best" distribution as there is in the standard bandit setting. The best distribution depends on the rewards that have been…
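The claim that there may be no single "best" distribution can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes two hypothetical arms, a "safe" arm that always pays 0.5 and a "risky" arm that pays 1 with probability 0.01 and 0 otherwise, and compares the expected maximum reward after n pulls of each arm in closed form. The function names and parameter values are illustrative assumptions.

```python
"""Toy illustration (hypothetical arms, not from the paper):
the arm with the better expected maximum depends on the horizon n."""


def expected_max_constant(value: float, n: int) -> float:
    """Expected maximum of n draws from an arm that always pays `value`."""
    return value


def expected_max_bernoulli(p: float, n: int) -> float:
    """Expected maximum of n draws from an arm paying 1 w.p. p, else 0.

    The maximum is 1 unless all n draws are 0, which has probability (1 - p)**n.
    """
    return 1.0 - (1.0 - p) ** n


def best_arm(n: int, safe_value: float = 0.5, p: float = 0.01) -> str:
    """Return which single arm has the larger expected maximum over n pulls."""
    safe = expected_max_constant(safe_value, n)
    risky = expected_max_bernoulli(p, n)
    return "safe" if safe >= risky else "risky"


if __name__ == "__main__":
    for horizon in (10, 50, 100, 500):
        risky_value = expected_max_bernoulli(0.01, horizon)
        print(horizon, best_arm(horizon), round(risky_value, 4))
```

Under these assumptions the safe arm has the higher mean (0.5 versus 0.01), so a standard bandit objective always prefers it, yet the risky arm has the larger expected maximum once the horizon reaches roughly 69 pulls. Which arm is "best" for the maximum therefore depends on the time horizon.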
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification
