Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits
Anmol Kagrecha, Jayakrishnan Nair, and Krishna Jagannathan

TL;DR
This paper introduces statistically robust, risk-aware algorithms for best arm identification in multi-armed bandits that perform well under mild assumptions and are resistant to distributional misspecification.
Contribution
It establishes fundamental performance limits and proposes near-optimal algorithms for robust, risk-aware best arm identification under mild distributional assumptions.
Findings
Established fundamental performance limits for statistically robust algorithms in the fixed-budget pure exploration setting.
Proposed two classes of asymptotically near-optimal algorithms.
Unified framework for light-tailed and heavy-tailed distributions.
Abstract
Traditional multi-armed bandit (MAB) formulations usually make certain assumptions about the underlying arms' distributions, such as bounds on the support or their tail behaviour. Moreover, such parametric information is usually 'baked' into the algorithms. In this paper, we show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified. Our key contributions are twofold: (i) We establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) We propose two classes of algorithms that are asymptotically near-optimal. Additionally, we consider a risk-aware criterion for best arm identification, where the objective associated with each arm is a linear combination of the mean and the conditional value at risk (CVaR).…
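To make the risk-aware criterion concrete, here is a minimal sketch of an objective that linearly combines an arm's mean with its CVaR. It is not the paper's estimator or algorithm. It assumes samples are losses (lower is better) and uses a simple plug-in CVaR, the average of the worst (1 - alpha) fraction of samples. The names `empirical_cvar`, `risk_aware_objective`, `select_arm`, and the weights `xi1`, `xi2` are hypothetical, chosen for illustration.

```python
import math


def empirical_cvar(losses, alpha):
    """Plug-in CVaR estimate: mean of the worst ceil((1 - alpha) * n) losses.

    Assumption: larger values are worse (losses), and 0 <= alpha < 1.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one sample")
    # round() guards against floating-point noise such as 3.0000000000000004
    k = max(1, math.ceil(round((1 - alpha) * n, 9)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def risk_aware_objective(losses, alpha, xi1, xi2):
    """Linear combination xi1 * mean + xi2 * CVaR (illustrative weights)."""
    mean = sum(losses) / len(losses)
    return xi1 * mean + xi2 * empirical_cvar(losses, alpha)


def select_arm(samples_by_arm, alpha, xi1, xi2):
    """Return the index of the arm with the smallest estimated objective."""
    scores = [risk_aware_objective(s, alpha, xi1, xi2) for s in samples_by_arm]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up samples: arm 0 is steady, arm 1 has a lower mean but a heavy loss.
    arms = [[2, 2, 2, 2], [0, 0, 0, 6]]
    print(select_arm(arms, alpha=0.5, xi1=1.0, xi2=0.0))  # mean only
    print(select_arm(arms, alpha=0.5, xi1=1.0, xi2=1.0))  # mean plus CVaR
```

With these made-up samples, the mean-only criterion prefers arm 1, while adding the CVaR term shifts the choice to arm 0, whose worst outcomes are milder.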
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
