Balancing Performance and Costs in Best Arm Identification
Michael O. Harding, Kirthevasan Kandasamy

TL;DR
This paper introduces a new framework for best arm identification in multi-armed bandits that balances the performance of the recommended arm against the cost of sampling. It provides theoretical lower bounds and an algorithm that outperforms classical methods in simulations.
Contribution
The paper proposes a novel risk-based formalism for best arm identification that explicitly balances performance and costs, along with an algorithm and theoretical bounds that match it.
Findings
Theoretical lower bounds on the risk in the new formalism
The DBCARE algorithm matches these bounds up to polylogarithmic factors
Improved performance over classical methods in simulations
Abstract
We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature in the traditional fixed budget and fixed confidence regimes of the best arm identification problem, it still remains a mystery to most practitioners as to how to choose an approach and corresponding budget or confidence parameter. We propose a new formalism to avoid this dilemma altogether by minimizing a risk functional which explicitly balances the performance of the recommended arm and the cost incurred by learning this arm. In this framework, a cost is incurred for each observation during the sampling phase, and upon recommending an arm, a performance penalty is incurred for identifying a suboptimal arm. The learner's goal is to minimize the sum of the penalty and cost. This new regime mirrors the priorities of many practitioners, e.g. maximizing profit in an A/B…
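To make the cost/penalty trade-off in the abstract concrete, here is a minimal Monte Carlo sketch. It is not the paper's DBCARE algorithm. It is a naive uniform-allocation baseline under illustrative assumptions: Gaussian rewards with unit variance, every arm sampled the same fixed number of times, a constant cost per observation, and a penalty equal to the gap between the best mean and the recommended arm's mean. The function names `empirical_risk` and `best_uniform_budget` are hypothetical.

```python
import random


def empirical_risk(means, n_per_arm, cost_per_sample, trials=2000, seed=0):
    """Estimate sampling cost + expected penalty for uniform allocation.

    Illustrative assumptions (not from the paper): unit-variance Gaussian
    rewards, n_per_arm samples from every arm, recommend the empirical best
    arm, penalty = (best mean - recommended arm's mean).
    """
    rng = random.Random(seed)
    best = max(means)
    # Sampling cost is deterministic under a fixed uniform budget.
    total_cost = cost_per_sample * n_per_arm * len(means)
    penalty_sum = 0.0
    for _ in range(trials):
        # Empirical mean of each arm after n_per_arm noisy observations.
        estimates = [
            sum(rng.gauss(mu, 1.0) for _ in range(n_per_arm)) / n_per_arm
            for mu in means
        ]
        chosen = max(range(len(means)), key=lambda i: estimates[i])
        penalty_sum += best - means[chosen]
    return total_cost + penalty_sum / trials


def best_uniform_budget(means, budgets, cost_per_sample):
    """Return the per-arm budget with the lowest estimated risk, plus all risks."""
    risks = {n: empirical_risk(means, n, cost_per_sample) for n in budgets}
    return min(risks, key=risks.get), risks


best_n, risks = best_uniform_budget([0.0, 0.3], [1, 10, 200], cost_per_sample=0.001)
print(best_n, risks)
```

Sweeping the budget exposes the trade-off: too few samples give a large expected penalty, while too many let the sampling cost dominate. Picking the budget this way requires knowing the arm means in advance, which a practitioner does not. This is the dilemma the risk-based formalism is meant to address.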
Taxonomy
Topics: Law, Economics, and Judicial Systems
