Optimal Learning for Structured Bandits
Bart P.G. Van Parys, Negin Golrezaei

TL;DR
This paper introduces DUSA, a new algorithm for structured multi-armed bandits that exploits convex structural information to achieve near-optimal regret, outperforming classical methods that ignore such structure.
Contribution
The paper proposes DUSA, a novel, computationally feasible algorithm that matches the theoretical regret lower bound for various structured bandit problems, unifying and extending prior approaches.
Findings
DUSA achieves asymptotically minimal regret across multiple structured bandit settings.
It effectively exploits convex structural information to improve decision-making.
The approach unifies analysis for linear, Lipschitz, convex, and new structured bandits.
Abstract
We study structured multi-armed bandits, which is the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain convex structural information regarding the reward distributions; that is, the decision-maker knows the reward distributions of the arms belong to a convex compact set. In the presence of such structural information, they would like to minimize their regret by exploiting this information, where the regret is the performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thompson sampling algorithms are well known to suffer minimal regret. As recently…
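To make the regret notion in the abstract concrete, the following is a minimal sketch of the classical, structure-agnostic UCB baseline it mentions, run on Bernoulli arms. It is not DUSA and is not taken from the paper. The function name `ucb1_regret`, the arm means, the horizon, and the standard UCB1 exploration bonus are all assumptions chosen for illustration.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative baseline, not DUSA).

    Returns the cumulative pseudo-regret and the pull count of each arm.
    `means` holds the true success probabilities, which the learner never sees.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best_mean = max(means)  # benchmark policy: always play the best arm
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull every arm once to initialize the estimates
        else:
            # Empirical mean plus the UCB1 exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]  # expected loss versus the benchmark
    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = ucb1_regret([0.2, 0.5, 0.8], horizon=5000)
    print("cumulative pseudo-regret:", round(total_regret, 1))
    print("pulls per arm:", pulls)
```

This baseline treats every arm independently. A structured method of the kind the abstract describes would additionally use the knowledge that the reward distributions lie in a known convex set, so that pulling one arm can be informative about the others.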
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems
