Optimal Learning for Structured Bandits

Bart P.G. Van Parys; Negin Golrezaei

arXiv:2007.07302 · cs.LG · July 11, 2023

TL;DR

This paper introduces DUSA, a new algorithm for structured multi-armed bandits that exploits convex structural information to achieve asymptotically minimal regret, outperforming classical methods that ignore such structure.

Contribution

The paper proposes DUSA, a novel, computationally feasible algorithm that matches the theoretical regret lower bound for various structured bandit problems, unifying and extending prior approaches.

Findings

01. DUSA achieves asymptotically minimal regret across multiple structured bandit settings.

02. DUSA exploits convex structural information to improve decision-making.

03. The approach unifies the analysis of linear, Lipschitz, convex, and new structured bandits.

Abstract

We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain convex structural information regarding the reward distributions; that is, the decision-maker knows that the reward distributions of the arms belong to a convex compact set. In the presence of such structural information, they would like to minimize their regret by exploiting this information, where the regret is their performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thompson sampling algorithms are well known to suffer minimal regret. As recently…
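The regret notion used in the abstract can be made concrete with a small simulation. The sketch below is not DUSA; it runs the classical, structure-free UCB1 baseline on Bernoulli arms and accumulates pseudo-regret, that is, the expected gap to a benchmark that always plays the best arm. The function name `run_ucb1`, the arm means, and the horizon are illustrative assumptions and do not come from the paper.

```python
import math
import random


def run_ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (hypothetical helper, not from the paper).

    Returns (pull counts per arm, cumulative pseudo-regret).
    Assumes horizon >= len(means).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best_mean = max(means)  # known only to the benchmark, not to the learner
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull every arm once to initialise the estimates
        else:
            # pick the arm with the largest empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # pseudo-regret: expected shortfall relative to the best arm
        regret += best_mean - means[arm]

    return counts, regret


# Illustrative arm means and horizon (assumptions for this sketch).
counts, regret = run_ucb1(means=[0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

The learner here treats the arms as unrelated. A structured method would additionally use the knowledge that the reward distributions lie in a known convex set to rule out arms with fewer pulls.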

Code & Models

Repositories

https://gitlab.com/vanparys/dusa (official)

Videos

Optimal Learning for Structured Bandits (YouTube)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems