Asymptotically Optimal Bandits under Weighted Information

Matias I. Müller; Cristian R. Rojas

arXiv:2105.14114 · cs.LG · June 1, 2021

TL;DR

This paper introduces a new multi-armed bandit model in which the agent spreads a fixed power budget across several arms in each round. This lets it gather information faster at the cost of noisier samples. Unlike linear bandits, whose regret scales as $\Theta(\sqrt{T})$, this setting admits regret that grows only logarithmically in $T$.

Contribution

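It proposes Weighted Thompson Sampling (WTS), a strategy that allocates power across arms according to its posterior beliefs. It derives a logarithmic lower bound on regret together with a matching upper bound for WTS, which shows that the strategy is asymptotically optimal in this setting.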
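The allocation rule described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes Gaussian arms with a known base noise level `sigma` at full power, a flat prior, and that each arm's power is set to a Monte Carlo estimate of its posterior probability of being the best arm. The function name and all of its parameters (`n_draws`, `seed`, and so on) are hypothetical.

```python
import math
import random


def weighted_thompson_sampling(means, horizon, sigma=1.0, n_draws=200, seed=0):
    """Sketch of a Thompson-Sampling-style power allocation (hypothetical details).

    Assumptions (not taken from the paper): Gaussian arms, known noise level
    sigma at full power, flat prior, and Monte Carlo estimation of the
    posterior probability that each arm is best.
    Returns the cumulative expected regret over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    precision = [0.0] * k     # accumulated posterior precision per arm
    weighted_sum = [0.0] * k  # precision-weighted sum of observations per arm
    regret = 0.0
    for _ in range(horizon):
        if min(precision) == 0.0:
            # No information on some arm yet: spread power uniformly.
            power = [1.0 / k] * k
        else:
            wins = [0] * k
            for _ in range(n_draws):
                # Sample each mean from its Gaussian posterior.
                draws = [rng.gauss(weighted_sum[i] / precision[i],
                                   1.0 / math.sqrt(precision[i]))
                         for i in range(k)]
                wins[max(range(k), key=draws.__getitem__)] += 1
            # Power profile = estimated posterior probability of being best.
            power = [c / n_draws for c in wins]
        # Expected regret against putting all power on the best arm.
        regret += max(means) - sum(w * mu for w, mu in zip(power, means))
        for i, w in enumerate(power):
            if w > 0:
                # A sample drawn with power w has variance sigma^2 / w.
                y = rng.gauss(means[i], sigma / math.sqrt(w))
                precision[i] += w / sigma ** 2
                weighted_sum[i] += y * w / sigma ** 2
    return regret
```

For instance, `weighted_thompson_sampling([0.0, 2.0], horizon=300, sigma=0.5)` should accumulate far less regret than a uniform profile, which loses 1.0 per round here. The flat-prior Gaussian update is chosen only because it keeps the posterior in closed form: each round adds precision `w / sigma**2` to an arm.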

Findings

01

Derives a tight $\log(T)$ lower bound on regret.

02

Proves an upper bound for Weighted Thompson Sampling that matches the lower bound, establishing its asymptotic optimality.

03

Applies the method to control and system identification problems.

Abstract

We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a normalized power profile and receives a Gaussian vector as outcome, where the unknown variance of each sample is inversely proportional to the power allocated to that arm. The reward corresponds to a linear combination of the power profile and the outcomes, resembling a linear bandit. By spreading the power, the agent can choose to collect information much faster than in a traditional multi-armed bandit at the price of reducing the accuracy of the samples. This setup is fundamentally different from that of a linear bandit: the regret is known to scale as $\Theta(\sqrt{T})$ for linear bandits, while in this setup the agent receives a much more detailed…
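The observation model in the abstract can be simulated directly. This is a minimal sketch that assumes a base noise standard deviation `sigma` at full power, so an arm given power $w_i$ returns a sample with variance $\sigma^2 / w_i$. The function names are hypothetical. The sketch also makes the information trade-off concrete. One round delivers Fisher information $w_i / \sigma^2$ about arm $i$, so the total per round is $1/\sigma^2$ however the power is spread. Spreading it, though, lets every arm be observed in the same round.

```python
import math
import random


def play_round(power, means, sigma=1.0, rng=random):
    """Simulate one round of the weighted-information bandit (a sketch).

    power: normalized power profile (non-negative entries summing to 1).
    means: arm means mu_i (unknown to the agent; known here only to simulate).
    sigma: assumed noise standard deviation at full power.
    Returns (outcomes, reward); arms given zero power return None.
    """
    if any(w < 0 for w in power) or not math.isclose(sum(power), 1.0):
        raise ValueError("power profile must lie on the probability simplex")
    outcomes = []
    for w, mu in zip(power, means):
        if w == 0:
            outcomes.append(None)  # no power, no sample
        else:
            # Variance sigma^2 / w: less power means a noisier sample.
            outcomes.append(rng.gauss(mu, sigma / math.sqrt(w)))
    # Reward: linear combination of the power profile and the outcomes.
    reward = sum(w * y for w, y in zip(power, outcomes) if y is not None)
    return outcomes, reward


def expected_regret(power, means):
    """Per-round expected regret relative to putting all power on the best arm."""
    return max(means) - sum(w * mu for w, mu in zip(power, means))


def information(power, sigma=1.0):
    """Fisher information about each arm's mean gathered in one round."""
    return [w / sigma ** 2 for w in power]
```

For means (0.1, 0.5, 0.9), a uniform profile has expected regret 0.9 - 0.5 = 0.4 per round but samples all three arms. Putting all power on the third arm has zero regret but returns no sample of the other two.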

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Model Reduction and Neural Networks