On Interpolating Experts and Multi-Armed Bandits

Houshuang Chen; Yuchen He; Chihao Zhang

arXiv:2307.07264·cs.LG·August 7, 2023·1 cites

TL;DR

This paper introduces a unified framework interpolating expert advice and multi-armed bandits, providing tight regret bounds and optimal algorithms for both regret minimization and pure exploration in this setting.

Contribution

The paper proves tight minimax regret bounds for $m$-MAB, designs an optimal PAC algorithm for its pure exploration version $m$-BAI, and extends the results to bandits with graph feedback.

Findings

01

Minimax regret for $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log(m_k+1)}\right)$

02

Optimal PAC algorithm for $\mathbf{m}$-BAI with sample complexity $\Theta\left(\frac{1}{\epsilon^2}\sum_{k=1}^K\log(m_k+1)\right)$

03

Extensions to bandits with graph feedback yield tight bounds for several families of feedback graphs.
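To make the interpolation in the first two findings concrete, the following is a minimal sketch that evaluates the quantity $\sum_{k=1}^K\log(m_k+1)$ appearing in both bounds. The function names (`complexity_term`, `regret_rate`, `bai_rate`) and the example group sizes are illustrative assumptions, not taken from the paper, and the constants hidden by $\Theta(\cdot)$ are ignored.

```python
import math

def complexity_term(m):
    """Return sum_k log(m_k + 1) for group sizes m = (m_1, ..., m_K)."""
    return sum(math.log(m_k + 1) for m_k in m)

def regret_rate(m, T):
    """Expression inside Theta(.) for the regret: sqrt(T * sum_k log(m_k + 1)).
    Constants are omitted, so this is a rate, not an actual regret value."""
    return math.sqrt(T * complexity_term(m))

def bai_rate(m, eps):
    """Expression inside Theta(.) for the sample complexity:
    (1 / eps^2) * sum_k log(m_k + 1). Constants are omitted."""
    return complexity_term(m) / eps ** 2

# Hypothetical example: N = 16 arms split in three different ways.
N = 16
experts = (N,)         # one group holding all arms: full-information feedback
bandit = (1,) * N      # N singleton groups: classic bandit feedback
mixed = (4, 4, 4, 4)   # an intermediate partition

for name, m in [("experts", experts), ("mixed", mixed), ("bandit", bandit)]:
    print(name, complexity_term(m), regret_rate(m, T=10_000), bai_rate(m, eps=0.1))
```

With a single group the term is $\log(N+1)$, in line with the logarithmic dependence on the number of experts, while with $N$ singleton groups it is $N\log 2$, in line with the linear dependence on the number of arms in the bandit setting. Intermediate partitions fall between the two.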

Abstract

Learning with expert advice and multi-armed bandits are two classic online decision problems that differ in how information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $m = (m_{1}, \dots, m_{K}) \in \mathbb{N}^{K}$, an instance of $m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_{i}$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $m$-MAB and design an optimal PAC algorithm for its pure exploration version, $m$-BAI, where the goal is to identify the arm with minimum loss in as few rounds as possible. We show that the minimax regret of $m$-MAB is $\Theta\left(\sqrt{T \sum_{k=1}^{K} \log(m_{k} + 1)}\right)$ and the minimum number of pulls for an $(\epsilon, 0.05)$-PAC…
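The feedback rule described in the abstract (pulling an arm reveals the losses of every arm in its group) can be sketched as follows. This is only an illustration of the observation model under assumed conventions: arms are indexed consecutively, the helper names `make_groups` and `pull` are hypothetical, and the losses are arbitrary random numbers rather than anything from the paper.

```python
import random

def make_groups(m):
    """Partition arm indices 0..sum(m)-1 into consecutive groups of sizes m."""
    groups, start = [], 0
    for size in m:
        groups.append(list(range(start, start + size)))
        start += size
    return groups

def pull(arm, groups, losses):
    """Return {arm_index: loss} for every arm in the pulled arm's group."""
    for group in groups:
        if arm in group:
            return {a: losses[a] for a in group}
    raise ValueError(f"arm {arm} is not in any group")

# Hypothetical instance with m = (2, 3, 1), so K = 3 groups and 6 arms.
groups = make_groups((2, 3, 1))
rng = random.Random(0)
losses = [rng.random() for _ in range(6)]  # this round's losses (made up)

# Pulling arm 3 reveals the losses of arms 2, 3 and 4, its whole group.
observed = pull(3, groups, losses)
print(sorted(observed))
```

Setting `m = (N,)` makes every pull reveal all losses, as in learning with expert advice, and `m = (1,) * N` makes every pull reveal only the pulled arm's loss, as in the multi-armed bandit.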


Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms