Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

MohammadJavad Azizi; Thang Duong; Yasin Abbasi-Yadkori; András György; Claire Vernade; Mohammad Ghavamzadeh

arXiv:2202.13001·cs.LG·October 20, 2022

TL;DR

This paper introduces algorithms for non-stationary and meta-learning bandit problems whose regret is smaller than that of standard non-stationary bandit methods when the number of tasks is large and the number of optimal arms is small relative to the total number of arms.

Contribution

The paper proposes an algorithm, based on a reduction to bandit submodular maximization, for non-stationary and meta-learning bandits; it achieves improved regret bounds in regimes with few optimal arms.

Findings

01

In the regime of many tasks and few optimal arms, the regret is smaller than the baseline of $\tilde{O}(\sqrt{KNT})$.

02

For fixed task length $\tau$, the regret is bounded by $\tilde{O}(NM\sqrt{M\tau} + N^{2/3}M\tau)$, which improves to $\tilde{O}(N\sqrt{M\tau} + N^{1/2}\sqrt{MK\tau})$ under additional assumptions.
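To make the comparison concrete, the following sketch evaluates the baseline rate $\sqrt{KNT}$ against the fixed-task-length rate $NM\sqrt{M\tau} + N^{2/3}M\tau$ from the abstract. It is illustrative arithmetic only: it assumes $T = N\tau$, drops all constants and logarithmic factors hidden by $\tilde{O}$, and the function names and parameter values are hypothetical choices, not taken from the paper.

```python
import math


def baseline_rate(K: int, N: int, tau: int) -> float:
    """Baseline rate sqrt(K * N * T), assuming T = N * tau (no constants or logs)."""
    T = N * tau
    return math.sqrt(K * N * T)


def meta_rate(M: int, N: int, tau: int) -> float:
    """Fixed-task-length rate N*M*sqrt(M*tau) + N^(2/3)*M*tau (no constants or logs)."""
    return N * M * math.sqrt(M * tau) + N ** (2 / 3) * M * tau


if __name__ == "__main__":
    K, M, tau = 1000, 2, 10_000  # hypothetical problem sizes
    for N in (1, 100, 10_000, 1_000_000):
        b = baseline_rate(K, N, tau)
        m = meta_rate(M, N, tau)
        winner = "meta" if m < b else "baseline"
        print(f"N={N:>9,d}  baseline={b:.3e}  meta={m:.3e}  smaller: {winner}")
```

With these hypothetical values the baseline rate is smaller for a single task, and the meta-learning rate becomes smaller once the number of tasks is large, which matches the regime described in the findings.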

Abstract

We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M \leq K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximization, and show that, for $T$ rounds comprised of $N$ tasks, in the regime of large number of tasks and small number of optimal arms $M$, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M\tau} + N^{2/3}M\tau)$. Under…
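The idea of competing with the best size-$M$ subset of arms can be illustrated with an offline toy problem. The sketch below is not the paper's algorithm (which works with bandit feedback); it assumes, hypothetically, that the set of optimal arms of every task is known in advance, and it greedily picks $M$ arms to maximize the number of tasks that have at least one optimal arm in the chosen subset. This coverage objective is monotone submodular, which is why greedy selection is a natural choice. All names (`greedy_subset`, `task_optimal_arms`) are illustrative.

```python
from typing import List, Set, Tuple


def greedy_subset(task_optimal_arms: List[Set[int]], K: int, M: int) -> Tuple[List[int], int]:
    """Greedily choose up to M of K arms to cover as many tasks as possible.

    A task counts as covered when at least one of its optimal arms is chosen.
    Assumes full knowledge of each task's optimal arms (offline toy setting).
    """
    chosen: List[int] = []
    uncovered = set(range(len(task_optimal_arms)))
    for _ in range(M):
        best_arm, best_gain = None, 0
        for arm in range(K):
            if arm in chosen:
                continue
            # Marginal gain: number of still-uncovered tasks this arm would cover.
            gain = sum(1 for t in uncovered if arm in task_optimal_arms[t])
            if gain > best_gain:
                best_arm, best_gain = arm, gain
        if best_arm is None:  # no remaining arm covers any new task
            break
        chosen.append(best_arm)
        uncovered = {t for t in uncovered if best_arm not in task_optimal_arms[t]}
    return chosen, len(task_optimal_arms) - len(uncovered)


if __name__ == "__main__":
    # Six hypothetical tasks over K = 6 arms; only arms 0, 3, and 5 are ever optimal.
    tasks = [{0}, {0}, {3}, {3}, {3}, {5}]
    arms, covered = greedy_subset(tasks, K=6, M=2)
    print(arms, covered)
```

In this toy instance a subset of two arms already covers five of the six tasks, which is the kind of structure the small-$M$ regime relies on.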

Code & Models

Repositories

duongnhatthang/meta-bandit (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms