Online Meta-Learning in Adversarial Multi-Armed Bandits
Ilya Osadchiy, Kfir Y. Levy, Ron Meir

TL;DR
This paper introduces a meta-learning algorithm for adversarial multi-armed bandits in the online-within-online setting. It exploits non-uniformity in the empirical distribution of per-episode best arms to improve regret bounds over algorithms that do not meta-learn.
Contribution
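The paper proposes a meta-learning approach built from an inner learner, which plays each episode separately, and an outer learner, which updates the inner learner's hyper-parameters between episodes. The resulting regret bounds are problem-dependent: they adapt to the empirical distribution of the adversary's per-episode best arms.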
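The inner/outer structure described above can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that the inner learner is an Exp3-style learner whose initial arm distribution is the hyper-parameter, and that the outer learner sets that distribution from smoothed counts of the arm it guesses was best in each past episode. The names `exp3_episode`, `meta_exp3`, `eta`, and `smoothing` are hypothetical.

```python
import math
import random


def exp3_episode(losses, prior, eta, rng):
    """Inner learner (assumed Exp3-style): play one episode starting from `prior`.

    losses: list of T rounds, each a list of K losses in [0, 1].
    Returns (total incurred loss, importance-weighted cumulative loss estimates).
    """
    k = len(prior)
    est = [0.0] * k  # estimated cumulative loss per arm
    total = 0.0
    for loss_t in losses:
        # Sampling weights: prior times exponential of negative estimated loss.
        # Subtracting the minimum keeps the exponent non-positive (no overflow).
        m = min(est)
        w = [prior[a] * math.exp(-eta * (est[a] - m)) for a in range(k)]
        z = sum(w)
        p = [x / z for x in w]
        arm = rng.choices(range(k), weights=p)[0]
        total += loss_t[arm]  # bandit feedback: only the played arm is observed
        est[arm] += loss_t[arm] / p[arm]  # importance-weighted estimate
    return total, est


def meta_exp3(episodes, k, eta=0.1, smoothing=1.0, seed=0):
    """Outer learner (hypothetical rule): update the inner prior between episodes."""
    rng = random.Random(seed)
    counts = [smoothing] * k  # smoothed counts of guessed best arms
    total = 0.0
    for losses in episodes:
        s = sum(counts)
        prior = [c / s for c in counts]
        loss, est = exp3_episode(losses, prior, eta, rng)
        total += loss
        # Guess this episode's best arm from the (noisy) loss estimates.
        guess = min(range(k), key=lambda a: est[a])
        counts[guess] += 1.0
    s = sum(counts)
    return total, [c / s for c in counts]


if __name__ == "__main__":
    # Toy adversary: arm 0 is best in every episode (a highly non-uniform case).
    episodes = [[[0.1, 0.9, 0.9, 0.9] for _ in range(50)] for _ in range(20)]
    total, prior = meta_exp3(episodes, k=4)
    print(total, prior)
```

The best-arm guess here relies on noisy bandit estimates, so the learned prior is only a rough proxy for the true best-arm distribution; the paper's outer learner and its guarantees should be taken from the paper itself.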
Findings
Improves on the regret bounds of non-meta-learning algorithms when the per-episode best-arm distribution is far from uniform.
Provides problem-dependent regret bounds that adapt to the empirical distribution of the per-episode best arm.
Demonstrates improved regret bounds in adversarial multi-armed bandit scenarios.
Abstract
We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arm chosen by the adversary. We present an algorithm that can leverage the non-uniformity in this empirical distribution, and derive problem-dependent regret bounds. This solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between the episodes. In the case where the best arm distribution is far from uniform, it improves upon the best bound that can be achieved by any online algorithm…
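The performance measure described in the abstract can be written out as follows. The notation is an assumption made here for illustration and may differ from the paper's: $N$ episodes of $T$ rounds each, $K$ arms, loss $\ell_{n,t}(a)$ of arm $a$ at round $t$ of episode $n$, played arm $a_{n,t}$, and per-episode best arm $a_n^{*}$.

```latex
% Regret against the best arm in each episode (expectation over the player's randomness)
R = \mathbb{E}\left[ \sum_{n=1}^{N} \left( \sum_{t=1}^{T} \ell_{n,t}(a_{n,t})
      - \min_{a \in [K]} \sum_{t=1}^{T} \ell_{n,t}(a) \right) \right]

% Empirical distribution of the per-episode best arm
a_n^{*} \in \arg\min_{a \in [K]} \sum_{t=1}^{T} \ell_{n,t}(a), \qquad
\hat{q}(a) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}\left[ a_n^{*} = a \right]
```

In this notation, the non-uniform case discussed in the abstract corresponds to $\hat{q}$ placing most of its mass on a few arms.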
Taxonomy
Topics: Advanced Bandit Algorithms Research · Influenza Virus Research Studies · Machine Learning and Algorithms
