Meta-Learning Adversarial Bandits
Maria-Florina Balcan, Keegan Harris, Mikhail Khodak, Zhiwei Steven Wu

TL;DR
This paper introduces a meta-learning algorithm for adversarial bandit problems across multiple tasks, improving average performance by adapting to task similarities in both multi-armed bandits and bandit linear optimization settings.
Contribution
This is the first work to target meta-learning in the adversarial bandit setting. It gives a unified meta-algorithm with setting-specific, adaptive guarantees on task-averaged regret for MAB and BLO.
Findings
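The meta-algorithm improves task-averaged regret when the tasks are similar.
The guarantees adapt to task similarity: for MAB, regret improves when the entropy of the distribution over estimated optima-in-hindsight is small; for BLO, the bound depends on properties of the self-concordant barrier regularizer.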
Unregularized follow-the-leader with multiplicative weights effectively bounds regret.
Abstract
We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks if they are similar according to some natural task-similarity measure. As the first to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the initialization, step-size, and entropy parameter of the Tsallis-entropy generalization of the well-known Exp3 method, with the task-averaged regret provably improving if the entropy of the distribution over estimated optima-in-hindsight is small. For BLO, we learn the initialization, step-size, and boundary-offset of online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with a…
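To make the MAB setting above concrete, here is a minimal sketch, not the paper's meta-algorithm. It shows plain Exp3 (the Shannon-entropy special case, not the Tsallis-entropy generalization) run from a given initialization with a given step-size, plus the entropy of the empirical distribution of per-task best arms as a simple proxy for the task-similarity measure. The function names `optima_entropy` and `exp3`, the use of true rather than estimated best arms, and the loss-table input format are all assumptions made for illustration.

```python
import math
import random


def optima_entropy(best_arms, num_arms):
    """Shannon entropy (nats) of the empirical distribution of per-task best arms.

    Assumption: best_arms holds one arm index per task. The paper works with
    *estimated* optima-in-hindsight; this proxy takes the indices as given.
    """
    counts = [0] * num_arms
    for arm in best_arms:
        counts[arm] += 1
    total = len(best_arms)
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def exp3(losses, init, step_size, rng):
    """Run Exp3 on a T x K table of losses in [0, 1]; return the total loss incurred.

    init must be a strictly positive probability vector over the K arms.
    Only the loss of the pulled arm is used in each round (bandit feedback).
    """
    num_arms = len(init)
    cum_est = [0.0] * num_arms  # importance-weighted cumulative loss estimates
    total_loss = 0.0
    for round_losses in losses:
        shift = min(cum_est)  # subtract the minimum for numerical stability
        weights = [p * math.exp(-step_size * (c - shift))
                   for p, c in zip(init, cum_est)]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = round_losses[arm]
        total_loss += loss
        cum_est[arm] += loss / probs[arm]  # unbiased estimate of this arm's loss
    return total_loss


# Hypothetical usage: four tasks over three arms, arm 0 best in three of them.
print(optima_entropy([0, 0, 1, 0], 3))
task_losses = [[0.0, 1.0, 1.0] for _ in range(100)]
print(exp3(task_losses, [0.8, 0.1, 0.1], 0.1, random.Random(0)))
```

When the best arm is the same across most tasks, the entropy is low and an initialization concentrated on that arm starts each task closer to its optimum, which is the intuition behind learning the initialization across tasks.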
Taxonomy
Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning
