Meta-Learning Adversarial Bandit Algorithms
Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu

TL;DR
This paper introduces meta-learning algorithms for adversarial bandit problems. The algorithms improve performance across similar tasks by tuning the initialization and other hyperparameters of the within-task learner, in both multi-armed bandits (MAB) and bandit linear optimization (BLO).
Contribution
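The paper is the first to address adversarial online meta-learning with bandit feedback. Its meta-algorithms combine outer learners to tune the initialization and other hyperparameters of an inner learner, improving task-averaged regret in multi-armed bandits (MAB) and bandit linear optimization (BLO).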
Findings
For MAB, task-averaged regret improves when the entropy of the optima-in-hindsight is small.
For BLO, task-averaged regret varies directly with an action space-dependent measure.
Hyperparameter tuning via low-dimensional affine functions is effective.
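As an illustration of the first finding, the sketch below estimates how concentrated the per-task best arms are. It is not the paper's exact quantity: it uses the Shannon entropy of the empirical distribution of best arms in hindsight, and the function name `optima_entropy` and the list-of-loss-matrices input format are assumptions made for this example.

```python
import numpy as np

def optima_entropy(task_losses):
    """Shannon entropy (in nats) of the empirical distribution of
    best-arm-in-hindsight across tasks.

    task_losses: list of arrays, each of shape (T_i, K), holding the
    full loss matrix of one task (hypothetical input format).
    """
    num_arms = task_losses[0].shape[1]
    # Best arm in hindsight for each task: smallest cumulative loss.
    best_arms = [int(np.argmin(losses.sum(axis=0))) for losses in task_losses]
    counts = np.bincount(best_arms, minlength=num_arms)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # skip zero-probability arms (0 * log 0 = 0)
    return float(-(nonzero * np.log(nonzero)).sum())

# Three tasks that all share the same best arm: entropy is zero.
similar_tasks = [np.array([[0.0, 1.0, 1.0]] * 5) for _ in range(3)]
# Two tasks with different best arms: entropy is log(2).
mixed_tasks = [np.array([[0.0, 1.0]] * 5), np.array([[1.0, 0.0]] * 5)]
```

Low entropy means most tasks share a few best arms, which is the regime in which the summary says meta-learning helps.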
Abstract
We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent…
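To make the inner learner concrete, here is a minimal sketch of plain Exp3 with the two knobs a meta-learner could set: the initial distribution over arms and the learning rate. This is the standard Shannon-entropy version, not the Tsallis-entropy generalization the abstract describes, and the names `exp3`, `init`, and `eta` are assumptions made for this example.

```python
import numpy as np

def exp3(losses, eta, init=None, seed=0):
    """Plain Exp3 on a (T, K) loss matrix with entries in [0, 1].

    eta:  learning rate (a hyperparameter a meta-learner could tune).
    init: strictly positive initial distribution over the K arms
          (a hypothetical meta-learned initialization); uniform if None.
    Returns the total incurred loss and the final arm distribution.
    """
    rng = np.random.default_rng(seed)
    num_rounds, num_arms = losses.shape
    if init is None:
        start = np.full(num_arms, 1.0 / num_arms)
    else:
        start = np.asarray(init, dtype=float)
    log_weights = np.log(start)
    total_loss = 0.0
    for t in range(num_rounds):
        # Softmax of the log-weights, shifted for numerical stability.
        probs = np.exp(log_weights - log_weights.max())
        probs /= probs.sum()
        arm = rng.choice(num_arms, p=probs)
        loss = losses[t, arm]  # bandit feedback: only this entry is observed
        total_loss += loss
        # Importance-weighted loss estimate for the pulled arm only.
        log_weights[arm] -= eta * loss / probs[arm]
    probs = np.exp(log_weights - log_weights.max())
    probs /= probs.sum()
    return total_loss, probs

# Toy task: arm 0 always has loss 0, the other arms always have loss 1.
toy_losses = np.ones((500, 3))
toy_losses[:, 0] = 0.0
total, final_probs = exp3(toy_losses, eta=0.1)
```

An initialization that already puts most mass on the likely best arm should reduce the loss paid during early exploration, which is the intuition behind learning `init` across tasks.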
Taxonomy
Topics: Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
