Meta-Learning for Simple Regret Minimization
Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

TL;DR
This paper introduces novel meta-learning algorithms for simple regret minimization in bandit problems, providing theoretical guarantees and empirical evaluations for Bayesian and frequentist approaches.
Contribution
It presents the first Bayesian and frequentist meta-learning algorithms for simple regret minimization in bandits, with theoretical regret bounds and practical instantiations.
Findings
Bayesian meta-learning achieves regret of O(m / √n)
Frequentist meta-learning achieves regret of O(√m n + m / √n)
Algorithms perform well across various bandit environments.
Abstract
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over m bandit tasks with horizon n is mere O(m / √n). On the other hand, the meta simple regret of the frequentist algorithm is O(√m n + m / √n). While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters. It can also be analyzed in more settings. We instantiate our algorithms for several…
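To make the setting concrete, here is a minimal Python sketch of the interaction loop the abstract describes: tasks are drawn i.i.d. from a prior the agent does not know, the agent explores each task for a fixed horizon, recommends one arm, and carries an estimate of the prior mean forward to the next task. This is an illustration only, not the authors' algorithm. The Gaussian model, the round-robin exploration, the running-average meta-update, and all names (`run_task`, `meta_simple_regret`, `prior_var`, `noise_var`) are assumptions made for the example.

```python
import numpy as np


def run_task(theta, prior_mean, prior_var, noise_var, n, rng):
    """Explore one task for n rounds, then recommend a single arm.

    Assumed model (hypothetical): Gaussian rewards with known noise variance
    and an independent Gaussian prior per arm.
    """
    k = len(theta)
    assert n >= k, "need at least one pull per arm"
    pulls = np.zeros(k)
    sums = np.zeros(k)
    for t in range(n):
        arm = t % k  # round-robin exploration (an assumption, not the paper's rule)
        pulls[arm] += 1
        sums[arm] += theta[arm] + rng.normal(0.0, np.sqrt(noise_var))
    # Conjugate Gaussian posterior mean of each arm under the supplied prior.
    post_precision = 1.0 / prior_var + pulls / noise_var
    post_mean = (prior_mean / prior_var + sums / noise_var) / post_precision
    recommended = int(np.argmax(post_mean))
    # Simple regret: gap between the best arm and the recommended arm.
    simple_regret = float(theta.max() - theta[recommended])
    return simple_regret, sums / pulls


def meta_simple_regret(m, n, true_prior_mean, prior_var, noise_var, seed=0):
    """Sum the simple regret over m tasks while estimating the prior mean."""
    rng = np.random.default_rng(seed)
    k = len(true_prior_mean)
    est_prior_mean = np.zeros(k)  # meta-parameter estimate, starts uninformed
    total = 0.0
    for s in range(m):
        # Each task is an i.i.d. draw from the prior, which the agent never sees.
        theta = rng.normal(true_prior_mean, np.sqrt(prior_var))
        regret, arm_means = run_task(
            theta, est_prior_mean, prior_var, noise_var, n, rng
        )
        total += regret
        # Running average of per-task sample means as a plug-in meta-update.
        est_prior_mean += (arm_means - est_prior_mean) / (s + 1)
    return total, est_prior_mean
```

Calling `meta_simple_regret(200, 30, np.array([0.0, 0.5, 1.0]), 0.25, 1.0)` returns the accumulated simple regret and the learned prior-mean estimate. The plug-in update is the simplest choice that needs no prior over the meta-parameters, which is the sense in which the abstract calls the frequentist approach more general.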
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Data Classification · Machine Learning and Algorithms
