Meta-learning with Stochastic Linear Bandits

Leonardo Cella; Alessandro Lazaric; Massimiliano Pontil

arXiv:2005.08531·stat.ML·May 19, 2020·5 cites

TL;DR

This paper studies meta-learning for stochastic linear bandits and proposes regularized algorithms that exploit similarity between tasks to reduce regret across multiple tasks.

Contribution

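The paper introduces a variant of the OFUL algorithm that is regularized toward a bias vector, proposes two strategies for estimating that bias, and shows an advantage over learning tasks in isolation when the number of tasks is large and the variance of the task distribution is small.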

Findings

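01 Bias-regularized OFUL reduces regret in multi-task settings.

02 The paper proposes two strategies for estimating the bias within the learning-to-learn setting.

03 Both strategies outperform learning the tasks in isolation when the number of tasks grows and the variance of the task distribution is small.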

Abstract

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
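As an illustration of the regularizer the abstract describes, the following is a minimal sketch of a ridge estimate biased toward a vector `h`, followed by an OFUL-style optimistic arm choice. It is not the authors' implementation: the function names `biased_ridge` and `optimistic_arm`, the regularization weight `lam`, and the constant confidence radius `beta` are all assumptions made for this example (in OFUL-type analyses the radius is derived from the noise level and the horizon rather than fixed by hand).

```python
import numpy as np


def biased_ridge(X, y, h, lam):
    """Closed-form minimizer of ||X @ theta - y||^2 + lam * ||theta - h||^2.

    X: (n, d) array of played arms, y: (n,) observed rewards,
    h: (d,) bias vector, lam: positive regularization weight (assumed name).
    """
    d = h.shape[0]
    V = X.T @ X + lam * np.eye(d)  # regularized design matrix
    theta_hat = np.linalg.solve(V, X.T @ y + lam * h)
    return theta_hat, V


def optimistic_arm(arms, theta_hat, V, beta):
    """Index of the arm with the largest upper confidence bound.

    arms: (k, d) candidate arms; beta: confidence radius, treated here as a
    hand-picked constant (an assumption of this sketch).
    """
    V_inv = np.linalg.inv(V)
    # Per-arm width sqrt(x^T V^{-1} x), computed for all arms at once.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * widths))
```

With no observations the estimate coincides with `h`, and as data accumulates it moves toward the least-squares solution. This suggests why a bias close to the task parameters could help early in a task, which is the regime where a small task-distribution variance matters.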

Videos

Meta-learning with Stochastic Linear Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification