PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

Hamish Flynn; David Reeb; Melih Kandemir; Jan Peters

arXiv:2203.03303 · cs.LG · March 17, 2022

TL;DR

This paper presents a PAC-Bayesian analysis of lifelong learning in which each task is a multi-armed bandit problem. It derives lower bounds on the expected average reward in a new task and uses them as learning objectives for transferring information across sequential tasks.

Contribution

It provides a PAC-Bayesian analysis of lifelong learning for multi-armed bandits: new lower bounds on the expected average reward, and lifelong learning algorithms that use these bounds as learning objectives.

Findings

01

The proposed algorithms perform better than a baseline method that does not use generalisation bounds.

02

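The paper derives lower bounds on the expected average reward obtained when a given multi-armed bandit algorithm is run in a new task with a particular prior and for a set number of steps.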

03

The bound-based algorithms are evaluated in several lifelong multi-armed bandit problems.

Abstract

We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one-at-a-time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case when each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm was run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.
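To make the bounded quantity concrete, the following is a minimal sketch of "the expected average reward obtained when a given bandit algorithm is run in a new task with a particular prior and for a set number of steps". It is not the paper's method: Bernoulli Thompson sampling stands in for "a given multi-armed bandit algorithm", the quantity is estimated by plain Monte Carlo rather than lower-bounded, and the arm means, the Beta prior parameters, and all function names are hypothetical choices made for illustration.

```python
import random


def run_thompson(arm_means, prior, n_steps, rng):
    """Run Bernoulli Thompson sampling once; return the average reward.

    prior is a list of (alpha, beta) Beta parameters, one pair per arm.
    (Assumed setup for illustration, not taken from the paper.)
    """
    posterior = [list(p) for p in prior]  # copy so the prior is not mutated
    total = 0.0
    for _ in range(n_steps):
        # Sample a plausible mean for each arm and play the best-looking arm.
        samples = [rng.betavariate(a, b) for a, b in posterior]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        # Conjugate Beta-Bernoulli update for the arm that was played.
        posterior[arm][0] += reward
        posterior[arm][1] += 1.0 - reward
        total += reward
    return total / n_steps


def estimate_expected_average_reward(arm_means, prior, n_steps, n_runs, seed=0):
    """Monte Carlo estimate of the expected average reward over n_steps."""
    rng = random.Random(seed)
    runs = [run_thompson(arm_means, prior, n_steps, rng) for _ in range(n_runs)]
    return sum(runs) / n_runs


# Hypothetical new task with three arms and a short horizon.
arm_means = [0.2, 0.5, 0.8]
uniform_prior = [(1.0, 1.0)] * 3
# Hypothetical prior, imagined as transferred from earlier, similar tasks.
informed_prior = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]

flat = estimate_expected_average_reward(arm_means, uniform_prior, n_steps=20, n_runs=2000)
informed = estimate_expected_average_reward(arm_means, informed_prior, n_steps=20, n_runs=2000)
print(f"uniform prior:  {flat:.3f}")
print(f"informed prior: {informed:.3f}")
```

With a short horizon, a prior that already favours the best arm leaves less exploration to do, which is the kind of gain that transferring information from previous tasks is meant to deliver. In the lifelong setting the new task is not available in advance, which is why the paper works with lower bounds on this quantity rather than a direct estimate like the one sketched here.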
