Thompson Sampling for Robust Transfer in Multi-Task Bandits

Zhi Wang; Chicheng Zhang; Kamalika Chaudhuri

arXiv:2206.08556 · cs.LG · June 20, 2022

TL;DR

This paper introduces a Thompson sampling-based algorithm for multi-task bandit problems, proves that it is nearly optimal, and reports better empirical results than UCB-based methods and non-transfer baselines.

Contribution

The paper extends Thompson sampling to multi-task bandits, gives a novel frequentist analysis that yields near-optimal guarantees, and reports improved empirical performance.

Findings

1. Thompson Sampling achieves near-optimal performance in multi-task bandits.

2. The proposed algorithm outperforms UCB-based and non-transfer baselines.

3. A new concentration inequality for multi-task data aggregation is developed.

Abstract

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for…
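For readers unfamiliar with the Thompson sampling building block mentioned in the abstract, the following is a minimal sketch of standard single-task Beta-Bernoulli Thompson sampling. It is not the paper's multi-task algorithm and performs no transfer across tasks; the function name, the Bernoulli reward model, the uniform Beta(1, 1) prior, and all parameters are assumptions chosen purely for illustration.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Standard single-task Thompson sampling with Bernoulli rewards.

    Illustrative assumption only: this is the textbook Beta-Bernoulli
    variant, NOT the multi-task algorithm analyzed in the paper.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (a Beta(1, 1) prior gives the "+ 1" in each parameter).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    # Two hypothetical arms with a large gap between their means.
    print(thompson_sampling_bernoulli([0.1, 0.9], horizon=2000))
```

Because each arm is chosen with the posterior probability that it is the best one, exploration fades as evidence accumulates. A multi-task variant would additionally need to decide when data from other, similar tasks can safely be pooled, which this sketch does not attempt.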

Code & Models

Repositories

zhiwang123/eps-mpmab-ts (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms

Methods: Spatio-temporal stability analysis