Thompson Sampling for Robust Transfer in Multi-Task Bandits
Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

TL;DR
This paper introduces a Thompson Sampling (TS)-based algorithm for multi-task bandit problems, proves near-optimal frequentist performance guarantees for it, and shows empirically that it outperforms UCB-based methods and non-transfer baselines.
Contribution
The paper extends Thompson Sampling to a general online multi-task bandit protocol that subsumes the concurrent setting. A novel frequentist analysis establishes near-optimal guarantees, and experiments show improved empirical performance.
Findings
Thompson Sampling achieves near-optimal performance in multi-task bandits.
The proposed algorithm outperforms UCB-based and non-transfer baselines.
A new concentration inequality for multi-task data aggregation is developed.
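The trade-off behind aggregating data across tasks can be illustrated with a small sketch. This is not the paper's concentration inequality, which handles adaptively collected data. It is a fixed-sample, Hoeffding-style comparison under two assumptions labeled here: rewards are `sigma`-sub-Gaussian, and each arm's mean differs across tasks by at most `eps`, a hypothetical formalization of "similar but not necessarily identical". Pooling all tasks' samples shrinks the variance term, but the other tasks' data can add up to `eps` of bias. The function name `confidence_widths` is hypothetical.

```python
import math

def confidence_widths(sums, counts, task, sigma, eps, delta):
    """Compare individual vs. aggregated estimates of ONE arm's mean in `task`.

    sums[q], counts[q]: reward sum and pull count of this arm in task q.
    Hypothetical assumptions: sigma-sub-Gaussian rewards, |mu_q - mu_task| <= eps
    for every task q, counts[task] > 0, and a fixed (non-adaptive) sample size.
    """
    log_term = 2.0 * sigma**2 * math.log(2.0 / delta)
    n_own, n_all = counts[task], sum(counts)

    # Individual estimate: unbiased, but uses only this task's samples.
    ind_mean = sums[task] / n_own
    ind_width = math.sqrt(log_term / n_own)

    # Aggregated estimate: variance term uses all n_all samples, but samples
    # from other tasks contribute worst-case bias eps * (their share of data).
    agg_mean = sum(sums) / n_all
    agg_width = math.sqrt(log_term / n_all) + eps * (n_all - n_own) / n_all
    return (ind_mean, ind_width), (agg_mean, agg_width)

# Usage: task 0 has few pulls of the arm; task 1 has many.
(ind_m, ind_w), (agg_m, agg_w) = confidence_widths(
    sums=[5.0, 45.0], counts=[10, 90], task=0, sigma=1.0, eps=0.05, delta=0.05)
print(f"individual: {ind_m:.3f} +/- {ind_w:.3f}")
print(f"aggregated: {agg_m:.3f} +/- {agg_w:.3f}")
```

When `eps` is small relative to the individual width, pooling wins; as `eps` grows, the bias term dominates and the individual estimate becomes the safer choice.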
Abstract
We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly optimal using a novel concentration inequality for…
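A minimal sketch of the general idea, not the authors' algorithm: Gaussian Thompson sampling in the concurrent setting, where every task pulls one arm per round. For each arm, a task draws its sample from either its own data or the data pooled across tasks, whichever has the smaller uncertainty scale. The pooled scale is inflated by a worst-case bias term. Everything specific here is an assumption for illustration: the dissimilarity bound `eps`, Gaussian rewards with known `sigma`, the scale rule, updating tasks one after another within a round, and the hypothetical names `sampling_scale` and `run_multitask_ts`.

```python
import numpy as np

def sampling_scale(n_own, n_total, sigma, eps):
    """Hypothetical rule: return (use_pooled, scale) for one arm in one task."""
    if n_total == 0:
        return False, np.inf  # no data anywhere yet
    ind = sigma / np.sqrt(n_own) if n_own > 0 else np.inf
    # Pooled scale: variance over all tasks' pulls plus worst-case bias from others.
    pooled = sigma / np.sqrt(n_total) + eps * (n_total - n_own) / n_total
    return pooled < ind, min(ind, pooled)

def run_multitask_ts(means, horizon, sigma=0.5, eps=0.05, seed=0):
    """Concurrent protocol sketch: each round, every task pulls one arm."""
    rng = np.random.default_rng(seed)
    n_tasks, n_arms = means.shape
    counts = np.zeros((n_tasks, n_arms))
    sums = np.zeros((n_tasks, n_arms))
    regret = 0.0
    for _ in range(horizon):
        for p in range(n_tasks):  # tasks update sequentially within a round
            theta = np.empty(n_arms)
            for i in range(n_arms):
                n_own, n_total = counts[p, i], counts[:, i].sum()
                use_pooled, scale = sampling_scale(n_own, n_total, sigma, eps)
                if np.isinf(scale):
                    theta[i] = np.inf  # arm never pulled by any task: force a pull
                    continue
                if use_pooled:
                    mean = sums[:, i].sum() / n_total
                else:
                    mean = sums[p, i] / n_own
                theta[i] = rng.normal(mean, scale)  # Gaussian posterior-style sample
            arm = int(np.argmax(theta))
            counts[p, arm] += 1
            sums[p, arm] += rng.normal(means[p, arm], sigma)
            regret += means[p].max() - means[p, arm]
    return counts, regret

# Usage: 4 similar tasks, 3 arms; per-task perturbations keep pairwise gaps <= 0.05.
rng = np.random.default_rng(1)
base = np.array([0.9, 0.5, 0.1])
means = base + rng.uniform(-0.025, 0.025, size=(4, 3))
counts, regret = run_multitask_ts(means, horizon=500)
print(counts.astype(int))
print(f"total regret: {regret:.2f}")
```

Choosing the smaller scale per arm is one simple way to make transfer "robust": pooled data is used only while its variance reduction outweighs the assumed bias.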
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms
Methods: Spatio-temporal stability analysis
