Hypothesis Transfer in Bandits by Weighted Models
Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

TL;DR
This paper introduces transfer learning techniques for contextual bandits using weighted models, improving exploration efficiency when leveraging prior models and adapting to multiple sources.
Contribution
It proposes a re-weighting scheme for transfer in bandits, extending to multiple source models and dynamic combinations, with theoretical guarantees and empirical validation.
Findings
Lower regret than classic Linear UCB when transfer is beneficial, while the classic regret rate is recovered when the tasks are unrelated
Effective handling of multiple source models
Empirical results on simulated and real data confirm benefits
Abstract
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning. That is, we assume having access to a previously learned model on an unobserved set of contexts, and we leverage it in order to accelerate exploration on a new bandit problem. Our transfer strategy is based on a re-weighting scheme for which we show a reduction in the regret over the classic Linear UCB when transfer is desired, while recovering the classic regret rate when the two tasks are unrelated. We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step. Additionally, we discuss an approach where a dynamic convex combination of source models is given in terms of a biased regularization term in the classic LinUCB algorithm. The algorithms and the theoretical analysis of our proposed methods…
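To make the idea of a biased regularization term concrete, here is a minimal sketch of a LinUCB-style learner whose ridge estimate is pulled toward a single fixed source model rather than toward zero. It is not the authors' algorithm: the class name `BiasedLinUCB`, the parameters `lam` and `alpha`, and the use of one fixed source vector `theta_src` (instead of the paper's re-weighting scheme or dynamic convex combination) are all assumptions made for illustration.

```python
import numpy as np


class BiasedLinUCB:
    """Illustrative LinUCB with ridge regularization centered on a source model.

    Hypothetical sketch: the estimate minimizes
        sum_t (r_t - x_t^T theta)^2 + lam * ||theta - theta_src||^2,
    whose closed form is (X^T X + lam I)^{-1} (X^T y + lam theta_src).
    """

    def __init__(self, theta_src, lam=1.0, alpha=1.0):
        self.theta_src = np.asarray(theta_src, dtype=float)
        d = self.theta_src.shape[0]
        self.lam = lam      # regularization strength (assumed name)
        self.alpha = alpha  # exploration bonus scale (assumed name)
        # A accumulates lam*I + sum of x x^T; b accumulates lam*theta_src + sum of r*x.
        self.A = lam * np.eye(d)
        self.b = lam * self.theta_src.copy()

    def estimate(self):
        # Biased ridge estimate; equals theta_src before any data is seen.
        return np.linalg.solve(self.A, self.b)

    def select(self, contexts):
        # contexts: array of shape (n_arms, d), one feature vector per arm.
        contexts = np.asarray(contexts, dtype=float)
        theta = self.estimate()
        A_inv = np.linalg.inv(self.A)
        # Confidence width sqrt(x^T A^{-1} x) for each arm.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        return int(np.argmax(contexts @ theta + self.alpha * widths))

    def update(self, x, reward):
        # Rank-one update with the chosen arm's context and observed reward.
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += reward * x
```

With no observations the learner acts exactly as the source model suggests, and each update shifts the estimate toward the target data. Setting `theta_src` to the zero vector gives the usual zero-centered ridge estimate used in standard LinUCB.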
Taxonomy
Topics: Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
