Offline-to-online hyperparameter transfer for stochastic bandits

Dravyansh Sharma; Arun Sai Suggala

arXiv:2501.02926 · cs.LG · January 7, 2025

TL;DR

This paper develops a transfer learning approach that uses offline data from related bandit tasks to set the hyperparameters of stochastic bandit algorithms on a new task, instead of tuning them from scratch online.

Contribution

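It bounds the inter-task (number of tasks) and intra-task (number of arm pulls per task) sample complexity of learning near-optimal hyperparameters for unseen tasks drawn from the same unknown task distribution, addressing a key practical challenge in stochastic bandits.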

Findings

01. Bounds on sample complexity for hyperparameter transfer

02. Effective transfer improves online bandit performance

03. Applicable to UCB, LinUCB, GP-UCB algorithms

Abstract

Classic algorithms for stochastic bandits typically use hyperparameters that govern critical properties such as the trade-off between exploration and exploitation. Tuning these hyperparameters is of great practical significance, but it is challenging and in certain cases information-theoretically impossible. To address this challenge, we consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) drawn from an unknown distribution over tasks. Our aim is to use this offline data to set the hyperparameters for a new task drawn from the same unknown distribution. We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls for each task) sample complexity for learning near-optimal hyperparameters on unseen tasks drawn from the distribution. Our…
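To make the setting concrete, the following is a minimal sketch of offline-to-online transfer for a single exploration hyperparameter. It is not the paper's algorithm: it assumes Bernoulli rewards, a UCB index of the form mean + sqrt(alpha * log(t) / pulls), a small grid of candidate `alpha` values, and offline data stored as one reward table per task (one row per arm, one column per pull). The function names `ucb_on_table`, `select_alpha`, and `sample_task` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ucb_on_table(rewards, alpha):
    """Replay UCB on a pre-collected reward table of shape (n_arms, horizon).

    rewards[i, n] is the reward observed on the n-th pull of arm i
    (an assumed offline data format). Returns the total collected reward.
    """
    n_arms, horizon = rewards.shape
    counts = np.zeros(n_arms, dtype=int)  # pulls per arm so far
    sums = np.zeros(n_arms)               # summed reward per arm
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # pull every arm once before using the index
        else:
            bonus = np.sqrt(alpha * np.log(t + 1) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        r = rewards[arm, counts[arm]]
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def select_alpha(offline_tables, alphas):
    """Return the candidate alpha with the best average replayed reward."""
    avg = [float(np.mean([ucb_on_table(tab, a) for tab in offline_tables]))
           for a in alphas]
    return alphas[int(np.argmax(avg))], avg

def sample_task(rng, n_arms=5, horizon=200):
    """Draw a Bernoulli bandit task and its reward table (toy task distribution)."""
    means = rng.uniform(0.1, 0.9, size=n_arms)
    return (rng.random((n_arms, horizon)) < means[:, None]).astype(float)

rng = np.random.default_rng(0)
offline_tables = [sample_task(rng) for _ in range(20)]   # inter-task samples
alphas = [0.01, 0.1, 0.5, 2.0]                           # assumed candidate grid
best_alpha, avg_rewards = select_alpha(offline_tables, alphas)
new_task_reward = ucb_on_table(sample_task(rng), best_alpha)  # unseen task
```

In this toy version, the number of tables plays the role of the inter-task sample size and the number of columns per table plays the role of the intra-task sample size; how large each must be for the selected hyperparameter to be near-optimal is the question the paper's bounds address.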

Taxonomy

Topics: Advanced Bandit Algorithms Research · Air Quality Monitoring and Forecasting · Data Stream Mining Techniques

Methods: Sparse Evolutionary Training