Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits

Yue Kang; Cho-Jui Hsieh; Thomas C. M. Lee

arXiv:2302.09440 · cs.LG · April 9, 2024

TL;DR

This paper introduces an online hyperparameter tuning framework for contextual bandits that adaptively learns the best hyperparameter values in real time, improving performance without offline tuning or a pre-specified candidate set.

Contribution

It proposes the first online continuous hyperparameter tuning method for contextual bandits using a double-layer bandit framework called CDT, with theoretical regret guarantees.

Findings

1. Achieves sublinear regret in theory.
2. Outperforms existing methods on synthetic datasets.
3. Demonstrates consistent improvement on real datasets.

Abstract

In stochastic contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits heavily depends on the values of hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods like cross-validation to choose hyperparameters under the bandit environment, as the decisions should be made in real-time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits to learn the optimal parameter configuration in practice within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate the hyperparameter…
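
The double-layer idea mentioned in the abstract can be illustrated with a small sketch. The Python below is not the paper's CDT algorithm. It assumes a fixed grid of candidate exploration rates (the paper tunes over a continuous search space), an EXP3 top layer, and a LinUCB-style bottom layer that fits a linear ridge estimate to Bernoulli rewards with a logistic mean. The function name `run_two_layer` and all of its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def run_two_layer(T=300, d=3, n_arms=5, grid=(0.0, 0.5, 1.0, 2.0),
                  gamma=0.1, seed=0):
    """Toy two-layer bandit: EXP3 picks the exploration rate, LinUCB picks the arm.

    Assumption: the candidate grid, EXP3, and the linear ridge estimate are
    simplifications for illustration, not the method proposed in the paper.
    """
    rng = random.Random(seed)
    theta_star = [rng.uniform(-1, 1) for _ in range(d)]  # hidden parameter

    # Bottom-layer state: A_inv = (I + sum x x^T)^{-1}, b = sum reward * x.
    A_inv = [[float(i == j) for j in range(d)] for i in range(d)]
    b = [0.0] * d

    # Top-layer state: one EXP3 weight per candidate exploration rate.
    weights = [1.0] * len(grid)
    total_reward, picks = 0.0, []

    for _ in range(T):
        # Top layer: sample an exploration rate from the EXP3 distribution.
        w_sum = sum(weights)
        probs = [(1 - gamma) * w / w_sum + gamma / len(grid) for w in weights]
        k = rng.choices(range(len(grid)), weights=probs)[0]
        alpha = grid[k]

        # Bottom layer: draw this round's action set and maximize the UCB score.
        arms = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
        theta_hat = matvec(A_inv, b)

        def ucb(x):
            width = math.sqrt(max(0.0, dot(x, matvec(A_inv, x))))
            return dot(x, theta_hat) + alpha * width

        x = max(arms, key=ucb)

        # Environment: Bernoulli reward whose mean is a logistic function.
        mean = 1.0 / (1.0 + math.exp(-dot(x, theta_star)))
        r = 1.0 if rng.random() < mean else 0.0

        # Bottom-layer update via the Sherman-Morrison rank-one formula.
        Ax = matvec(A_inv, x)
        denom = 1.0 + dot(x, Ax)
        A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [bi + r * xi for bi, xi in zip(b, x)]

        # Top-layer update: importance-weighted EXP3, rescaled for stability.
        weights[k] *= math.exp(gamma * (r / probs[k]) / len(grid))
        top = max(weights)
        weights = [w / top for w in weights]

        total_reward += r
        picks.append(alpha)

    return total_reward, picks
```

Calling `run_two_layer(T=200, seed=1)` returns the cumulative reward and the exploration rate chosen in each round. The point of the sketch is only the division of labor: the top layer treats each hyperparameter value as an arm and is rewarded by whatever the bottom layer earns with it.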

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification

Methods: Spatio-temporal stability analysis