Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits
Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

TL;DR
This paper introduces an online hyperparameter tuning framework for contextual bandits that adaptively learns optimal hyperparameters in real time, improving performance without offline tuning or pre-specified candidate sets.
Contribution
It proposes the first online continuous hyperparameter tuning method for contextual bandits using a double-layer bandit framework called CDT, with theoretical regret guarantees.
Findings
Achieves a sublinear regret bound in theory.
Outperforms existing methods on synthetic datasets.
Demonstrates consistent improvement on real datasets.
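For reference, the following is one standard definition of cumulative regret for a generalized linear contextual bandit. It assumes a known link function $\mu$, an unknown parameter $\theta^*$, a feature vector $x_{t,a}$ for each action $a$ in the round-$t$ action set $\mathcal{A}_t$, and a chosen action $a_t$. The paper's exact regret definition for CDT may differ.

```latex
R(T) = \sum_{t=1}^{T} \Big[ \max_{a \in \mathcal{A}_t} \mu\big(x_{t,a}^\top \theta^*\big) - \mu\big(x_{t,a_t}^\top \theta^*\big) \Big]
```

Regret is sublinear when $R(T) = o(T)$, that is, $R(T)/T \to 0$ as $T \to \infty$. In that case the average per-round gap to the best available action vanishes over time.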
Abstract
In stochastic contextual bandits, an agent sequentially takes actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits heavily depends on the values of hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods like cross-validation to choose hyperparameters in the bandit setting, as the decisions must be made in real time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits to learn the optimal parameter configuration in practice within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate the hyperparameter…
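The double-layer idea can be illustrated with a minimal sketch. The simplifying assumptions here are not taken from the paper. The inner layer is LinUCB, which is a generalized linear bandit with the identity link, and its exploration scale `alpha` is the hyperparameter being tuned online. The outer layer is plain UCB1 over a fixed grid of `alpha` values. CDT itself searches a continuous space without a pre-specified candidate set, so the grid is only a stand-in. All names (`LinUCB`, `UCB1Tuner`, `run`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class LinUCB:
    """Inner layer (illustrative): LinUCB with ridge parameter lam.
    The exploration scale alpha is supplied each round by the outer tuner."""

    def __init__(self, d, lam=1.0):
        self.d = d
        # A_inv = (lam * I + sum of x x^T)^{-1}, kept up to date via Sherman-Morrison
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def select(self, contexts, alpha):
        theta_hat = mat_vec(self.A_inv, self.b)  # ridge-regression estimate

        def ucb(x):
            width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
            return dot(x, theta_hat) + alpha * width

        return max(range(len(contexts)), key=lambda k: ucb(contexts[k]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse Gram matrix
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom
                       for j in range(self.d)] for i in range(self.d)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class UCB1Tuner:
    """Outer layer (illustrative): UCB1 over a finite grid of alpha values."""

    def __init__(self, grid):
        self.grid = grid
        self.counts = [0] * len(grid)
        self.sums = [0.0] * len(grid)
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:  # try every candidate once first
                return i
        return max(range(len(self.grid)),
                   key=lambda i: self.sums[i] / self.counts[i]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[i]))

    def update(self, i, reward):
        self.counts[i] += 1
        self.sums[i] += reward


def run(T=2000, d=3, n_actions=5, grid=(0.0, 0.1, 0.5, 1.0, 2.0), seed=0):
    """Simulate a linear contextual bandit; return (cumulative regret, outer counts)."""
    rng = random.Random(seed)
    theta_star = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(dot(theta_star, theta_star))
    theta_star = [v / norm for v in theta_star]  # unit-norm true parameter
    inner, outer = LinUCB(d), UCB1Tuner(list(grid))
    regret = 0.0
    for _ in range(T):
        contexts = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_actions)]
        i = outer.select()                      # outer layer picks alpha
        k = inner.select(contexts, grid[i])     # inner layer picks an action
        means = [dot(x, theta_star) for x in contexts]
        reward = means[k] + rng.gauss(0, 0.1)
        inner.update(contexts[k], reward)       # both layers learn from the same reward
        outer.update(i, reward)
        regret += max(means) - means[k]
    return regret, outer.counts
```

`run()` returns the cumulative pseudo-regret together with how often each grid value was chosen. In this sketch every `alpha` shares a single ridge-regression state, so the reward distribution seen by the outer layer drifts as the inner model improves. Plain UCB1 assumes stationary rewards and therefore ignores this drift, which is a limitation of the simplification.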
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification
Methods: Spatio-temporal stability analysis
