Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms
Qin Ding, Yue Kang, Yi-Wei Liu, Thomas C.M. Lee, Cho-Jui Hsieh, James Sharpnack

TL;DR
This paper introduces Syndicated Bandits, a framework for automatically tuning multiple hyper-parameters of contextual bandit algorithms in real time, with proven regret bounds that avoid exponential dependence on the number of hyper-parameters.
Contribution
It proposes a general Syndicated Bandits framework for dynamic hyper-parameter tuning in contextual bandits, with proven regret bounds and broad applicability.
Findings
Regret bounds are derived for the framework.
The regret does not depend exponentially on the number of hyper-parameters being tuned.
Experimental results validate effectiveness on synthetic and real data.
Abstract
The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real applications, including recommender systems, online advertising and clinical trials. Like many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. As an example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in a contextual bandit environment, since there is no pre-collected dataset and the decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto tuning the…
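The two-layer idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not name the algorithms used in each layer, so everything below is an assumption made for illustration: the top layer is an EXP3 adversarial bandit over a small candidate set of exploration parameters, the bottom layer is a LinUCB-style linear contextual bandit, and the environment is synthetic. The function name `run_two_layer_bandit`, the candidate values, and the reward rescaling are hypothetical and not taken from the paper.

```python
import numpy as np

def run_two_layer_bandit(T=2000, d=5, K=10,
                         candidates=(0.1, 0.5, 1.0, 2.0, 5.0), seed=0):
    """Illustrative two-layer bandit (assumed design, not the paper's exact method).

    Top layer: EXP3 picks an exploration parameter from `candidates`.
    Bottom layer: a LinUCB-style rule picks an arm using that parameter.
    Returns the cumulative pseudo-regret and how often each candidate was used.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)          # unknown unit-norm parameter

    n = len(candidates)
    log_w = np.zeros(n)                     # EXP3 log-weights
    gamma = min(1.0, np.sqrt(n * np.log(n) / ((np.e - 1) * T)))

    A = np.eye(d)                           # ridge-regularised Gram matrix
    b = np.zeros(d)                         # sum of reward-weighted features
    regret = 0.0
    counts = np.zeros(n, dtype=int)

    for _ in range(T):
        # Top layer: sample an exploration parameter from the EXP3 distribution.
        w = np.exp(log_w - log_w.max())
        p = (1.0 - gamma) * w / w.sum() + gamma / n
        j = rng.choice(n, p=p)
        alpha = candidates[j]
        counts[j] += 1

        # Bottom layer: upper-confidence-bound arm choice with the sampled alpha.
        X = rng.normal(size=(K, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm contexts
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b
        quad = np.einsum("kd,de,ke->k", X, A_inv, X)    # x^T A^{-1} x per arm
        width = np.sqrt(np.maximum(quad, 0.0))
        a = int(np.argmax(X @ theta_hat + alpha * width))

        # Observe a noisy linear reward and track pseudo-regret.
        mean = X @ theta
        reward = mean[a] + 0.1 * rng.normal()
        regret += mean.max() - mean[a]

        # Update the bottom-layer estimator.
        A += np.outer(X[a], X[a])
        b += reward * X[a]

        # Update EXP3 with the reward rescaled to [0, 1] (assumed scaling).
        r01 = np.clip((reward + 1.5) / 3.0, 0.0, 1.0)
        log_w[j] += gamma * (r01 / p[j]) / n

    return regret, counts

if __name__ == "__main__":
    total_regret, usage = run_two_layer_bandit()
    print("cumulative regret:", round(total_regret, 2))
    print("times each candidate was chosen:", usage)
```

The point of the sketch is only that the exploration parameter is chosen online from observed rewards, with no pre-collected dataset. Tuning several hyper-parameters this way with one top-layer arm per combination would make the number of arms grow exponentially in the number of hyper-parameters, which is the dependence the summary says the Syndicated Bandits framework avoids.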
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics
