GuideBoot: Guided Bootstrap for Deep Contextual Bandits
Feiyang Pan, Haoming Li, Xiang Ao, Wei Wang, Yanrong Kang, Ao Tan, and Qing He

TL;DR
GuideBoot is a novel method that combines Bayesian guidance with deep bootstrap techniques to improve exploration in complex deep contextual bandit problems, demonstrating superior performance in synthetic and real-world advertising tasks.
Contribution
It introduces GuideBoot, a practical deep contextual bandit algorithm that guides exploration using noisy samples and uncertainty, bridging Bayesian and bootstrap approaches.
Findings
GuideBoot outperforms previous state-of-the-art methods in experiments.
It effectively balances exploration and exploitation in deep bandit settings.
The online version learns efficiently from streaming data.
Abstract
The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration with principled uncertainty estimation, but their applicability is often limited by over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can handle complex problems by using deep reward models, but they lack clear guidance for exploration. Developing a practical method for complex deep contextual bandits remains largely unsolved. In this paper, we introduce Guided Bootstrap (GuideBoot for short), which combines the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the…
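The core mechanism in the abstract is training an ensemble on real samples plus noisy samples with fake labels, then exploring by acting on one sampled member. The small sketch below illustrates that idea. It is not the authors' implementation. The abstract is cut off before it says how the noise is set, so the sketch assumes, purely for illustration, that each fake sample's weight decays as 1/sqrt(1 + n), where n is the number of observations for that arm. This is a crude count-based stand-in for uncertainty. Per-arm logistic models stand in for deep reward models. `GuidedBootstrapSketch`, `noise_weight`, and `noise_scale` are hypothetical names. The real samples use double-or-nothing weighting, which is one common form of online bootstrap.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Per-arm logistic regression trained by SGD (stand-in for a deep reward model)."""

    def __init__(self, n_arms: int, dim: int, rng: random.Random, lr: float = 0.1):
        # Small random init so ensemble members start out slightly different.
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_arms)]
        self.lr = lr

    def predict(self, x, arm: int) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w[arm], x)))

    def update(self, x, arm: int, label: float, weight: float = 1.0) -> None:
        # One SGD step on the weighted log-loss for this arm.
        err = self.predict(x, arm) - label
        self.w[arm] = [wi - self.lr * weight * err * xi for wi, xi in zip(self.w[arm], x)]


class GuidedBootstrapSketch:
    """Hypothetical ensemble trained on real samples plus noisy fake-label samples."""

    def __init__(self, n_arms: int, dim: int, n_models: int = 10,
                 noise_scale: float = 1.0, seed: int = 0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.noise_scale = noise_scale
        self.counts = [0] * n_arms
        self.models = [LogisticModel(n_arms, dim, self.rng) for _ in range(n_models)]

    def noise_weight(self, arm: int) -> float:
        # ASSUMPTION: count-based uncertainty proxy that shrinks as data accumulates.
        return self.noise_scale / math.sqrt(1 + self.counts[arm])

    def select_arm(self, x) -> int:
        # Sample one ensemble member and act greedily under it (bootstrap Thompson sampling).
        model = self.rng.choice(self.models)
        return max(range(self.n_arms), key=lambda a: model.predict(x, a))

    def update(self, x, arm: int, reward: float) -> None:
        w_noise = self.noise_weight(arm)
        for model in self.models:
            # Double-or-nothing online bootstrap on the real sample.
            if self.rng.random() < 0.5:
                model.update(x, arm, reward, weight=2.0)
            # Noisy sample: same context and arm, random fake label, uncertainty-scaled weight.
            fake_label = 1.0 if self.rng.random() < 0.5 else 0.0
            model.update(x, arm, fake_label, weight=w_noise)
        self.counts[arm] += 1
```

Consider a toy stream where arm 1 always pays 1 and arm 0 pays 0. Training on it drives every member toward arm 1. Early on, each member receives independent random fake labels, which keeps the members' predictions apart and so keeps exploration alive. As counts grow, that noise fades and the ensemble converges.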
