Thompson Sampling Algorithms for Cascading Bandits
Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

TL;DR
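This paper provides theoretical regret analyses of Thompson Sampling (TS) algorithms for cascading bandits, a setting where guarantees were previously known mainly for UCB methods. It offers a tighter regret bound than a recent result derived under a more general setting, new algorithm designs, and empirical evidence of superior performance over UCB methods.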
Contribution
It analyzes a Thompson Sampling algorithm with Beta-Bernoulli updates, introduces TS-Cascade (a variant with Gaussian updates), and considers a linear generalization of the cascading bandit model, with proven regret bounds and empirical validation.
Findings
TS algorithms outperform UCB algorithms in experiments.
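The problem-dependent regret bound for TS with Beta-Bernoulli updates is tighter than the bound derived by Huyuk and Tekin (2019) under a more general setting.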
Linear TS algorithm scales efficiently with model dimension.
Abstract
Motivated by the pressing need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015). While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter. In this paper, we first provide a problem-dependent upper bound on the regret of a TS algorithm with Beta-Bernoulli updates; this upper bound is tighter than a recent derivation under a more general setting by Huyuk and Tekin (2019). Next, we design and analyze another TS algorithm with Gaussian updates, TS-Cascade. TS-Cascade achieves the state-of-the-art regret bound for cascading bandits. Complementarily, we consider a linear generalization of the cascading bandit model, which allows efficient learning in large cascading…
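To make the setting concrete, here is a minimal sketch of Thompson sampling with Beta-Bernoulli updates on a simulated cascading bandit. It is an illustration under stated assumptions, not the authors' implementation. The function names (`simulate_cascade`, `ts_cascade_beta`), the uniform Beta(1, 1) prior, and the example click probabilities are all hypothetical. The feedback model follows the usual cascading assumption: the user scans the list top-down and stops at the first click, so items below the click are unobserved.

```python
import numpy as np


def simulate_cascade(click_probs, ranked_items, rng):
    """Return the position of the first click in the list, or None if no click."""
    for pos, item in enumerate(ranked_items):
        if rng.random() < click_probs[item]:
            return pos
    return None


def ts_cascade_beta(click_probs, K, horizon, seed=0):
    """Sketch of TS with Beta-Bernoulli updates (assumed Beta(1, 1) prior)."""
    click_probs = np.asarray(click_probs, dtype=float)
    rng = np.random.default_rng(seed)
    L = len(click_probs)
    alpha = np.ones(L)  # 1 + number of observed clicks per item
    beta = np.ones(L)   # 1 + number of observed non-clicks per item

    # Expected reward of the best list: probability of at least one click.
    optimal = 1.0 - np.prod(1.0 - np.sort(click_probs)[-K:])
    regret = 0.0

    for _ in range(horizon):
        # Sample a plausible click probability for every item.
        theta = rng.beta(alpha, beta)
        # Recommend the K items with the largest samples, in decreasing order.
        ranked = np.argsort(-theta)[:K]

        click_pos = simulate_cascade(click_probs, ranked, rng)
        # Only items up to and including the click are observed.
        last_seen = K - 1 if click_pos is None else click_pos
        for pos in range(last_seen + 1):
            item = ranked[pos]
            if pos == click_pos:
                alpha[item] += 1
            else:
                beta[item] += 1

        expected = 1.0 - np.prod(1.0 - click_probs[ranked])
        regret += optimal - expected

    return alpha, beta, regret


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.1, 0.1, 0.8]  # hypothetical click probabilities
    alpha, beta, regret = ts_cascade_beta(probs, K=2, horizon=2000)
    print("posterior means:", alpha / (alpha + beta))
    print("cumulative expected regret:", regret)
```

The per-round regret here is measured in expectation (the gap between the best list's click probability and the chosen list's), which is one common convention. The Gaussian-update variant TS-Cascade and the linear generalization mentioned in the abstract are not sketched here.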
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
