Deep Hierarchy in Bandits
Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, and Mohammad Ghavamzadeh

TL;DR
This paper introduces a hierarchical Bayesian model for multi-armed bandits with complex reward correlations and proposes an efficient hierarchical Thompson sampling algorithm (HierTS) that leverages this structure to learn more efficiently and obtain tighter regret bounds.
Contribution
It formulates a deep hierarchical Bayesian bandit model, develops an efficient implementation of HierTS, and provides a theoretical regret analysis showing the benefits of the hierarchy in reducing regret.
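To make the model concrete, here is a hedged sketch of sampling action means from a simplified deep Gaussian hierarchy. It assumes a tree in which every node is a scalar drawn around its parent's value and the leaves are the mean rewards of actions; this tree parameterization and the names `sample_deep_hierarchy`, `branching`, and `level_stds` are illustrative assumptions, not the paper's exact model.

```python
import random


def sample_deep_hierarchy(branching, level_stds, root_mean=0.0, seed=0):
    """Sample action means from a hypothetical tree-structured Gaussian hierarchy.

    branching[l]  -- number of children of each node at level l (assumed)
    level_stds[l] -- std of a child around its parent's value (assumed)
    The values at the last level are returned as the action means.
    """
    rng = random.Random(seed)
    level = [root_mean]  # the root latent variable
    for b, std in zip(branching, level_stds):
        # Each parent spawns b children, each drawn from N(parent, std**2).
        level = [rng.gauss(parent, std) for parent in level for _ in range(b)]
    return level


# Usage: 2 categories, 3 subcategories each, 4 products per subcategory -> 24 actions.
means = sample_deep_hierarchy([2, 3, 4], [1.0, 0.5, 0.25], seed=3)
```

Actions that share ancestors have correlated means, which is the kind of structure a hierarchical algorithm can exploit: rewards observed for one product also inform the estimates of its siblings.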
Findings
HierTS efficiently implements hierarchical Bayesian inference for Gaussian hierarchies.
Regret decreases with prior width and hierarchy depth.
Empirical results confirm theoretical advantages in synthetic and real-world data.
Abstract
Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of a user for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model with latent variables. Since the hierarchy can have multiple layers, we call it deep. We propose a hierarchical Thompson sampling algorithm (HierTS) for this problem, and show how to implement it efficiently for Gaussian hierarchies. The efficient implementation is possible due to a novel exact hierarchical representation of the posterior, which itself is of independent interest. We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian…
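A minimal sketch of the idea for the simplest, two-level Gaussian case, in pure Python. It assumes a scalar latent mean `mu ~ N(mu_q, sigma_q**2)`, action means `theta_i | mu ~ N(mu, sigma_0**2)`, and rewards `N(theta_i, sigma**2)` with known noise; the function names and default parameters are hypothetical, not the authors' code, and the paper's deeper hierarchies are not covered. Sampling the latent mean from its marginal posterior and then each action mean conditioned on that sample yields an exact draw from the joint posterior, which is what makes this form of Thompson sampling cheap.

```python
import math
import random


def latent_posterior(mu_q, sigma_q, sigma_0, sigma, counts, sums):
    """Exact Gaussian posterior (mean, variance) of the latent mean mu.

    With theta_i marginalized out, the sample mean of action i given mu is
    N(mu, sigma_0**2 + sigma**2 / n_i), so each played action adds precision.
    """
    precision = 1.0 / sigma_q ** 2
    weighted = mu_q / sigma_q ** 2
    for n, s in zip(counts, sums):
        if n > 0:
            var_i = sigma_0 ** 2 + sigma ** 2 / n
            precision += 1.0 / var_i
            weighted += (s / n) / var_i
    return weighted / precision, 1.0 / precision


def action_posterior(mu, sigma_0, sigma, n, s):
    """Gaussian posterior (mean, variance) of theta_i given mu and its own rewards."""
    precision = 1.0 / sigma_0 ** 2 + n / sigma ** 2
    mean = (mu / sigma_0 ** 2 + s / sigma ** 2) / precision
    return mean, 1.0 / precision


def hier_ts(theta_true, horizon, mu_q=0.0, sigma_q=1.0, sigma_0=0.5, sigma=1.0, seed=0):
    """Run two-level hierarchical Thompson sampling on a simulated Gaussian bandit."""
    rng = random.Random(seed)
    k = len(theta_true)
    counts, sums = [0] * k, [0.0] * k
    total_reward = 0.0
    for _ in range(horizon):
        # 1) Sample the latent mean from its marginal posterior.
        m, v = latent_posterior(mu_q, sigma_q, sigma_0, sigma, counts, sums)
        mu_tilde = rng.gauss(m, math.sqrt(v))
        # 2) Sample each action mean conditioned on the sampled latent mean.
        samples = []
        for i in range(k):
            mi, vi = action_posterior(mu_tilde, sigma_0, sigma, counts[i], sums[i])
            samples.append(rng.gauss(mi, math.sqrt(vi)))
        # 3) Play the action with the highest sampled mean; observe a noisy reward.
        a = max(range(k), key=lambda i: samples[i])
        r = rng.gauss(theta_true[a], sigma)
        counts[a] += 1
        sums[a] += r
        total_reward += r
    return counts, total_reward


counts, total = hier_ts([0.0, 0.0, 2.0], horizon=2000, seed=1)
```

Because every action's observations sharpen the posterior of the shared latent mean, an action that has rarely been played still starts from an informed estimate rather than the original prior.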
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Recommender Systems and Techniques
