Leveraging Side Observations in Stochastic Bandits
Stephane Caron, Branislav Kveton, Marc Lelarge, Smriti Bhagat

TL;DR
This paper introduces algorithms for stochastic bandits with side observations, improving learning efficiency by exploiting relationships between actions, with applications demonstrated in social network-based content recommendation.
Contribution
It proposes new UCB-based algorithms that leverage side observations in stochastic bandits, providing improved regret bounds and practical benefits in social network scenarios.
Findings
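The derived regret bounds improve on the standard guarantees for stochastic bandits without side observations.
Experiments on real datasets show learning-rate speedups ranging from 2.2x to 14x on dense networks.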
Algorithms effectively utilize side information in real social network data.
Abstract
This paper considers stochastic bandits with side observations, a model that accounts for both the exploration/exploitation dilemma and relationships between arms. In this setting, after pulling an arm i, the decision maker also observes the rewards for some other actions related to i. We will see that this model is suited to content recommendation in social networks, where users' reactions may be endorsed or not by their friends. We provide efficient algorithms based on upper confidence bounds (UCBs) to leverage this additional information and derive new bounds improving on standard regret guarantees. We also evaluate these policies in the context of movie recommendation in social networks: experiments on real datasets show substantial learning rate speedups ranging from 2.2x to 14x on dense networks.
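To make the setting concrete, here is a minimal sketch of a UCB1-style policy in which pulling an arm also reveals the rewards of its related arms. It is an illustration written for this summary, not the authors' code: the Bernoulli reward model, the `neighbors` adjacency structure, the exploration constant, and every function and variable name are assumptions.

```python
import math
import random


def ucb_with_side_observations(means, neighbors, horizon, seed=0):
    """UCB1-style policy where pulling arm i also reveals rewards of neighbors[i].

    means     : hypothetical Bernoulli reward probabilities, one per arm
    neighbors : neighbors[i] is the set of arms observed when i is pulled
                (assumed to contain i itself)
    Returns (pulls, obs): how often each arm was played and observed.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k    # times each arm was actually played
    obs = [0] * k      # times each arm's reward was observed
    total = [0.0] * k  # sum of observed rewards per arm

    def index(i, t):
        # Empirical mean plus a confidence bonus based on observation count.
        return total[i] / obs[i] + math.sqrt(2.0 * math.log(t) / obs[i])

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(k) if obs[i] == 0]
        if unobserved:
            arm = unobserved[0]  # make sure every arm is observed once
        else:
            arm = max(range(k), key=lambda i: index(i, t))
        pulls[arm] += 1
        # Side observations: every related arm yields a reward sample too.
        for j in neighbors[arm]:
            reward = 1.0 if rng.random() < means[j] else 0.0
            obs[j] += 1
            total[j] += reward
    return pulls, obs


# Hypothetical 3-arm instance where arms 0 and 1 are related.
example_means = [0.2, 0.5, 0.9]
example_neighbors = [{0, 1}, {0, 1}, {2}]
print(ucb_with_side_observations(example_means, example_neighbors, horizon=200))
```

The only change from plain UCB1 in this sketch is that the confidence bonus shrinks with the number of observations rather than the number of pulls, so densely connected arms need fewer direct plays before their estimates become reliable.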
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
