Spectral bandits
Tomáš Kocák, Rémi Munos, Branislav Kveton, Shipra Agrawal, Michal Valko

TL;DR
This paper introduces algorithms for bandit problems on graphs with smooth payoffs, enabling efficient content recommendation by leveraging the graph structure to minimize regret.
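This is a minimal illustration of what "smooth payoffs" on a graph means. It is our own sketch, not code from the paper. A common way to measure how smooth a payoff vector f is over a weighted undirected graph is the Laplacian quadratic form f^T L f, which equals the sum over edges of w_ij (f_i - f_j)^2. The value is small when neighboring nodes have similar payoffs. The helper names `laplacian` and `smoothness` are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def smoothness(W, f):
    """Return f^T L f, i.e. the sum over undirected edges of w_ij * (f_i - f_j)**2.

    Small values mean neighboring nodes have similar payoffs (a "smooth" f).
    """
    f = np.asarray(f, dtype=float)
    return float(f @ laplacian(W) @ f)
```

Consider a 4-node path graph with unit weights. The slowly varying vector [1.0, 1.1, 1.2, 1.3] scores about 0.03, while the alternating vector [1, -1, 1, -1] scores 12.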
Contribution
It proposes the concept of effective dimension and three scalable algorithms for graph-based bandit problems with smooth payoffs.
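The effective dimension is computed from the sorted Laplacian eigenvalues. The sketch below assumes a particular definition, which is our reading and should be checked against the paper: d is the largest integer such that (d - 1) λ_d ≤ T / log(1 + T/λ). Here λ_1 = 0 ≤ λ_2 ≤ … are the Laplacian eigenvalues, T is the horizon, and λ is the regularization parameter. The function name `effective_dimension` is hypothetical.

```python
import numpy as np

def effective_dimension(eigvals, T, lam):
    """Largest d such that (d - 1) * eig_d <= T / log(1 + T / lam).

    ASSUMPTION: eigvals are graph Laplacian eigenvalues (sorted here in
    ascending order, eig_1 = 0), and this condition is our reading of the
    paper's definition, not a verified transcription of it.
    """
    eig = np.sort(np.asarray(eigvals, dtype=float))
    bound = T / np.log(1.0 + T / lam)
    d = 1  # d = 1 always qualifies, since (1 - 1) * eig_1 = 0
    for k in range(1, len(eig) + 1):
        # (k - 1) * eig_k is nondecreasing in k, so the last k that passes is the largest
        if (k - 1) * eig[k - 1] <= bound:
            d = k
    return d
```

For a graph with Laplacian `L`, one would call `effective_dimension(np.linalg.eigvalsh(L), T, lam)`. Suppose the eigenvalues grow quickly, say k² for k = 0..9, with T = 100 and λ = 1. The bound is then about 21.7, and the function returns 3, far fewer than the 10 nodes.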
Findings
The regret of the proposed algorithms scales linearly or sublinearly in the effective dimension.
Good estimates of user preferences can be learned from evaluations of only a few nodes.
Experiments demonstrate the algorithms' effectiveness in content recommendation scenarios.
Abstract
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on content recommendation…
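To make the setting concrete, here is a minimal LinUCB-style loop run in the graph's spectral basis. It follows the spirit of UCB-type spectral bandit algorithms but is not the paper's exact method. Each node's feature vector is its row of the Laplacian eigenvector matrix, and the regularizer diag(eigenvalues) + λI penalizes coefficients on non-smooth eigenvectors more heavily. Several choices are hypothetical illustrations rather than the paper's specification: the fixed exploration width `c` (the paper uses a confidence bound), the Gaussian reward noise, and all function and parameter names.

```python
import numpy as np

def spectral_ucb_sketch(W, payoff, T, lam=1.0, c=1.0, noise=0.1, seed=0):
    """Illustrative UCB loop over graph nodes; returns the list of chosen nodes.

    Hypothetical assumptions (not the paper's exact algorithm or constants):
    - arm i has feature x_i = i-th row of the Laplacian eigenvector matrix Q;
    - the regularizer is diag(Laplacian eigenvalues) + lam * I;
    - c is a fixed exploration width; rewards are payoff + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    payoff = np.asarray(payoff, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals, Q = np.linalg.eigh(L)      # columns of Q are Laplacian eigenvectors
    X = Q                               # row i is the feature vector of node i
    V = np.diag(eigvals + lam)          # smoothness-aware regularized design matrix
    b = np.zeros(len(eigvals))
    chosen = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        alpha_hat = V_inv @ b           # regularized least-squares estimate
        # per-node width sqrt(x_i^T V^{-1} x_i)
        widths = np.sqrt(np.einsum("ij,jk,ik->i", X, V_inv, X))
        arm = int(np.argmax(X @ alpha_hat + c * widths))
        reward = payoff[arm] + noise * rng.standard_normal()
        V += np.outer(X[arm], X[arm])   # rank-one update with the pulled feature
        b += reward * X[arm]
        chosen.append(arm)
    return chosen
```

Try a smooth payoff on a 6-node path graph, such as [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. The loop should concentrate its pulls near the high-payoff end. Inverting `V` every round costs O(N³), so this sketch is only meant for small graphs.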
