Spectral Thompson sampling
Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal

TL;DR
SpectralTS is a computationally efficient bandit algorithm leveraging graph smoothness, achieving regret bounds comparable to traditional methods, with demonstrated effectiveness on synthetic and real-world data.
Contribution
We introduce SpectralTS, a spectral bandit algorithm that scales well with large choice sets by exploiting graph structure, providing theoretical regret bounds and empirical validation.
Findings
Regret scales as d*sqrt(T ln N) with high probability, where d is the effective dimension and T is the time horizon; this is comparable to known results.
SpectralTS outperforms traditional algorithms in large-scale settings.
Effective on both synthetic and real-world datasets.
Abstract
Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though it has been successful, the tools for analyzing its performance appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. To address this, we consider an effective dimension d, which is small in real-world graphs. Our analysis shows that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is…
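
The setting in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' implementation: it assumes the common linear Thompson sampling update, with node features taken from the eigenvectors of the graph Laplacian and the Laplacian eigenvalues used as a regularizer so that smooth payoff functions are favored. The function names (graph_laplacian, spectral_ts) and the parameters lam, v, and noise are hypothetical choices made for this example.

```python
import numpy as np


def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - W of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_ts(adj, mu_true, T, lam=0.01, v=0.1, noise=0.01, seed=0):
    """Simulate a Thompson sampling bandit over graph nodes (illustrative).

    mu_true holds the payoff coefficients in the Laplacian eigenbasis.
    lam, v, and noise are assumed values, not taken from the paper.
    Returns the cumulative regret after each of the T rounds.
    """
    rng = np.random.default_rng(seed)
    eigvals, Q = np.linalg.eigh(graph_laplacian(adj))
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    X = Q                                  # row i is the feature vector of node i
    n = adj.shape[0]

    B = np.diag(eigvals + lam)             # spectral regularizer (precision matrix)
    f = np.zeros(n)
    mu_hat = np.zeros(n)
    means = X @ mu_true                    # expected payoff of every node
    regret = np.zeros(T)

    for t in range(T):
        # Draw mu_tilde ~ N(mu_hat, v^2 * B^{-1}) using a Cholesky factor of B.
        chol = np.linalg.cholesky(B)
        z = rng.standard_normal(n)
        mu_tilde = mu_hat + v * np.linalg.solve(chol.T, z)

        a = int(np.argmax(X @ mu_tilde))   # play the node that looks best
        r = means[a] + noise * rng.standard_normal()

        B += np.outer(X[a], X[a])          # regularized least-squares update
        f += r * X[a]
        mu_hat = np.linalg.solve(B, f)
        regret[t] = means.max() - means[a]

    return np.cumsum(regret)


# Usage: a path graph with 20 nodes and a payoff function that is smooth on
# the graph (only the three lowest-frequency components are nonzero).
n_nodes = 20
adj = np.zeros((n_nodes, n_nodes))
idx = np.arange(n_nodes - 1)
adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
mu_true = np.zeros(n_nodes)
mu_true[:3] = [0.5, 1.0, -0.5]
cum_regret = spectral_ts(adj, mu_true, T=200)
```

Because the regularizer penalizes coefficients on high-frequency eigenvectors more heavily, the estimate is pulled toward payoff functions in which neighboring nodes have similar values, which is the smoothness assumption the abstract describes. The path graph and the coefficient values are toy inputs chosen only to make the example run.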
