Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

TL;DR
This paper introduces BanditSRL, a scalable representation learning algorithm for linear contextual bandits that achieves constant regret by combining spectral property optimization with a likelihood ratio test, and demonstrates its effectiveness with neural networks.
Contribution
It proposes BanditSRL, a novel algorithm that learns representations with good spectral properties by solving a constrained optimization problem and exploits them with a generalized likelihood ratio test, achieving horizon-independent (constant) regret.
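The constrained-optimization idea can be illustrated on a toy problem: among candidate representations that fit the observed rewards (almost) as well as the best one, prefer the one whose optimal-action features have the largest minimum eigenvalue. The sketch below is a simplified selection rule over a hypothetical finite candidate set, not BanditSRL's actual objective (the paper works with learned, e.g. neural, representations). The names `fit_least_squares`, `min_eig_optimal`, `select_representation` and the tolerance `eps` are illustrative assumptions.

```python
import numpy as np

def fit_least_squares(phi, ctx, act, r):
    """Fit theta on observed (context, action, reward) triples only.
    phi has shape (n_contexts, n_actions, d)."""
    X = phi[ctx, act]                                # features of observed pairs, (m, d)
    theta, *_ = np.linalg.lstsq(X, r, rcond=None)
    mse = float(np.mean((X @ theta - r) ** 2))
    return theta, mse

def min_eig_optimal(phi, theta):
    """lambda_min of the empirical covariance of greedy-action features."""
    n = phi.shape[0]
    best = (phi @ theta).argmax(axis=1)              # greedy action per context
    phi_star = phi[np.arange(n), best]               # (n, d)
    cov = phi_star.T @ phi_star / n
    return float(np.linalg.eigvalsh(cov)[0])         # eigvalsh sorts ascending

def select_representation(reps, ctx, act, r, eps):
    """Hypothetical rule: maximize the spectral term subject to
    mse(phi) <= min_phi' mse(phi') + eps (approximate realizability)."""
    fits = [fit_least_squares(phi, ctx, act, r) for phi in reps]
    best_mse = min(mse for _, mse in fits)
    feasible = [i for i, (_, mse) in enumerate(fits) if mse <= best_mse + eps]
    return max(feasible, key=lambda i: min_eig_optimal(reps[i], fits[i][0]))
```

The constraint matters: a misspecified representation can have a large spectral term, so dropping the realizability constraint (a large `eps`) may select it even though it cannot predict rewards.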
Findings
BanditSRL achieves constant regret when HLS representations are available.
Regularizing neural networks towards HLS representations improves performance.
BanditSRL can be combined with any no-regret algorithm for effective exploration.
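For reference, the HLS property mentioned in these findings is commonly stated as a condition on the features of optimal actions. Assuming a context distribution $\rho$, a realizable representation $\phi \in \mathbb{R}^d$, and optimal actions $a^\star(s)$, a sketch of the condition is:

```latex
\lambda_{\min}\Big( \mathbb{E}_{s \sim \rho}\big[\, \phi\big(s, a^\star(s)\big)\, \phi\big(s, a^\star(s)\big)^{\top} \big] \Big) > 0
```

Equivalently, the optimal-action features span $\mathbb{R}^d$, so that playing optimal actions alone keeps improving the estimate of the reward parameter in every direction.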
Abstract
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret…
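The generalized likelihood ratio test decides when the current estimate is reliable enough to act greedily. The sketch below assumes the standard GLR form for Gaussian linear models: the smallest estimated gap between the greedy action and any other action, normalized by its uncertainty in the design-matrix norm. The threshold `beta` used in the paper is not reproduced here, and `glr_statistic`, `glrt_action`, and the `fallback` callback (standing in for the base no-regret algorithm) are hypothetical names.

```python
import numpy as np

def glr_statistic(phi_s, theta, V):
    """phi_s: (K, d) features of all actions at the current context.
    theta: (d,) least-squares estimate; V: (d, d) positive-definite design matrix.
    Returns the greedy action and min_a gap_a^2 / (2 * ||diff_a||^2_{V^{-1}})."""
    a_hat = int(np.argmax(phi_s @ theta))
    stat = np.inf
    for a in range(phi_s.shape[0]):
        diff = phi_s[a_hat] - phi_s[a]
        if a == a_hat or not np.any(diff):
            continue  # identical features predict identical rewards under any theta
        gap = float(diff @ theta)
        var = float(diff @ np.linalg.solve(V, diff))   # ||diff||^2 in V^{-1} norm
        stat = min(stat, gap ** 2 / (2.0 * var))
    return a_hat, stat

def glrt_action(phi_s, theta, V, beta, fallback):
    """Exploit greedily if the test passes, otherwise defer to the base algorithm."""
    a_hat, stat = glr_statistic(phi_s, theta, V)
    if stat > beta:
        return a_hat, True
    return fallback(phi_s), False
```

With more data (a larger `V`) the same estimated gap yields a larger statistic, so the test passes more often and the base algorithm's exploration is invoked less.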
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Data Stream Mining Techniques
Methods: Test
