Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

Andrea Tirinzoni; Matteo Papini; Ahmed Touati; Alessandro Lazaric; Matteo Pirotta

arXiv:2210.13083·cs.LG·October 25, 2022·1 cites

TL;DR

This paper introduces BanditSRL, a scalable representation learning algorithm for linear contextual bandits that achieves constant regret by combining spectral property optimization with a likelihood ratio test, and demonstrates its effectiveness with neural networks.

Contribution

The paper proposes BanditSRL, an algorithm that learns a realizable representation with good spectral properties through constrained optimization and exploits it with a generalized likelihood ratio test, achieving constant (horizon-independent) regret.

Findings

01. BanditSRL achieves constant regret when HLS representations are available.

02. Regularizing neural networks towards HLS representations improves performance.

03. BanditSRL can be combined with any no-regret algorithm for effective exploration.

Abstract

We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret…
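The abstract does not spell out the spectral property behind HLS. As a rough illustration, the sketch below assumes HLS means that the features of the optimal actions span the whole feature space, i.e. the smallest eigenvalue of E_x[φ(x, a*(x)) φ(x, a*(x))^T] is strictly positive. The function name `hls_min_eigenvalue`, the finite-context setup, and the two toy representations are hypothetical and are not taken from the paper.

```python
import numpy as np


def hls_min_eigenvalue(features, theta, context_probs=None):
    """Smallest eigenvalue of E_x[phi(x, a*(x)) phi(x, a*(x))^T].

    features: array of shape (n_contexts, n_actions, d), a hypothetical
        finite-context representation phi(x, a).
    theta: array of shape (d,), so that reward(x, a) = phi(x, a) @ theta.
    context_probs: context distribution; uniform if None.
    """
    features = np.asarray(features, dtype=float)
    n_contexts = features.shape[0]
    if context_probs is None:
        context_probs = np.full(n_contexts, 1.0 / n_contexts)
    rewards = features @ theta                    # shape (n_contexts, n_actions)
    best = rewards.argmax(axis=1)                 # optimal action per context
    opt_feats = features[np.arange(n_contexts), best]   # shape (n_contexts, d)
    # Probability-weighted second moment of the optimal-action features.
    second_moment = (opt_feats * context_probs[:, None]).T @ opt_feats
    return float(np.linalg.eigvalsh(second_moment)[0])  # eigenvalues ascend


# Two toy representations that realize the same rewards with theta = [1, 1]:
# context 0 pays (1.0, 0.5) and context 1 pays (0.5, 1.0).
theta = np.array([1.0, 1.0])
phi_good = np.array([[[1.0, 0.0], [0.0, 0.5]],
                     [[0.5, 0.0], [0.0, 1.0]]])
phi_bad = np.array([[[1.0, 0.0], [0.0, 0.5]],
                    [[0.0, 0.5], [1.0, 0.0]]])

lam_good = hls_min_eigenvalue(phi_good, theta)
lam_bad = hls_min_eigenvalue(phi_bad, theta)
```

Under this assumed definition, both representations predict the rewards exactly, but only `phi_good` has optimal-action features covering both directions of the feature space. In `phi_bad` the optimal features are identical in both contexts, so the second-moment matrix is rank deficient.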

Videos

Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Data Stream Mining Techniques

Methods: Test