Leveraging Offline Data in Linear Latent Contextual Bandits
Chinmaya Kausik, Kevin Tan, Ambuj Tewari

TL;DR
This paper introduces algorithms for linear latent contextual bandits that effectively leverage offline data to improve online decision-making, with theoretical guarantees and practical validation on real datasets.
Contribution
It proposes the first end-to-end algorithms for linear latent contextual bandits that handle uncountably many latent states: an offline subspace learning method and two online algorithms with minimax optimal regret bounds.
Findings
Offline subspace learning with provable guarantees
Online algorithms with minimax optimal regret bounds
Validated effectiveness on synthetic and real-world data
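The offline step can be illustrated with a minimal sketch. It assumes each user u has reward parameter θ_u = B w_u for an unknown d × k orthonormal basis B, and that each offline user contributes one short trajectory of Gaussian action features with noisy linear rewards. The estimator below is a generic spectral method swapped in for illustration, not the paper's own offline algorithm. It splits each trajectory in half, forms the outer product of the two independent least-squares estimates, and takes the top-k eigenvectors of their average. The functions `simulate`, `estimate_subspace`, and `subspace_distance` and all parameter values are hypothetical.

```python
import numpy as np


def simulate(num_users, n, d, k, noise, rng):
    """Hypothetical generator: user u has theta_u = B @ w_u, B a d x k orthonormal basis."""
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))   # true latent subspace
    data = []
    for _ in range(num_users):
        theta = B @ rng.standard_normal(k)              # reward parameter inside span(B)
        X = rng.standard_normal((n, d))                 # offline action features
        y = X @ theta + noise * rng.standard_normal(n)  # noisy linear rewards
        data.append((X, y))
    return B, data


def estimate_subspace(data, k):
    """Spectral estimate of span(B) from one short trajectory per offline user.

    Each trajectory is split in half. The two least-squares estimates of theta_u
    have independent noise, so the average of their outer products approximates
    E[theta theta^T], whose top-k eigenvectors span the latent subspace.
    Assumes each half has at least d rows so least squares is identifiable.
    """
    d = data[0][0].shape[1]
    M = np.zeros((d, d))
    for X, y in data:
        h = len(y) // 2
        t1 = np.linalg.lstsq(X[:h], y[:h], rcond=None)[0]
        t2 = np.linalg.lstsq(X[h:], y[h:], rcond=None)[0]
        M += np.outer(t1, t2)
    M = (M + M.T) / (2 * len(data))   # symmetrize and average over users
    _, eigvecs = np.linalg.eigh(M)     # eigenvalues come back in ascending order
    return eigvecs[:, -k:]             # d x k orthonormal basis estimate


def subspace_distance(B_hat, B):
    """Sine of the largest principal angle between span(B_hat) and span(B)."""
    return np.linalg.norm(B - B_hat @ (B_hat.T @ B), ord=2)


rng = np.random.default_rng(0)
B, data = simulate(num_users=2000, n=40, d=10, k=2, noise=0.5, rng=rng)
B_hat = estimate_subspace(data, k=2)
print(subspace_distance(B_hat, B))  # shrinks as the number of offline users grows
```

Splitting each trajectory is a simple way to avoid the noise bias that a single estimate's outer product θ̂θ̂ᵀ would carry. The cost is that each half must be long enough to identify θ_u on its own.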
Abstract
Leveraging offline data is an attractive way to accelerate online sequential decision-making. However, it is crucial to account for latent states in users or environments in the offline data, and latent bandits form a compelling model for doing so. In this light, we design end-to-end latent bandit algorithms capable of handling uncountably many latent states. We focus on a linear latent contextual bandit: a linear bandit where each user has its own high-dimensional reward parameter, but the reward parameters across users lie in a low-rank latent subspace of much smaller dimension. First, we provide an offline algorithm to learn this subspace with provable guarantees. We then present two online algorithms that use the output of this offline algorithm to accelerate online learning. The first enjoys …
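To illustrate how a learned subspace can speed up online learning, the sketch below runs a LinUCB-style learner on the k-dimensional projections z = B_hatᵀx of the d-dimensional action features, so its confidence set lives in k rather than d dimensions. This is a generic substitute, not either of the paper's two online algorithms. The class `SubspaceLinUCB`, the function `run`, and all parameters (`alpha`, `lam`, the Gaussian action sets) are hypothetical. The sketch also passes the true basis, as if the offline phase had recovered it exactly. A principled method must account for subspace estimation error.

```python
import numpy as np


class SubspaceLinUCB:
    """Hypothetical LinUCB variant acting on projections z = B_hat^T x."""

    def __init__(self, B_hat, alpha=1.0, lam=1.0):
        self.B = B_hat                        # d x k orthonormal basis from the offline phase
        k = B_hat.shape[1]
        self.alpha = alpha                    # width of the confidence bonus
        self.A = lam * np.eye(k)              # regularized Gram matrix in the subspace
        self.b = np.zeros(k)                  # reward-weighted sum of projected features

    def select(self, actions):
        Z = actions @ self.B                  # (num_actions, k) projected features
        A_inv = np.linalg.inv(self.A)
        w_hat = A_inv @ self.b                # ridge estimate of the latent coefficients
        quad = np.einsum("ij,jk,ik->i", Z, A_inv, Z)
        bonus = np.sqrt(np.maximum(quad, 0.0))
        return int(np.argmax(Z @ w_hat + self.alpha * bonus))

    def update(self, action, reward):
        z = self.B.T @ action
        self.A += np.outer(z, z)
        self.b += reward * z


def run(T=2000, d=10, k=2, num_actions=20, noise=0.1, seed=0):
    """Return (cumulative regret of SubspaceLinUCB, cumulative regret of uniform play)."""
    rng = np.random.default_rng(seed)
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))  # true orthonormal basis
    theta = B @ rng.standard_normal(k)
    theta /= np.linalg.norm(theta)                     # unit-norm parameter inside span(B)
    agent = SubspaceLinUCB(B)                          # assumes exact offline recovery
    regret = random_regret = 0.0
    for _ in range(T):
        actions = rng.standard_normal((num_actions, d))
        means = actions @ theta
        i = agent.select(actions)
        agent.update(actions[i], means[i] + noise * rng.standard_normal())
        regret += means.max() - means[i]
        random_regret += means.max() - means.mean()
    return regret, random_regret


regret, random_regret = run()
print(regret, random_regret)
```

Comparing against uniformly random play gives a rough sanity check. With an imperfect `B_hat`, the projected features can miss part of θ, so regret can grow linearly in T. That is why trusting the estimated subspace outright is only a simplification here.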
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Machine Learning and Data Classification
