Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets
Gene Li, Cong Ma, Nathan Srebro

TL;DR
This paper introduces a family of pessimistic offline learning algorithms for linear contextual bandits based on $\ell_p$ confidence sets, highlighting a new $\ell_\infty$ variant that is adaptively optimal.
Contribution
It proposes a novel $\ell_\infty$ confidence-set-based learning rule for offline learning of linear contextual bandits that strictly dominates the other rules in the family, including the $\ell_2$ rule corresponding to Bellman-consistent pessimism (BCP).
Findings
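The $\ell_\infty$ rule is adaptively optimal: it achieves minimax performance, up to log factors, against all $\ell_q$-constrained problems.
The $\ell_\infty$ rule strictly dominates all other rules in the family, including the $\ell_2$ rule.
The $\ell_\infty$ rule generalizes the lower confidence bound (LCB) approach to the linear setting.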
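The following worked sketch illustrates why an $\ell_\infty$ confidence set can recover LCB. It assumes the $\ell_p$ confidence set has the form below, built around a least-squares estimate $\hat\theta$ with covariance $\Sigma$ and radius $\beta$, and that the tabular case uses one-hot features $\phi(x,a) = e_{(x,a)}$ so that $\Sigma = \mathrm{diag}(n_{x,a})$ (regularization omitted). The paper's exact construction may differ.

$$C_p = \{\theta : \|\Sigma^{1/2}(\theta - \hat\theta)\|_p \le \beta\}, \qquad C_\infty = \{\theta : |\theta_{x,a} - \hat\theta_{x,a}| \le \beta/\sqrt{n_{x,a}} \text{ for all } (x,a)\}$$

For a policy $\pi$ with value $V(\pi;\theta) = \sum_x \mu(x)\,\theta_{x,\pi(x)}$ under a context distribution $\mu$, the box constrains each coordinate separately, so

$$\min_{\theta \in C_\infty} V(\pi;\theta) = \sum_x \mu(x)\left(\hat\theta_{x,\pi(x)} - \frac{\beta}{\sqrt{n_{x,\pi(x)}}}\right),$$

which is maximized by picking, in each context, the action with the largest lower confidence bound, i.e. the LCB rule. For $p = 2$ the ellipsoid couples the contexts instead:

$$\min_{\theta \in C_2} V(\pi;\theta) = \sum_x \mu(x)\,\hat\theta_{x,\pi(x)} - \beta\sqrt{\sum_x \frac{\mu(x)^2}{n_{x,\pi(x)}}}.$$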
Abstract
We present a family $\{\hat{\pi}_p\}_{p \ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
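A minimal sketch of how such a family of pessimistic rules can be computed, under stated assumptions: the $\ell_p$ confidence set is taken to be $\{\theta : \|\Sigma^{1/2}(\theta - \hat\theta)\|_p \le \beta\}$ around a ridge estimate $\hat\theta$ with regularized covariance $\Sigma$, and pessimism is applied to a policy's value averaged uniformly over a small finite set of contexts. Because that value is linear in $\theta$, Hölder's inequality gives the worst case in closed form, $\langle \bar\phi_\pi, \hat\theta\rangle - \beta\|\Sigma^{-1/2}\bar\phi_\pi\|_q$ with $1/p + 1/q = 1$. The helper names (`fit`, `pessimistic_value`, `pessimistic_policy`), the regularizer `lam`, the radius `beta`, and the brute-force policy enumeration are hypothetical illustrations, not the paper's algorithm or constants.

```python
import numpy as np
from itertools import product


def dual_exponent(p):
    """Hölder conjugate q of p, so that 1/p + 1/q = 1 (p = inf gives q = 1)."""
    if np.isinf(p):
        return 1.0
    if p == 1:
        return np.inf
    return p / (p - 1.0)


def inv_sqrt_pd(sigma):
    """Symmetric inverse square root Sigma^{-1/2} of a positive-definite matrix."""
    w, v = np.linalg.eigh(sigma)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def fit(features, rewards, lam=1.0):
    """Ridge estimate theta_hat and regularized covariance Sigma from logged (feature, reward) pairs."""
    d = features.shape[1]
    sigma = features.T @ features + lam * np.eye(d)
    theta_hat = np.linalg.solve(sigma, features.T @ rewards)
    return theta_hat, sigma


def pessimistic_value(avg_feature, theta_hat, sigma_inv_sqrt, beta, p):
    """Worst case of <avg_feature, theta> over {theta : ||Sigma^{1/2}(theta - theta_hat)||_p <= beta}.

    By Hölder's inequality this equals <avg_feature, theta_hat> - beta * ||Sigma^{-1/2} avg_feature||_q.
    """
    q = dual_exponent(p)
    bonus = beta * np.linalg.norm(sigma_inv_sqrt @ avg_feature, ord=q)
    return float(avg_feature @ theta_hat - bonus)


def pessimistic_policy(contexts, phi, n_actions, theta_hat, sigma, beta, p):
    """Brute-force search over deterministic policies on a small finite context set.

    Each policy's value uses its feature vector averaged uniformly over `contexts`;
    the policy (a tuple of actions, one per context) with the largest pessimistic value wins.
    """
    s = inv_sqrt_pd(sigma)
    best, best_val = None, -np.inf
    for actions in product(range(n_actions), repeat=len(contexts)):
        avg = np.mean([phi(x, a) for x, a in zip(contexts, actions)], axis=0)
        val = pessimistic_value(avg, theta_hat, s, beta, p)
        if val > best_val:
            best, best_val = actions, val
    return best, best_val


# Tabular example: 2 contexts x 2 actions with one-hot features e_{2x+a} (made-up numbers).
phi = lambda x, a: np.eye(4)[2 * x + a]
sigma = np.diag([100.0, 1.0, 4.0, 25.0])  # plays the role of per-pair counts n_{x,a}
theta_hat = np.array([0.5, 0.9, 0.6, 0.4])
print(pessimistic_policy([0, 1], phi, 2, theta_hat, sigma, beta=0.5, p=np.inf))  # actions (0, 0), value ≈ 0.4
print(pessimistic_policy([0, 1], phi, 2, theta_hat, sigma, beta=0.5, p=2))       # actions (1, 0)
```

With $p = \infty$ the box decouples across (context, action) pairs, so the chosen policy matches picking, per context, the action with the largest $\hat\theta - \beta/\sqrt{n}$ (context 0: 0.45 vs 0.40; context 1: 0.35 vs 0.30). With $p = 2$ the ellipsoid couples the contexts and, on these illustrative numbers, the rule selects a different policy.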
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
