Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets

Gene Li; Cong Ma; Nathan Srebro

arXiv:2205.10671·cs.LG·October 6, 2022

TL;DR

This paper introduces a family of pessimistic offline learning algorithms for linear contextual bandits based on $\ell_p$ confidence sets, highlighting a new $\ell_\infty$ variant that is adaptively optimal.

Contribution

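It proposes a novel learning rule based on $\ell_\infty$ confidence sets that strictly dominates every other rule in the $\ell_p$ family for offline learning of linear contextual bandits, including the $\ell_2$ rule that corresponds to Bellman-consistent pessimism (BCP).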

Findings

01

The $\ell_\infty$ rule is adaptively minimax optimal (up to log factors) against all $\ell_q$-constrained problems.

02

The $\ell_\infty$ rule dominates other predictors in the family.

03

The approach generalizes lower confidence bounds to the linear setting.

Abstract

We present a family $\{\hat{\pi}_p\}_{p \geq 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\hat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\hat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\hat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\hat{\pi}_2$.
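
To make the idea of pessimism over an $\ell_p$ confidence set concrete, the following is a minimal sketch and not the authors' implementation. It assumes the confidence set has the form $\{\theta : \|\Sigma^{1/2}(\theta - \hat{\theta})\|_p \leq \beta\}$, where the estimate `theta_hat`, the covariance `sigma`, and the radius `beta` are supplied by the user; the paper's exact definition and choice of radius may differ. It also assumes, hypothetically, that each candidate policy is summarized by its average feature vector `v`. Under these assumptions the worst-case value of a linear reward has a closed form through the dual norm: $\langle v, \hat{\theta} \rangle - \beta \, \|\Sigma^{-1/2} v\|_q$ with $1/p + 1/q = 1$.

```python
import numpy as np


def dual_exponent(p):
    """Return q with 1/p + 1/q = 1 (q = inf for p = 1, q = 1 for p = inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def pessimistic_value(v, theta_hat, sigma, beta, p):
    """Minimum of <v, theta> over the assumed confidence set
    {theta : ||sigma^{1/2} (theta - theta_hat)||_p <= beta}.

    By dual-norm duality this equals
    <v, theta_hat> - beta * ||sigma^{-1/2} v||_q.
    sigma is assumed symmetric positive definite.
    """
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # sigma^{-1/2}
    penalty = beta * np.linalg.norm(inv_sqrt @ v, ord=dual_exponent(p))
    return float(v @ theta_hat - penalty)


def pessimistic_policy(policy_features, theta_hat, sigma, beta, p):
    """Pick the candidate policy (hypothetically represented by its average
    feature vector) with the largest pessimistic value."""
    values = [pessimistic_value(v, theta_hat, sigma, beta, p)
              for v in policy_features]
    return int(np.argmax(values)), values


# Toy usage with made-up numbers (illustration only, not data from the paper).
sigma = np.eye(2)
theta_hat = np.array([1.0, 0.0])
candidates = [np.array([1.0, 1.0]), np.array([1.0, 0.0])]
best, values = pessimistic_policy(candidates, theta_hat, sigma, beta=0.5, p=np.inf)
print(best, values)
```

In this sketch, $p = 2$ penalizes a policy by the $\ell_2$ norm of its whitened feature vector, while $p = \infty$ penalizes it by the $\ell_1$ norm. The same `beta` is used for every `p` only for simplicity; a valid confidence set would generally need a radius that depends on `p`.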

Videos

Pessimism for Offline Linear Contextual Bandits using $\ell_p$ Confidence Sets · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning