Model Selection for Generic Contextual Bandits

Avishek Ghosh; Abishek Sankararaman; Kannan Ramchandran

arXiv:2107.03455 · stat.ML · July 21, 2023



TL;DR

This paper introduces an adaptive algorithm for model selection in stochastic contextual bandits that automatically adjusts to the true model class, achieving near-optimal regret without prior knowledge of the class.

Contribution

The paper proposes the Adaptive Contextual Bandit (ACB) algorithm, which works in phases and successively eliminates model classes that are too simple to fit the instance. Its regret matches, order-wise, that of algorithms that are given the true model class in advance. The paper also provides specialized algorithms for linear cases.

Findings

01 ACB achieves regret comparable to that of algorithms that know the true model class.

02 A simpler explore-then-commit algorithm also attains similar regret bounds.

03 Specialized algorithms for linear bandits offer sharper guarantees.

Abstract

We consider the problem of model selection for general stochastic contextual bandits under the realizability assumption. We propose a successive-refinement-based algorithm called Adaptive Contextual Bandit (ACB), which works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., its regret rate order-wise matches that of any provable contextual bandit algorithm (e.g., FALCON) that requires knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second-order term in the regret bound. This cost possesses the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice versa. We also show that a much simpler explore-then-commit (ETC) style algorithm also…
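The idea of eliminating model classes that are too simple can be illustrated with a toy sketch. This is not the paper's ACB procedure: everything here is an assumption made for illustration, including the finite nested classes F0, F1, F2, the reward function, the uniform exploration (the abstract's contextual bandit subroutine is replaced by random actions), the phase lengths, and the elimination threshold. In each phase the sketch fits every candidate class by least squares and discards the smallest active class if its held-out error exceeds that of the largest class by more than the threshold.

```python
import random

GRID = [i / 10 for i in range(11)]
SLOPES = [0.2, 0.4, 0.6]

# Hypothetical nested classes F0 < F1 < F2 of reward models f(x, a).
F0 = [lambda x, a, b=b: b for b in GRID]                                   # constants
F1 = F0 + [lambda x, a, b=b, s=s: b + s * a for b in GRID for s in SLOPES]  # action-dependent
F2 = F1 + [lambda x, a, b=b, s=s: b + s * x for b in GRID for s in SLOPES]  # context-dependent


def make_env(seed):
    """Toy environment (an assumption): the true reward 0.2 + 0.6*a lies in F1."""
    rng = random.Random(seed)

    def draw():
        x = rng.random()                        # context
        a = rng.randrange(2)                    # uniform exploration over 2 actions
        r = 0.2 + 0.6 * a + rng.gauss(0, 0.1)   # noisy realizable reward
        return x, a, r

    return draw


def mse(f, data):
    """Mean squared prediction error of model f on (x, a, r) triples."""
    return sum((f(x, a) - r) ** 2 for x, a, r in data) / len(data)


def best_in_class(fclass, data):
    """Least-squares regression oracle over a finite class."""
    return min(fclass, key=lambda f: mse(f, data))


def select_class(classes, draw, phases=3, n0=200, threshold=0.02):
    """Return the index of the smallest class that survives all phases."""
    active = 0
    for p in range(phases):
        n = n0 * 2 ** p                         # doubling phase lengths
        train = [draw() for _ in range(n)]
        valid = [draw() for _ in range(n)]
        # Reference error: the largest class is assumed to contain the truth.
        ref = mse(best_in_class(classes[-1], train), valid)
        while active < len(classes) - 1:
            err = mse(best_in_class(classes[active], train), valid)
            if err - ref <= threshold:
                break                           # active class fits well enough
            active += 1                         # too simple: eliminate it
    return active


chosen = select_class([F0, F1, F2], make_env(0))
print("selected class index:", chosen)
```

In this toy instance the constant class F0 cannot explain the gap between the two actions, so it is discarded in the first phase, while F1 fits as well as F2 and is kept. A larger gap between a too-simple class and the truth makes elimination easier, which mirrors the abstract's remark that the cost of model selection shrinks as the class becomes easier to identify.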


Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Smart Grid Energy Management

Methods: Multi-Head Attention · Softmax · Linear Layer · Attention Is All You Need · InfoNCE · Residual Connection · Layer Normalization · Relative Position Encodings · Position-Wise Feed-Forward Layer · Global-Local Attention