The Pareto Frontier of model selection for general Contextual Bandits

Teodor V. Marinov; Julian Zimmert

arXiv:2110.13282 · cs.LG · October 27, 2021

TL;DR

This paper investigates the fundamental limits of model selection in general contextual bandits with nested policy classes, establishing a Pareto frontier that characterizes the trade-offs and impossibility results in achieving optimal guarantees.

Contribution

It provides a Pareto frontier of bounds for model selection in contextual bandits, proving that certain complexity trade-offs are unavoidable and resolving related open problems.

Findings

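01 The optimal single-algorithm guarantees cannot be obtained simultaneously over a nested sequence of policy classes, even in the purely stochastic regime.

02 A Pareto frontier of upper and lower bounds that match up to logarithmic factors is established.

03 An increase in the complexity term $\ln(|\Pi_m|)$, independent of $T$, is shown to be unavoidable for general policy classes.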

Abstract

Recent progress in model selection raises the question of the fundamental limits of these techniques. Under specific scrutiny has been model selection for general contextual bandits with nested policy classes, resulting in a COLT 2020 open problem. It asks whether it is possible to obtain simultaneously the optimal single-algorithm guarantees over all policies in a nested sequence of policy classes, or, if not, whether this is possible for a trade-off $\alpha \in [\frac{1}{2}, 1)$ between complexity term and time: $\ln(|\Pi_m|)^{1-\alpha} T^{\alpha}$. We give a disappointing answer to this question. Even in the purely stochastic regime, the desired results are unobtainable. We present a Pareto frontier of upper and lower bounds that match up to logarithmic factors, thereby proving that an increase in the complexity term $\ln(|\Pi_m|)$ independent of $T$ is unavoidable for general policy classes. As…
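To make the trade-off in the open problem concrete, the following minimal sketch evaluates the target rate $\ln(|\Pi_m|)^{1-\alpha} T^{\alpha}$ for a few values of $\alpha$. The function name `tradeoff_bound`, the horizon, and the policy class sizes are illustrative assumptions and do not come from the paper. Constants, the number of actions, and logarithmic factors are ignored. At $\alpha = \frac{1}{2}$ the expression reduces to $\sqrt{T \ln(|\Pi_m|)}$.

```python
import math


def tradeoff_bound(num_policies: int, horizon: int, alpha: float) -> float:
    """Evaluate ln(|Pi_m|)^(1 - alpha) * T^alpha, ignoring constants and log factors.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not 0.5 <= alpha < 1.0:
        raise ValueError("alpha must lie in [1/2, 1)")
    return math.log(num_policies) ** (1.0 - alpha) * horizon ** alpha


# Assumed example values: a horizon T and two policy class sizes |Pi_m|.
T = 10_000
for size in (10, 10**6):
    for alpha in (0.5, 0.75):
        value = tradeoff_bound(size, T, alpha)
        print(f"|Pi_m|={size:>7}  alpha={alpha:.2f}  rate={value:.1f}")
```

Raising $\alpha$ lowers the exponent on the complexity term $\ln(|\Pi_m|)$ and raises the exponent on $T$, which is the exchange the open problem asks about.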

Videos

The Pareto Frontier of model selection for general Contextual Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics