The Pareto Frontier of model selection for general Contextual Bandits
Teodor V. Marinov, Julian Zimmert

TL;DR
This paper investigates the fundamental limits of model selection in general contextual bandits with nested policy classes, establishing a Pareto frontier that characterizes the trade-offs and impossibility results in achieving optimal guarantees.
Contribution
It provides a Pareto frontier of bounds for model selection in contextual bandits, proving that certain complexity trade-offs are unavoidable and resolving related open problems.
Findings
Simultaneously optimal guarantees over all nested policy classes are unattainable, even in the purely stochastic regime.
A Pareto frontier of upper and lower bounds, matching up to logarithmic factors, is established.
Unavoidable complexity trade-offs are characterized for general policy classes.
Abstract
Recent progress in model selection raises the question of the fundamental limits of these techniques. Under specific scrutiny has been model selection for general contextual bandits with nested policy classes, resulting in a COLT 2020 open problem. It asks whether it is possible to obtain simultaneously the optimal single algorithm guarantees over all policies in a nested sequence of policy classes, or if otherwise this is possible for a trade-off $\alpha \in [\frac{1}{2}, 1)$ between complexity term and time: $\ln(|\Pi_m|)^{1-\alpha} T^{\alpha}$. We give a disappointing answer to this question. Even in the purely stochastic regime, the desired results are unobtainable. We present a Pareto frontier of up to logarithmic factors matching upper and lower bounds, thereby proving that an increase in the complexity term $\ln(|\Pi_m|)$ independent of $T$ is unavoidable for general policy classes. As…
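To make the trade-off between complexity term and time concrete, the sketch below evaluates a rate of the form ln(|Π_m|)^(1-α) · T^α for α in [1/2, 1), where Π_m is the m-th policy class and T is the time horizon. It is only an illustration of the shape of the rate: constants, logarithmic factors, and the dependence on the number of actions are omitted, and the function name `tradeoff_rate`, the horizon, and the class sizes are hypothetical choices made for this example, not values from the paper.

```python
import math


def tradeoff_rate(horizon: int, class_size: int, alpha: float) -> float:
    """Evaluate ln(|Pi_m|)^(1 - alpha) * T^alpha (constants and log factors dropped)."""
    if not 0.5 <= alpha < 1.0:
        raise ValueError("alpha is assumed to lie in [1/2, 1)")
    return math.log(class_size) ** (1.0 - alpha) * horizon ** alpha


if __name__ == "__main__":
    T = 10_000                          # hypothetical time horizon
    sizes = [2 ** 4, 2 ** 16, 2 ** 64]  # hypothetical nested class sizes |Pi_1| < |Pi_2| < |Pi_3|
    for alpha in (0.5, 0.75):
        rates = [tradeoff_rate(T, n, alpha) for n in sizes]
        print(f"alpha={alpha}:", [round(r, 1) for r in rates])
```

At α = 1/2 the expression reduces to sqrt(T · ln|Π_m|), the familiar single-class rate. Larger α weakens the dependence on the class size at the price of a worse dependence on T.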
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
