Model Selection in Contextual Stochastic Bandit Problems

Aldo Pacchiano; My Phan; Yasin Abbasi-Yadkori; Anup Rao; Julian Zimmert; Tor Lattimore; Csaba Szepesvari

arXiv:2003.01704·cs.LG·December 6, 2022·23 cites

Open Access

TL;DR

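This paper introduces a meta-algorithm that selects among candidate base algorithms in stochastic contextual bandit problems. It achieves optimal $O(\sqrt{T})$ model selection regret, with a matching lower bound, and applies the approach to settings such as misspecified linear bandits and reinforcement learning.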

Contribution

It develops a generic meta-algorithm framework with a novel smoothing transformation, providing optimal $O(\sqrt{T})$ guarantees and addressing multiple challenging bandit settings.

Findings

01

Achieves $O(\sqrt{T})$ model selection regret in stochastic contextual bandits.

02

Proves a lower bound showing that $\Omega(\sqrt{T})$ model selection regret is unavoidable in general, even when one of the base algorithms has $O(\log T)$ regret.

03

Addresses model selection in misspecified linear bandits and reinforcement learning.

Abstract

We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm/base-algorithm abstraction that can work with general classes of base algorithms and different types of adversarial meta-algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as…
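The meta-algorithm/base-algorithm interaction described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the meta-algorithm is a plain EXP3-style adversarial learner, the base algorithms are UCB1 instances restricted to different arm subsets, and the class names `UCB1` and `Exp3Meta` and the function `simulate` are hypothetical. The sketch omits contexts and the smoothing transformation entirely, so it should not be read as the paper's method and carries none of its guarantees.

```python
import math
import random


class UCB1:
    """Hypothetical base algorithm: UCB1 restricted to a subset of arms."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.sums = {a: 0.0 for a in self.arms}
        self.t = 0  # number of times this base was asked for an action

    def select(self):
        self.t += 1
        for a in self.arms:  # play every arm once before using the index
            if self.counts[a] == 0:
                return a
        return max(
            self.arms,
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class Exp3Meta:
    """Illustrative EXP3-style meta-algorithm over base algorithms."""

    def __init__(self, bases, gamma, rng):
        self.bases = bases
        self.gamma = gamma  # exploration rate (assumed fixed here)
        self.rng = rng
        self.weights = [1.0] * len(bases)

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.bases)
        return [(1.0 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def step(self, pull):
        probs = self.probabilities()
        i = self.rng.choices(range(len(self.bases)), weights=probs)[0]
        arm = self.bases[i].select()
        reward = pull(arm)  # reward assumed to lie in [0, 1]
        # Only the base that was played observes feedback.
        self.bases[i].update(arm, reward)
        # Importance-weighted reward estimate for the chosen base.
        estimate = reward / probs[i]
        self.weights[i] *= math.exp(self.gamma * estimate / len(self.bases))
        top = max(self.weights)  # rescale to avoid overflow
        self.weights = [w / top for w in self.weights]
        return i, arm, reward


def simulate(horizon, seed=0):
    """Run the meta-algorithm on a 3-armed Bernoulli bandit (toy setup)."""
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    # Base 0 cannot see the best arm; base 1 can.
    bases = [UCB1([0, 1]), UCB1([0, 1, 2])]
    meta = Exp3Meta(bases, gamma=0.1, rng=rng)
    plays = [0, 0]
    for _ in range(horizon):
        i, _, _ = meta.step(lambda a: 1.0 if rng.random() < means[a] else 0.0)
        plays[i] += 1
    return plays, meta.probabilities()
```

Calling `simulate(5000)` returns how often each base was played and the final meta-level sampling distribution. In this toy setup one would expect the base that can reach the best arm to be favoured over time, though that is an informal expectation rather than a claim from the paper.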

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics