Model Selection in Contextual Stochastic Bandit Problems
Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

TL;DR
This paper introduces a meta-algorithm for model selection in stochastic contextual bandit problems. It achieves optimal $O(\sqrt{T})$ model selection regret and extends to settings such as misspecified linear bandits and reinforcement learning.
Contribution
It develops a generic meta-algorithm framework built on a novel smoothing transformation for bandit algorithms, which yields optimal $O(\sqrt{T})$ model selection guarantees whenever the optimal base algorithm satisfies a high-probability regret guarantee.
Findings
Achieves $O(\sqrt{T})$ model selection regret in stochastic contextual bandits.
Proves a lower bound showing $\Omega(\sqrt{T})$ regret is unavoidable.
Addresses model selection in misspecified linear bandits and reinforcement learning.
Abstract
We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial meta-algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as…
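The meta-algorithm/base-algorithm split described in the abstract can be illustrated with a toy sketch. This is not the paper's method: it omits the smoothing transformation entirely, uses plain EXP3 as a stand-in adversarial meta-algorithm, and uses UCB1 over different arm subsets as stand-in base algorithms in a non-contextual Bernoulli bandit. All names (`UCB1`, `run_exp3_meta`, `arm_means`, `gamma`) are hypothetical and chosen only for illustration.

```python
import math
import random


class UCB1:
    """Stand-in base algorithm: UCB1 restricted to a fixed subset of arms."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.sums = {a: 0.0 for a in self.arms}
        self.t = 0  # number of updates this base algorithm has received

    def select(self):
        # Play every arm once before using confidence bounds.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        return max(
            self.arms,
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward


def run_exp3_meta(bases, arm_means, horizon, gamma=0.1, seed=0):
    """EXP3 over base algorithms (assumption: no smoothing step is applied)."""
    rng = random.Random(seed)
    k = len(bases)
    weights = [1.0] * k
    probs = [1.0 / k] * k
    plays = [0] * k
    total_reward = 0.0
    for _ in range(horizon):
        w_sum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / w_sum + gamma / k for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        arm = bases[i].select()
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        # Only the base algorithm that was played sees the feedback.
        bases[i].update(arm, reward)
        # Importance-weighted exponential update for the chosen base.
        weights[i] *= math.exp(gamma * (reward / probs[i]) / k)
        # Rescale so the weights stay in a safe numeric range.
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        plays[i] += 1
        total_reward += reward
    return probs, plays, total_reward


if __name__ == "__main__":
    means = [0.2, 0.3, 0.8]  # hypothetical Bernoulli arm means
    candidates = [UCB1([0, 1]), UCB1([0, 1, 2])]  # second "model" contains the best arm
    final_probs, base_plays, reward = run_exp3_meta(candidates, means, horizon=2000)
    print(final_probs, base_plays, reward)
```

In this toy setup each base algorithm only learns on the rounds where the meta-algorithm selects it, which is the difficulty the abstract's smoothing transformation is described as addressing; the sketch makes no attempt to reproduce the paper's guarantees.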
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
