Model Selection in Reinforcement Learning with General Function Approximations

Avishek Ghosh; Sayak Ray Chowdhury

arXiv:2207.02992·stat.ML·July 8, 2022

TL;DR

This paper develops adaptive model selection algorithms for two reinforcement learning settings, multi-armed bandits (MABs) and Markov decision processes (MDPs). The algorithms identify the smallest function class containing the true model among nested candidates and achieve near-oracle regret, with an additive cost that depends only logarithmically on the horizon.

Contribution

It introduces efficient model selection algorithms for RL that adapt to the smallest function class containing the true model, matching oracle regret under a separability assumption on the nested classes.

Findings

01

Regret bounds match oracle with known true class

02

Additive regret cost with logarithmic dependence on horizon

03

Algorithms adapt to the smallest true model class

Abstract

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, where the true models -- the reward generating function for MABs and the transition kernel for MDPs -- lie, respectively. Instead, we are given $M$ nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest function class (among the $M$ nested classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms match to…
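The idea of adapting to the smallest nested class that contains the true model can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, offline illustration that assumes finite hypothesis classes, a squared-loss fit test, and a hand-picked threshold. The names `min_empirical_loss`, `select_smallest_class`, and `make_demo` are hypothetical, as is the demo data.

```python
import random
from typing import Callable, List, Sequence, Tuple

RewardFn = Callable[[float], float]
Sample = Tuple[float, float]  # (context, observed reward)


def min_empirical_loss(fn_class: Sequence[RewardFn], data: Sequence[Sample]) -> float:
    """Smallest mean squared error achieved by any function in the class."""
    return min(
        sum((f(x) - y) ** 2 for x, y in data) / len(data)
        for f in fn_class
    )


def select_smallest_class(
    nested_classes: Sequence[Sequence[RewardFn]],
    data: Sequence[Sample],
    threshold: float,
) -> int:
    """Return the index of the first (smallest) class whose best fit passes the test.

    Assumption: classes are ordered so that each one contains the previous one,
    and a class that misses the true model has loss well above `threshold`
    (a stand-in for a separability condition).
    """
    for index, fn_class in enumerate(nested_classes):
        if min_empirical_loss(fn_class, data) <= threshold:
            return index
    # Fall back to the largest class if no class passes the test.
    return len(nested_classes) - 1


def make_demo() -> Tuple[List[List[RewardFn]], List[Sample]]:
    """Build three nested toy classes and noisy data from a linear reward."""
    rng = random.Random(0)
    constant: List[RewardFn] = [lambda x, c=c: c for c in (0.0, 0.5, 1.0)]
    linear: List[RewardFn] = constant + [lambda x, a=a: a * x for a in (0.5, 1.0, 2.0)]
    quadratic: List[RewardFn] = linear + [lambda x, a=a: a * x * x for a in (0.5, 1.0)]
    xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
    # Hypothetical true reward: 2x plus Gaussian noise with standard deviation 0.1.
    data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
    return [constant, linear, quadratic], data


if __name__ == "__main__":
    nested, data = make_demo()
    print(select_smallest_class(nested, data, threshold=0.05))
```

In this toy setup the constant class cannot fit the linear reward, so its best loss stays far above the threshold, while the linear class fits up to the noise level. The gap between the two plays the role that a separability assumption would play in a formal analysis.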

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques