Model Selection for Generic Reinforcement Learning

Avishek Ghosh; Sayak Ray Chowdhury; Kannan Ramchandran

arXiv:2107.05849 · stat.ML · December 13, 2021

TL;DR

This paper introduces ARL-GEN, an adaptive algorithm for model selection in finite-horizon episodic reinforcement learning, achieving near-oracle regret bounds without prior knowledge of the true model class.

Contribution

The paper proposes a novel adaptive RL algorithm that automatically selects the correct model class and matches oracle regret bounds, and it extends the approach to linear mixture MDPs without requiring the separability assumption.

Findings

01

ARL-GEN achieves regret scaling that matches an oracle that knows the true model class in advance.

02

With high probability, the algorithm adapts to the smallest model class that contains the true transition kernel.

03

The cost of model selection is an additive term in the regret with only a weak dependence on the total number of steps $T$.

Abstract

We address the problem of model selection for the finite-horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^{*}$ belongs to a family of models $\mathcal{P}^{*}$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^{*}$, we are given $M$ nested families of transition kernels $\mathcal{P}_{1} \subset \mathcal{P}_{2} \subset \dots \subset \mathcal{P}_{M}$. We propose and analyze a novel algorithm, namely Adaptive Reinforcement Learning (General) (ARL-GEN), that adapts to the smallest such family where the true transition kernel $P^{*}$ lies. ARL-GEN uses the Upper Confidence Reinforcement Learning (UCRL) algorithm with value-targeted regression as a blackbox and puts a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that ARL-GEN obtains a regret…
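The abstract describes a model selection step run at the beginning of each epoch over nested classes $\mathcal{P}_{1} \subset \dots \subset \mathcal{P}_{M}$. The sketch that follows is not ARL-GEN itself; it is a toy illustration of that structure under stated assumptions. Nested classes are polynomials of degree at most $m$, ordinary least squares stands in for value-targeted regression, the UCRL base learner is replaced by simply drawing fresh samples, and both the selection rule (pick the smallest class whose empirical loss is within a threshold of the largest class's loss) and the $1/\sqrt{n}$ threshold schedule are illustrative assumptions. The helper names `empirical_losses`, `select_model_class`, and `run_epochs` are hypothetical.

```python
import numpy as np


def empirical_losses(x, y, max_degree):
    """Mean squared residual of a least-squares fit for each nested class.

    Hypothetical toy: class m is "polynomials of degree <= m", so the classes are
    nested (class 0 inside class 1 inside ... inside class max_degree). Ordinary
    least squares stands in for value-targeted regression.
    """
    losses = []
    for m in range(max_degree + 1):
        coeffs = np.polyfit(x, y, deg=m)
        residual = y - np.polyval(coeffs, x)
        losses.append(float(np.mean(residual ** 2)))
    return losses


def select_model_class(losses, threshold):
    """Return the smallest class whose loss is within `threshold` of the largest class."""
    reference = losses[-1]  # nested classes: the largest one fits at least as well
    for m, loss in enumerate(losses):
        if loss <= reference + threshold:
            return m
    return len(losses) - 1  # not reached when threshold >= 0


def run_epochs(num_epochs=5, first_epoch_len=50, max_degree=4, noise=0.1, seed=0):
    """Doubling epochs with a model-selection step at the start of each epoch."""
    rng = np.random.default_rng(seed)
    xs, ys = np.empty(0), np.empty(0)
    chosen = []
    epoch_len = first_epoch_len
    for _ in range(num_epochs):
        n = xs.size
        if n > max_degree:
            threshold = 1.0 / np.sqrt(n)  # assumed schedule: shrinks as data grows
            chosen.append(select_model_class(empirical_losses(xs, ys, max_degree), threshold))
        else:
            chosen.append(max_degree)  # no usable data yet: fall back to the largest class
        # Placeholder for running the base learner (UCRL with value-targeted
        # regression in the paper) for epoch_len steps: here we only draw fresh
        # samples from a degree-2 ground truth.
        x_new = rng.uniform(-1.0, 1.0, size=epoch_len)
        y_new = x_new ** 2 + noise * rng.standard_normal(epoch_len)
        xs = np.concatenate([xs, x_new])
        ys = np.concatenate([ys, y_new])
        epoch_len *= 2  # doubling epoch length
    return chosen


print(run_epochs())
```

In this toy run, the first epoch has no data and falls back to the largest class; by the last epoch the threshold has shrunk below the loss gap of the under-fitting classes, so the degree-2 class (the smallest one containing the ground truth) is selected.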

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms