Model Selection for Generic Reinforcement Learning
Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran

TL;DR
This paper introduces \texttt{ARL-GEN}, an adaptive algorithm for model selection in finite-horizon episodic reinforcement learning, achieving near-oracle regret bounds without prior knowledge of the true model class.
Contribution
The paper proposes a novel adaptive RL algorithm that automatically selects the correct model class, matching oracle regret bounds, and extends to linear mixture MDPs without separability assumptions.
Findings
\texttt{ARL-GEN} achieves regret scaling similar to that of an oracle that knows the true model class.
With high probability, the algorithm adapts to the smallest model class that contains the true transition kernel.
The cost of model selection is an additive regret term with only a weak dependence on the total number of steps $T$.
Abstract
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem where the transition kernel $P^*$ belongs to a family of models $\mathcal{P}^*$ with finite metric entropy. In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels $\mathcal{P}_1 \subset \mathcal{P}_2 \subset \ldots \subset \mathcal{P}_M$. We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}), that adapts to the smallest such family where the true transition kernel lies. \texttt{ARL-GEN} uses the Upper Confidence Reinforcement Learning (\texttt{UCRL}) algorithm with value targeted regression as a blackbox and puts a model selection module at the beginning of each epoch. Under a mild separability assumption on the model classes, we show that \texttt{ARL-GEN} obtains a regret…
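The epoch structure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract says only that a model selection module runs at the start of each epoch and that a UCRL-style learner is used as a black box. The threshold test, the names `select_model_class` and `run_epochs`, and the callbacks `fit_losses` and `run_blackbox` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[int, int]  # placeholder type for collected data (hypothetical)


def select_model_class(losses: Sequence[float], threshold: float) -> int:
    """Return the index of the smallest nested class whose loss is <= threshold.

    `losses[m]` is an empirical fit statistic for class m (an assumption;
    the abstract does not specify the test). Because the classes are nested,
    larger classes fit at least as well, so the first class that passes is
    the smallest acceptable one. Falls back to the largest class if none pass.
    """
    for m, loss in enumerate(losses):
        if loss <= threshold:
            return m
    return len(losses) - 1


def run_epochs(
    num_epochs: int,
    fit_losses: Callable[[List[Transition]], Sequence[float]],
    run_blackbox: Callable[[int, int], List[Transition]],
    threshold: float,
) -> List[int]:
    """Alternate model selection and a black-box learner, one epoch at a time."""
    history: List[Transition] = []
    chosen: List[int] = []
    for epoch in range(num_epochs):
        # Model selection module: runs at the beginning of each epoch.
        m = select_model_class(fit_losses(history), threshold)
        chosen.append(m)
        # Black-box learner (standing in for UCRL) restricted to class m.
        history.extend(run_blackbox(m, epoch))
    return chosen
```

For example, with per-class losses `[0.9, 0.4, 0.05, 0.04]` and a threshold of `0.1`, `select_model_class` picks index `2`, the smallest class that fits the data acceptably.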
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
