Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Jonathan N. Lee; George Tucker; Ofir Nachum; Bo Dai; Emma Brunskill

arXiv:2211.02016 · cs.LG · November 4, 2022 · 1 citation

TL;DR

This paper introduces ModBE, a model selection algorithm for offline reinforcement learning with value function approximation. It achieves minimax rate-optimal oracle inequalities up to logarithmic factors by balancing the approximation and estimation errors of the candidate model classes.

Contribution

The paper presents the first model selection algorithm for offline RL with value function approximation that achieves minimax rate-optimal oracle inequalities up to logarithmic factors, based on a novel elimination method.

Findings

01. ModBE achieves minimax rate-optimal oracle inequalities.

02. The algorithm is simple, computationally efficient, and relies on square loss regression.

03. Numerical simulations demonstrate effective model class selection.

Abstract

In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between approximation and estimation error of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors. The algorithm, ModBE, takes as input a collection of candidate model classes and a generic base offline RL algorithm. By successively eliminating model…
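The trade-off described in the abstract, between the approximation error of small classes and the estimation error of large ones, can be illustrated with a toy regression analogue. The sketch below is not ModBE and does not involve Bellman error or an offline RL base algorithm. It only shows the general idea of eliminating a smaller class from a nested sequence when a larger class has a clearly lower held-out squared loss. The function name `select_by_elimination`, the polynomial classes, the tolerance form `c * (j + 1) * log(M / delta) / n_val`, and all constants are assumptions made for illustration.

```python
import numpy as np


def select_by_elimination(x_tr, y_tr, x_val, y_val, max_degree=5, c=1.0, delta=0.1):
    """Toy elimination-style selection over nested polynomial classes.

    Class k is the set of polynomials of degree <= k, so the classes are nested.
    This is an illustrative regression analogue, not the ModBE algorithm.
    """
    degrees = list(range(max_degree + 1))
    n_val = len(x_val)

    # Fit each class by least squares and record its held-out squared loss.
    val_loss = []
    for d in degrees:
        coeffs = np.polyfit(x_tr, y_tr, deg=d)
        preds = np.polyval(coeffs, x_val)
        val_loss.append(float(np.mean((preds - y_val) ** 2)))

    # Assumed tolerance: grows with the size of the larger class being compared,
    # standing in for its estimation error.
    log_term = np.log(len(degrees) / delta)

    survivors = []
    for k in degrees:
        eliminated = False
        for j in degrees[k + 1:]:
            tol = c * (j + 1) * log_term / n_val
            # Eliminate class k if a larger class beats it by more than the tolerance.
            if val_loss[k] - val_loss[j] > tol:
                eliminated = True
                break
        if not eliminated:
            survivors.append(k)

    # The largest class is never eliminated, so survivors is non-empty.
    return min(survivors), val_loss


# Synthetic data: a quadratic target with small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=400)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.normal(size=400)

selected, losses = select_by_elimination(x[:200], y[:200], x[200:], y[200:])
print("selected degree:", selected)
print("held-out losses:", [round(v, 4) for v in losses])
```

In this toy setup the constant and linear classes are eliminated because their approximation error is large, while the quadratic class survives every comparison with larger classes and is returned as the smallest survivor.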

Videos

Oracle Inequalities for Model Selection in Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Data Classification

Methods: Balanced Selection