Model-based Offline Reinforcement Learning with Local Misspecification

Kefan Dong; Yannis Flet-Berliac; Allen Nie; Emma Brunskill

arXiv:2301.11426 · cs.LG · January 30, 2023

TL;DR

This paper derives a lower bound on policy performance for model-based offline reinforcement learning that explicitly accounts for dynamics model misspecification and distribution mismatch, and pairs it with a safe policy improvement guarantee and an empirical algorithm for offline policy selection.

Contribution

The paper presents a novel lower bound on policy performance that captures dynamics model misspecification, and proposes an empirical algorithm for optimal offline policy selection that selects jointly over dynamics models and policies.

Findings

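1. Proves a safe policy improvement theorem by establishing pessimistic approximations to the value function.

2. Analyzes the lower bound in the linear quadratic regulator (LQR) setting.

3. Demonstrates policy selection performance competitive with previous lower bounds across a set of D4RL tasks.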

Abstract

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch, and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimism approximations to the value function. Our key insight is to jointly consider selecting over dynamics models and policies: as long as a dynamics model can accurately represent the dynamics of the state-action pairs visited by a given policy, it is possible to approximate the value of that particular policy. We analyze our lower bound in the LQR setting and also show performance competitive with previous lower bounds on policy selection across a set of D4RL tasks.
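
The joint selection over dynamics models and policies described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function name `select_policy`, the input dictionaries, and the linear penalty on model error are all assumptions made for illustration. It assumes that, for each candidate policy and each candidate model, we already have a model-based value estimate and some measure of how badly that model fits the state-action pairs the policy visits.

```python
from typing import Dict, Tuple

def select_policy(
    model_values: Dict[str, Dict[str, float]],
    model_errors: Dict[str, Dict[str, float]],
    penalty: float = 1.0,
) -> Tuple[str, float, str]:
    """Pick the policy with the best pessimistic value estimate.

    model_values[policy][model]: value of `policy` estimated under `model`.
    model_errors[policy][model]: hypothetical error of `model` on the
        state-action pairs visited by `policy` (assumed to be given).
    Returns (policy, lower_bound, model).
    """
    best = None
    for policy, values in model_values.items():
        # For this policy, keep the model whose penalized estimate is highest:
        # a model only needs to be accurate where this policy goes.
        lower_bound, model = max(
            (values[m] - penalty * model_errors[policy][m], m) for m in values
        )
        if best is None or lower_bound > best[1]:
            best = (policy, lower_bound, model)
    return best

# Toy numbers (made up): model m1 rates pi_b highly but fits it poorly.
values = {"pi_a": {"m1": 10.0, "m2": 8.0}, "pi_b": {"m1": 12.0, "m2": 7.0}}
errors = {"pi_a": {"m1": 0.5, "m2": 0.1}, "pi_b": {"m1": 6.0, "m2": 0.2}}

best = select_policy(values, errors, penalty=1.0)
naive = select_policy(values, errors, penalty=0.0)
print(best[0], best[2])
print(naive[0], naive[2])
```

With the penalty switched off, the sketch picks the policy whose value is highest under a model that fits it poorly; with the penalty on, it prefers a policy whose value is supported by a model that is accurate on the states and actions that policy visits.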

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research