Pessimistic Model Selection for Offline Deep Reinforcement Learning

Chao-Han Huck Yang; Zhengling Qi; Yifan Cui; Pin-Yu Chen

arXiv:2111.14346 · cs.LG · November 30, 2021 · 1 cites

TL;DR

This paper introduces pessimistic model selection (PMS), a method for offline deep reinforcement learning that comes with a theoretical guarantee for finding the best policy among a set of candidate models.

Contribution

It proposes a novel pessimistic model selection framework with theoretical backing, addressing the challenge of model overfitting in offline DRL.

Findings

01 Outperforms existing model selection methods in numerical studies

02 Provides theoretical guarantees for policy optimality

03 Addresses bias in DRL model identification

Abstract

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is over-fitting, which leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is challenging because no ground truth is available for evaluating performance, in contrast with the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of DRL model in identifying the…
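
The abstract does not spell out the PMS procedure, so the following is only a minimal sketch of the general pessimism idea it alludes to: rank candidate policies by a lower confidence bound on their estimated value rather than by the point estimate. The function name `pessimistic_select`, the normal-approximation bound `estimate - z * std_error`, and all numbers are assumptions made for illustration, not the authors' algorithm or results.

```python
from typing import Sequence


def pessimistic_select(
    estimates: Sequence[float],
    std_errors: Sequence[float],
    z: float = 1.96,
) -> int:
    """Return the index of the candidate with the largest lower confidence bound.

    Hypothetical helper: the bound estimate - z * std_error is an assumed
    normal-approximation form, not taken from the paper.
    """
    if len(estimates) == 0 or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    # Penalize each estimated policy value by its uncertainty.
    lower_bounds = [v - z * se for v, se in zip(estimates, std_errors)]
    # Pick the candidate whose pessimistic (lower-bound) value is highest.
    return max(range(len(lower_bounds)), key=lower_bounds.__getitem__)


# Made-up off-policy value estimates for three candidate models.
estimates = [10.0, 12.0, 11.0]
std_errors = [0.5, 3.0, 0.4]

greedy = max(range(len(estimates)), key=estimates.__getitem__)
chosen = pessimistic_select(estimates, std_errors)
print(f"greedy pick: {greedy}, pessimistic pick: {chosen}")
```

With these illustrative numbers, the candidate with the highest point estimate also has the widest uncertainty, so the pessimistic rule prefers a different candidate whose value is estimated more reliably.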

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management