Pessimistic Model Selection for Offline Deep Reinforcement Learning
Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen

TL;DR
This paper introduces pessimistic model selection (PMS), a method for offline deep reinforcement learning that comes with a theoretical guarantee for finding the best policy among a set of candidate models, with the aim of reducing over-fitting and improving policy generalization.
Contribution
The paper proposes a pessimistic model selection (PMS) framework with a theoretical guarantee, targeting the over-fitting that limits how well policies learned by offline DRL generalize.
Findings
Outperforms existing model selection methods in numerical studies
Provides theoretical guarantees for policy optimality
Addresses bias in DRL model identification
Abstract
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is a challenging task because no ground truth is available for performance demonstration, in contrast with the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of the DRL model in identifying the…
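The general idea of selecting pessimistically among candidate models can be illustrated with a minimal sketch. It is not the paper's PMS procedure: it assumes each candidate policy already has an off-policy value estimate and a standard error, and it ranks candidates by a one-sided lower confidence bound rather than by the point estimate. The function name pessimistic_select, the normal approximation, and the numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def pessimistic_select(estimates, std_errors, alpha=0.05):
    """Pick the candidate with the largest lower confidence bound.

    estimates  : estimated policy values, one per candidate (assumed given)
    std_errors : standard errors of those estimates (assumed given)
    alpha      : one-sided significance level for the lower bound
    """
    # Normal-approximation critical value (an assumption of this sketch).
    z = NormalDist().inv_cdf(1 - alpha)
    # Penalize each estimate by its uncertainty.
    lower_bounds = [v - z * se for v, se in zip(estimates, std_errors)]
    best = max(range(len(lower_bounds)), key=lower_bounds.__getitem__)
    return best, lower_bounds


# Hypothetical candidates: the first has the highest point estimate
# but also by far the largest uncertainty.
values = [1.20, 1.05, 0.90]
std_errors = [0.40, 0.05, 0.02]

best, bounds = pessimistic_select(values, std_errors)
greedy = max(range(len(values)), key=values.__getitem__)
print("greedy choice:", greedy, "pessimistic choice:", best)
```

In this toy input the greedy rule picks the first candidate, while the lower-bound rule prefers the second, whose estimate is slightly lower but far more certain. That is the kind of over-fitted choice a pessimistic criterion is meant to avoid.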
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
