Model-based Offline Reinforcement Learning with Count-based Conservatism

Byeongchan Kim; Min-hwan Oh

arXiv:2307.11352·cs.LG·July 24, 2023


TL;DR

This paper introduces Count-MORL, a model-based offline reinforcement learning algorithm that uses count-based conservatism to improve policy performance and provides theoretical guarantees, validated by experiments on benchmark datasets.

Contribution

To the authors' knowledge, Count-MORL is the first algorithm to demonstrate the effectiveness of count-based conservatism in model-based offline deep RL, supported by theoretical analysis and empirical validation.

Findings

01

Count-MORL outperforms existing offline RL algorithms on D4RL benchmarks.

02

Estimation error is inversely proportional to state-action visit frequency.

03

The policy learned under the count-based conservative model comes with near-optimality guarantees.

Abstract

In this paper, we propose Count-MORL, a model-based offline reinforcement learning method that integrates count-based conservatism. Our method uses count estimates of state-action pairs to quantify model estimation error and is, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that Count-MORL with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at…
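
To make the idea of count-based conservatism concrete, the sketch below counts hashed state-action pairs from an offline dataset and subtracts a count-dependent penalty from the reward. It is an illustration under stated assumptions, not the authors' implementation: the class `HashCounter`, the function `penalized_reward`, the SimHash-style random projection, and the penalty form `beta / sqrt(n + 1)` are all hypothetical choices made here, and the hash code construction in the official repository may differ.

```python
from collections import Counter

import numpy as np


class HashCounter:
    """Counts state-action pairs by a binary hash code (hypothetical helper).

    Assumption: a SimHash-style sign of a fixed random projection stands in
    for whatever hash code construction the paper actually uses.
    """

    def __init__(self, input_dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; nearby inputs tend to share a code.
        self.projection = rng.standard_normal((n_bits, input_dim))
        self.counts = Counter()

    def _code(self, state, action):
        x = np.concatenate([
            np.asarray(state, dtype=float).ravel(),
            np.asarray(action, dtype=float).ravel(),
        ])
        bits = (self.projection @ x) > 0
        return bits.tobytes()  # hashable key for the counter

    def update(self, state, action):
        self.counts[self._code(state, action)] += 1

    def count(self, state, action):
        # Counter returns 0 for codes never seen in the dataset.
        return self.counts[self._code(state, action)]


def penalized_reward(reward, count, beta=1.0):
    """Reward minus a penalty that shrinks as the visit count grows.

    Assumption: the 1/sqrt(n + 1) shape is one illustrative choice; the +1
    keeps the penalty finite for unseen pairs.
    """
    return reward - beta / np.sqrt(count + 1.0)


# Usage: count pairs from an offline dataset, then penalize model rollouts.
counter = HashCounter(input_dim=3)
state, action = [0.1, 0.2], [0.3]
for _ in range(3):
    counter.update(state, action)

n = counter.count(state, action)
r_tilde = penalized_reward(1.0, n, beta=1.0)
```

Rarely visited pairs receive a larger penalty, so a policy trained on model rollouts is steered toward regions where the dataset, and therefore the learned model, is better supported.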

Code & Models

Repositories

oh-lab/count-morl
PyTorch · Official

Videos

Model-based Offline Reinforcement Learning with Count-based Conservatism · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Machine Learning and ELM