Model-based Offline Reinforcement Learning with Count-based Conservatism
Byeongchan Kim, Min-hwan Oh

TL;DR
This paper introduces Count-MORL, a model-based offline reinforcement learning algorithm that uses count-based conservatism to improve policy performance and provides theoretical guarantees, validated by experiments on benchmark datasets.
Contribution
To the authors' knowledge, Count-MORL is the first algorithm to demonstrate the effectiveness of count-based conservatism in model-based offline deep RL, supported by both theoretical analysis and empirical validation.
Findings
Count-MORL outperforms existing offline RL algorithms on D4RL benchmarks.
Estimation error is inversely proportional to state-action visit frequency.
The policy learned under the count-based conservative model comes with near-optimality guarantees.
Abstract
In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named Count-MORL. Our method uses count estimates of state-action pairs to quantify model estimation error and is, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that Count-MORL with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at…
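The idea of counting state-action pairs through hash codes and penalizing rarely seen pairs can be illustrated with a small sketch. This is not the authors' implementation: the class `HashCounter`, the function `penalized_reward`, the parameters `n_bits` and `beta`, the random-projection (SimHash-style) hashing, and the penalty form `beta / sqrt(n + 1)` are all assumptions chosen for illustration.

```python
import numpy as np
from collections import Counter


class HashCounter:
    """Hypothetical helper: counts state-action pairs via short binary hash codes."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; the sign pattern of the projection is the hash code.
        self.proj = rng.standard_normal((dim, n_bits))
        self.counts = Counter()

    def _codes(self, states, actions):
        x = np.concatenate([states, actions], axis=1)
        bits = (x @ self.proj) > 0
        # Each row of booleans becomes a hashable bytes key.
        return [row.tobytes() for row in bits]

    def fit(self, states, actions):
        """Accumulate counts from the offline dataset."""
        self.counts.update(self._codes(states, actions))

    def count(self, states, actions):
        """Return the estimated visit count of each queried pair (0 if unseen)."""
        codes = self._codes(states, actions)
        return np.array([self.counts[c] for c in codes], dtype=float)


def penalized_reward(rewards, counts, beta=1.0):
    # Assumed penalty form: shrinks as the count grows; unseen pairs get the full beta.
    return rewards - beta / np.sqrt(counts + 1.0)
```

In a model-based loop, a counter like this would be fit once on the offline dataset and then queried for each model-generated transition, so that rollouts drifting into poorly covered regions receive lower rewards.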
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Machine Learning and ELM
