DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

Xiao-Yin Liu; Xiao-Hu Zhou; Mei-Jiang Gui; Guo-Tao Li; Xiao-Liang Xie; Shi-Qi Liu; Shuang-Yi Wang; Qi-Chao Zhang; Biao Luo; Zeng-Guang Hou

arXiv:2309.08925·cs.LG·June 10, 2025·1 cites

Open Access

TL;DR

This paper introduces DOMAIN, a model-based offline RL algorithm that avoids unreliable model uncertainty estimation by adaptively sampling model data, and reports improved performance together with theoretical guarantees.

Contribution

DOMAIN is the first offline RL method that adaptively adjusts model data penalties without relying on uncertainty estimation, enhancing safety and performance.

Findings

01. DOMAIN outperforms prior algorithms on the D4RL benchmark.

02. Theoretical analysis shows that the learned Q value is a lower bound of the true Q value outside the region.

03. Performance improves by 1.8% on average.

Abstract

Model-based reinforcement learning (RL), which learns an environment model from the offline dataset and generates more out-of-distribution model data, has become an effective approach to the problem of distribution shift in offline RL. Because of the gap between the learned model and the actual environment, conservatism should be incorporated into the algorithm to balance accurate offline data against imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and previous methods ignore differences among the model data, which makes them overly conservative. To address these issues, this paper proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) that does not estimate model uncertainty, and designs the adaptive sampling distribution of…
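The generic pipeline described in the first two sentences of the abstract (fit a model on offline data, generate model data, then train on a mix of both) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the DOMAIN algorithm: the linear dynamics model, the toy environment, the hand-written policy, and the fixed `real_ratio` are all hypothetical stand-ins. In particular, the fixed mixing ratio is a placeholder for DOMAIN's adaptive sampling distribution, which the truncated abstract does not specify and which is not reproduced here.

```python
import numpy as np

def fit_linear_model(states, actions, next_states):
    """Least-squares fit of next_state ~ [state, action, 1] @ weights (assumed model class)."""
    inputs = np.hstack([states, actions, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return weights

def model_rollout(weights, start_states, policy, horizon):
    """Roll the learned model forward from dataset states; returns (s, a, s') arrays."""
    s = start_states
    out_s, out_a, out_next = [], [], []
    for _ in range(horizon):
        a = policy(s)
        s_next = np.hstack([s, a, np.ones((len(s), 1))]) @ weights
        out_s.append(s)
        out_a.append(a)
        out_next.append(s_next)
        s = s_next
    return np.concatenate(out_s), np.concatenate(out_a), np.concatenate(out_next)

def sample_mixed_batch(real, model, batch_size, real_ratio, rng):
    """Draw a batch with a fixed fraction of offline transitions; the rest is model data."""
    n_real = int(round(batch_size * real_ratio))
    n_model = batch_size - n_real
    real_idx = rng.integers(0, len(real[0]), size=n_real)
    model_idx = rng.integers(0, len(model[0]), size=n_model)
    batch = tuple(np.concatenate([r[real_idx], m[model_idx]]) for r, m in zip(real, model))
    is_real = np.concatenate([np.ones(n_real, dtype=bool), np.zeros(n_model, dtype=bool)])
    return batch, is_real

rng = np.random.default_rng(0)

# Toy offline dataset generated by a known linear system (hypothetical environment).
A_true = np.array([[0.9, 0.1], [0.0, 0.95]])
B_true = np.array([[0.0], [0.5]])
states = rng.normal(size=(500, 2))
actions = rng.uniform(-1, 1, size=(500, 1))
next_states = states @ A_true.T + actions @ B_true.T

weights = fit_linear_model(states, actions, next_states)

# Hand-written policy used only to drive the rollouts (hypothetical).
policy = lambda s: np.clip(-0.5 * s[:, :1], -1, 1)
model_data = model_rollout(weights, states[:100], policy, horizon=5)

batch, is_real = sample_mixed_batch(
    (states, actions, next_states), model_data, batch_size=256, real_ratio=0.5, rng=rng
)
```

The `is_real` mask marks which transitions come from the offline dataset, which is the hook a conservative method would use to treat accurate offline data and imprecise model data differently.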

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics