SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning
Zhongjian Qiao, Jiafei Lyu, Kechen Jiao, Qi Liu, Xiu Li

TL;DR
SUMO is a search-based uncertainty estimation method for model-based offline reinforcement learning. It assesses how reliable each synthetic sample is and improves the performance of the algorithms that use it.
Contribution
The paper proposes SUMO, a novel search-based uncertainty estimation technique that outperforms ensemble methods in model-based offline RL.
Findings
SUMO provides more accurate uncertainty estimates than ensemble methods.
Integrating SUMO boosts the performance of algorithms like MOPO and AMOReL.
Experimental results on D4RL datasets validate SUMO's effectiveness.
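To illustrate how an uncertainty estimate can feed into an algorithm such as MOPO, here is a minimal sketch of a MOPO-style reward penalty, in which each synthetic reward is reduced in proportion to the estimated uncertainty of its transition. The function name `penalize_rewards`, the coefficient `lam`, and the example values are assumptions made for illustration. This summary does not state how SUMO is actually wired into MOPO or AMOReL.

```python
import numpy as np


def penalize_rewards(rewards, uncertainty, lam=1.0):
    """Return rewards minus lam times the per-sample uncertainty.

    Hypothetical helper: any per-sample uncertainty estimate can be
    passed in; nothing here is specific to SUMO.
    """
    rewards = np.asarray(rewards, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    return rewards - lam * uncertainty


# Three synthetic transitions with equal model reward but rising uncertainty.
penalized = penalize_rewards([1.0, 1.0, 1.0], [0.0, 0.5, 2.0], lam=0.5)
```

With this penalty, transitions that the estimator considers less trustworthy contribute less reward to policy learning, so the quality of the uncertainty estimate directly shapes the learned policy.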
Abstract
The performance of offline reinforcement learning (RL) suffers from the limited size and quality of static datasets. Model-based offline RL addresses this issue by generating synthetic samples through a dynamics model to enhance overall performance. To evaluate the reliability of the generated samples, uncertainty estimation methods are often employed. However, model ensemble, the most commonly used uncertainty estimation method, is not always the best choice. In this paper, we propose a Search-based Uncertainty estimation method for Model-based Offline RL (SUMO) as an alternative. SUMO characterizes the uncertainty of synthetic samples by measuring their cross entropy against the in-distribution dataset samples, and uses an efficient search-based method for implementation. In this way, SUMO can achieve trustworthy uncertainty estimation. We integrate…
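The idea of scoring a synthetic sample by searching the dataset for similar samples can be sketched as follows. This is an illustrative stand-in, not the paper's estimator: it assumes that the log of the distance to the k-th nearest dataset sample serves as a proxy for the cross entropy term, and it uses a brute-force NumPy search in place of an efficient search library. The function name `knn_uncertainty`, the parameters `k` and `eps`, and the toy data are all hypothetical.

```python
import numpy as np


def knn_uncertainty(synthetic, dataset, k=1, eps=1e-6):
    """Log distance from each synthetic sample to its k-th nearest dataset sample.

    Assumption: a larger k-th neighbor distance means the sample lies
    further from the dataset distribution and is therefore less reliable.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    dataset = np.asarray(dataset, dtype=float)
    # Pairwise Euclidean distances, shape (n_synthetic, n_dataset).
    diffs = synthetic[:, None, :] - dataset[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # k-th smallest distance per synthetic sample (brute-force search).
    kth = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # eps keeps the logarithm finite when a sample coincides with the data.
    return np.log(kth + eps)


# Toy data: each row stands in for a concatenated (state, action, next state).
rng = np.random.default_rng(0)
dataset = rng.normal(size=(500, 4))
near = dataset[:5] + 0.01   # almost in-distribution
far = dataset[:5] + 10.0    # far from the dataset

u_near = knn_uncertainty(near, dataset)
u_far = knn_uncertainty(far, dataset)
```

In this toy setup the far samples receive higher scores than the near ones. The brute-force distance matrix grows with the product of the two sample counts, so a real implementation would need an indexed nearest-neighbor search.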
Taxonomy
Topics: Software Reliability and Analysis Research · Advanced Multi-Objective Optimization Algorithms
Methods: Balanced Selection
