Offline Model-Based Reinforcement Learning with Anti-Exploration
Padmanaba Srinivasan, William Knottenbelt

TL;DR
This paper introduces MoMo, a novel offline model-based reinforcement learning method that uses anti-exploration techniques to improve performance and stability without relying on large ensembles.
Contribution
MoMo extends anti-exploration to model-based RL, enabling effective out-of-distribution detection and uncertainty handling with fewer models.
Findings
MoMo outperforms prior methods on D4RL datasets.
Model-based MoMo achieves superior results compared to model-free variants.
Anti-exploration improves stability and performance in offline MBRL.
Abstract
Model-based reinforcement learning (MBRL) algorithms learn a dynamics model from collected data and apply it to generate synthetic trajectories to enable faster learning. This is an especially promising paradigm in offline reinforcement learning (RL) where data may be limited in quantity, in addition to being deficient in coverage and quality. Practical approaches to offline MBRL usually rely on ensembles of dynamics models to prevent exploitation of any individual model and to extract uncertainty estimates that penalize values in states far from the dataset support. Uncertainty estimates from ensembles can vary greatly in scale, making it challenging to generalize hyperparameters well across even similar tasks. In this paper, we present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL to the model-based space. We develop…
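The ensemble-based penalty that the abstract describes as the common prior approach can be illustrated with a minimal sketch. This is not MoMo and not the authors' code: the function names, the use of standard deviation across ensemble members as the disagreement measure, and the `penalty_coef` parameter are all assumptions chosen for illustration.

```python
import math
from statistics import pstdev


def ensemble_uncertainty(next_state_preds):
    """Disagreement among ensemble members for one (state, action) pair.

    next_state_preds[m][d] is the next-state prediction of (hypothetical)
    model m for state dimension d. The uncertainty is taken here as the
    L2 norm of the per-dimension standard deviation across models.
    """
    n_dims = len(next_state_preds[0])
    per_dim_std = [
        pstdev([pred[d] for pred in next_state_preds]) for d in range(n_dims)
    ]
    return math.sqrt(sum(s * s for s in per_dim_std))


def penalized_reward(reward, next_state_preds, penalty_coef=1.0):
    """Subtract a scaled uncertainty estimate from the model reward.

    penalty_coef is an illustrative hyperparameter, not a value from the paper.
    """
    return reward - penalty_coef * ensemble_uncertainty(next_state_preds)


# Models that agree give no penalty; models that disagree lower the reward.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 2.0], [2.0, 2.0]]
# Same disagreement pattern, but with states measured on a 10x larger scale.
disagree_scaled = [[0.0, 20.0], [20.0, 20.0]]

u_agree = ensemble_uncertainty(agree)
u_disagree = ensemble_uncertainty(disagree)
u_scaled = ensemble_uncertainty(disagree_scaled)
r_pen = penalized_reward(1.0, disagree, penalty_coef=0.5)
```

In this sketch the uncertainty grows in proportion to the scale of the state values, so a `penalty_coef` tuned for one task would penalize very differently on a task whose states live on another scale. That is one way to read the abstract's remark that ensemble uncertainty estimates vary greatly in scale and make hyperparameters hard to transfer.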
Taxonomy
Topics: Reinforcement Learning in Robotics · Reservoir Engineering and Simulation Methods · Robotic Path Planning Algorithms
