Revisiting Design Choices in Offline Model-Based Reinforcement Learning
Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts

TL;DR
This paper critically examines various heuristics for uncertainty in offline model-based reinforcement learning, revealing how hyperparameter optimization can significantly enhance performance over existing methods.
Contribution
It compares different uncertainty heuristics, introduces new protocols for their evaluation, and demonstrates the effectiveness of Bayesian Optimization in hyperparameter tuning for offline MBRL.
Findings
Bayesian Optimization outperforms hand-tuned hyperparameters.
Hyperparameters like model count and rollout horizon critically affect performance.
Hyperparameters selected by Bayesian Optimization lead to significantly stronger results than existing methods.
Abstract
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant progress has recently been made in offline model-based reinforcement learning (MBRL), i.e., approaches that leverage a learned dynamics model. This typically involves constructing a probabilistic model and using the model uncertainty to penalize rewards where there is insufficient data, solving for a pessimistic MDP that lower-bounds the true MDP. Existing methods, however, exhibit a breakdown between theory and practice: the pessimistic return ought to be bounded by the total variation distance of the model from the true dynamics, but is instead implemented through a penalty based on estimated model uncertainty. This has spawned a variety of uncertainty…
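The reward-penalty idea in the abstract can be illustrated with a minimal sketch. It assumes an ensemble of Gaussian dynamics models, each predicting a next-state mean, a per-dimension standard deviation and a reward, and replaces the true reward with r̃(s, a) = r(s, a) − λ·u(s, a), where u is an uncertainty heuristic. The three heuristics below are illustrative choices, not necessarily the exact set the paper compares: a MOPO-style "max aleatoric" score (largest norm of any member's predicted std), the maximum pairwise distance between member means, and the norm of the ensemble standard deviation of the means. All function names, the model interface and the toy rollout are hypothetical; the ensemble size, `horizon` and `penalty_coef` stand in for the kind of hyperparameters (model count, rollout horizon, penalty weight) the findings refer to.

```python
import numpy as np

# Hypothetical uncertainty heuristics. Inputs have shape (num_models, state_dim).
def max_aleatoric(means, stds):
    """MOPO-style: largest L2 norm of any member's predicted std vector."""
    return float(np.max(np.linalg.norm(stds, axis=-1)))

def max_pairwise_diff(means, stds):
    """Largest L2 distance between the predicted means of any two members."""
    diffs = means[:, None, :] - means[None, :, :]
    return float(np.max(np.linalg.norm(diffs, axis=-1)))

def ensemble_std(means, stds):
    """L2 norm of the per-dimension std of the means across the ensemble."""
    return float(np.linalg.norm(np.std(means, axis=0)))

def penalized_reward(reward, means, stds, penalty_coef, heuristic):
    """Pessimistic reward r_tilde = r - lambda * u(s, a)."""
    return reward - penalty_coef * heuristic(means, stds)

def rollout(start_state, policy, ensemble, horizon, penalty_coef, heuristic, rng):
    """Short model rollout accumulating penalized rewards.

    Each ensemble member is assumed to be a callable (state, action) ->
    (next_state_mean, next_state_std, reward). One member is sampled per
    step to generate the transition, as is common in ensemble MBRL.
    """
    state = start_state
    total = 0.0
    for _ in range(horizon):
        action = policy(state)
        preds = [model(state, action) for model in ensemble]
        means = np.stack([p[0] for p in preds])
        stds = np.stack([p[1] for p in preds])
        rewards = np.array([p[2] for p in preds])
        k = rng.integers(len(ensemble))  # pick the member that steps the state
        total += penalized_reward(rewards[k], means, stds, penalty_coef, heuristic)
        state = rng.normal(means[k], stds[k])  # sample the next state
    return total
```

For two members with means (0, 0) and (3, 4), `max_pairwise_diff` returns 5.0 and `ensemble_std` returns 2.5, so the same transition is penalized differently depending on the heuristic, which is why the choice of heuristic and its scale interact with λ.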
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Data Classification
