PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
Yuda Song, Wen Sun

TL;DR
This paper introduces a model-based RL algorithm that enhances exploration capabilities, guarantees polynomial sample complexity, and performs well across challenging and standard control tasks, including reward-free exploration.
Contribution
The paper presents a novel, efficient model-based RL algorithm with exploration guarantees for Kernelized Nonlinear Regulators (KNR) and linear MDPs, and it outperforms existing methods on exploration tasks.
Findings
Handles exploration-challenging control tasks where existing model-based approaches fail
Maintains high performance on dense-reward control benchmarks
Demonstrates efficient reward-free exploration
Abstract
Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and only uses access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense-reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method…
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
