Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning
Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

TL;DR
This paper introduces a meta-gradient algorithm that adaptively tunes state-query probabilities in Dyna-style planning, enhancing sample efficiency and robustness in model-based reinforcement learning, especially under resource constraints and changing environments.
Contribution
The paper presents a novel online meta-gradient method for adaptive state sampling in Dyna-style planning. The method improves planning efficiency and avoids pitfalls of conventional sampling strategies, such as sampling inaccurate transitions and stalled credit assignment.
Findings
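The meta-gradient method outperforms baselines that use conventional sampling strategies in aggregate empirical performance.
More efficient planning, in turn, improves the sample efficiency of the overall learning process.
The meta-learned sampling distributions reduce pathologies such as sampling inaccurate transitions and stalls in credit assignment.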
Abstract
We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves efficiency of the planning process, which, as a consequence, improves the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling…
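To make the idea of tuning state-query probabilities concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a tabular Dyna-Q agent on a hypothetical six-state chain, a softmax distribution over states with learnable logits `theta`, and a score-function (REINFORCE-style) update that raises the probability of querying states whose planning updates produce a large TD error. That TD-error magnitude is a stand-in meta-objective chosen here for simplicity; the paper's meta-gradient is computed differently. All names and hyperparameters below are assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dyna_q_learned_search_control(n_states=6, steps=500, plan_steps=5,
                                  alpha=0.5, meta_lr=0.1, gamma=0.9,
                                  epsilon=0.2, seed=0):
    """Tabular Dyna-Q on a chain with a learned state-query distribution.

    Hypothetical toy setup: actions are 0 (left) and 1 (right); reaching the
    last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}                    # (state, action) -> (reward, next_state, done)
    theta = [0.0] * n_states      # search-control logits, one per state
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action with random tie-breaking.
        best = max(q[s])
        greedy = [a for a in actions if q[s][a] == best]
        a = rng.choice(actions) if rng.random() < epsilon else rng.choice(greedy)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0
        model[(s, a)] = (r, s_next, done)

        # Direct Q-learning update from the real transition.
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])

        # Planning: query states from softmax(theta), then adapt theta.
        for _ in range(plan_steps):
            probs = softmax(theta)
            ps = rng.choices(range(n_states), weights=probs)[0]
            known = [b for b in actions if (ps, b) in model]
            signal = 0.0
            if known:
                pa = rng.choice(known)
                pr, pn, pd = model[(ps, pa)]
                delta = (pr if pd else pr + gamma * max(q[pn])) - q[ps][pa]
                q[ps][pa] += alpha * delta
                signal = abs(delta)   # stand-in meta-objective (assumption)
            # Score-function step: d log p(ps) / d theta_i = 1[i == ps] - p_i.
            for i in range(n_states):
                grad_log = (1.0 if i == ps else 0.0) - probs[i]
                theta[i] += meta_lr * signal * grad_log
        s = 0 if done else s_next
    return softmax(theta), q


if __name__ == "__main__":
    probs, _ = dyna_q_learned_search_control()
    print([round(p, 3) for p in probs])
```

In this sketch, states that have never been visited yield no planning signal, so their query probability is only ever pushed down relative to states whose simulated updates still change the value estimates. This loosely mirrors the motivation stated in the abstract: spend a limited planning budget on queries that help learning.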
Taxonomy
Topics: Artificial Intelligence in Games
