Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

Bradley Burega; John D. Martin; Luke Kapeluck; Michael Bowling

arXiv:2406.19561 · cs.LG · July 1, 2024

Open Access

TL;DR

This paper introduces a meta-gradient algorithm that adaptively tunes state-query probabilities in Dyna-style planning, enhancing sample efficiency and robustness in model-based reinforcement learning, especially under resource constraints and changing environments.

Contribution

The paper presents a novel online meta-gradient method for adaptive sampling in Dyna planning, improving efficiency and avoiding common pitfalls of traditional sampling strategies.

Findings

01 The meta-gradient method outperforms baselines that use conventional sampling strategies.

02 More efficient planning improves the sample efficiency of the overall RL process.

03 The learned sampling avoids pathologies such as sampling inaccurate transitions and stalled credit assignment.

Abstract

We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves efficiency of the planning process, which, as a consequence, improves the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling…
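The abstract describes tuning the probability with which states are queried during Dyna-style planning. The sketch below illustrates only that general idea and is not the authors' algorithm: it keeps a softmax distribution over states in a small tabular setting and adjusts it with a score-function (REINFORCE-style) step, using the size of the planning update as a heuristic utility signal in place of a true meta-gradient. The names `SearchControl`, `planning_step`, and `meta_lr`, the tabular model format, and the choice of signal are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D array.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


class SearchControl:
    """Hypothetical learnable distribution over states to query when planning."""

    def __init__(self, n_states, meta_lr=0.1, seed=0):
        self.logits = np.zeros(n_states)  # uniform query probabilities at start
        self.meta_lr = meta_lr
        self.rng = np.random.default_rng(seed)

    def probs(self):
        return softmax(self.logits)

    def sample(self):
        return int(self.rng.choice(len(self.logits), p=self.probs()))

    def update(self, state, signal):
        # Gradient of log p(state) w.r.t. the logits is onehot(state) - probs.
        # ASSUMPTION: a score-function step stands in for the paper's meta-gradient.
        grad_log_p = -self.probs()
        grad_log_p[state] += 1.0
        self.logits += self.meta_lr * signal * grad_log_p


def planning_step(q, model, sc, alpha=0.5, gamma=0.9):
    """One Dyna-style planning update at a state drawn from search control.

    `model` maps (state, action) -> (reward, next_state); this deterministic
    tabular format is assumed for the sketch.
    """
    s = sc.sample()
    total_change = 0.0
    for a in range(q.shape[1]):
        r, s_next = model[(s, a)]
        td = r + gamma * q[s_next].max() - q[s, a]
        q[s, a] += alpha * td
        total_change += abs(td)
    # ASSUMPTION: states whose simulated updates change values more are
    # treated as more useful to query.
    sc.update(s, total_change)
    return s
```

In use, `planning_step` would be called several times between real environment steps, with `q` and `model` maintained by the usual Dyna learning and model-update rules. A faithful implementation would replace the heuristic signal with a gradient of a loss on real experience taken through the planning update.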

Taxonomy

Topics: Artificial Intelligence in Games