World Model as a Graph: Learning Latent Landmarks for Planning
Lunjun Zhang, Ge Yang, Bradly C. Stadie

TL;DR
This paper introduces L3P, a graph-structured world model built on latent landmarks for long-horizon planning in complex environments. It combines advantages of model-based and model-free RL.
Contribution
It proposes a new method to learn graph-based world models with latent landmarks and reachability estimates, enhancing planning capabilities in high-dimensional tasks.
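As a rough illustration of how a landmark graph with reachability estimates can support planning, the sketch below runs a shortest-path search over a small hand-written graph. Everything in it is an assumption made for illustration: the function name `shortest_landmark_path`, the dictionary layout, the toy edge costs, and the choice of Dijkstra's algorithm are not taken from the paper, which this summary does not describe in that level of detail. In the paper's setting the landmarks and edge costs would be learned. Here they are fixed by hand.

```python
import heapq

def shortest_landmark_path(reach, start, goal):
    """Dijkstra search over a landmark graph (illustrative, not the authors' code).

    reach maps each node to {neighbor: estimated number of steps to reach it}.
    Returns (path, total_cost), or (None, inf) if the goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == goal:
            break
        for nbr, cost in reach.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(frontier, (nd, nbr))
    if goal not in dist:
        return None, float("inf")
    # Walk predecessor links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]

# Hypothetical reachability estimates: "s" is the current state, "g" the goal,
# "a" and "b" stand in for latent landmarks.
reach = {
    "s": {"a": 4.0, "b": 9.0},
    "a": {"b": 3.0, "g": 12.0},
    "b": {"g": 5.0},
    "g": {},
}
path, cost = shortest_landmark_path(reach, "s", "g")
print(path, cost)
```

In this toy graph the search prefers the route through both landmarks over the direct but costlier edges, which is the sense in which sparse multi-step transitions let a planner decompose a long-horizon goal into shorter subgoals.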
Findings
L3P outperforms prior methods on various control tasks.
L3P combines the robustness of model-free RL with the generalization of graph search.
L3P enables scalable long-horizon planning in complex environments.
Abstract
Planning - the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems - is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how to best incorporate planning into existing deep RL paradigms to handle increasingly complex environments. One prominent framework, Model-Based RL, learns a world model and plans using step-by-step virtual rollouts. This type of world model quickly diverges from reality when the planning horizon increases, thus struggling at long-horizon planning. How can we learn world models that endow agents with the ability to do temporally extended reasoning? In this work, we propose to learn graph-structured world models composed of sparse, multi-step transitions. We devise a novel algorithm to…
Taxonomy
Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Artificial Intelligence in Games
