AI planning in the imagination: High-level planning on learned abstract search spaces
Carlos Martin, Tuomas Sandholm

TL;DR
This paper introduces PiZero, a novel planning method enabling reinforcement learning agents to perform high-level planning in learned abstract spaces, improving efficiency and generality across diverse environments.
Contribution
PiZero allows agents to plan in learned abstract spaces decoupled from the real environment, enabling high-level reasoning over compound actions and handling complex action spaces.
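As a rough illustration of planning that never touches the real environment, the sketch below runs a search entirely over abstract states. It is a toy under stated assumptions, not the paper's method: the `transition` and `value` tables stand in for learned models, `encode` is a placeholder encoder, and the exhaustive depth-limited search is a simple substitute for whatever search procedure PiZero actually uses. All names are hypothetical.

```python
# Toy sketch: planning in an abstract search space (all names hypothetical).
# In a real agent, `transition`, `value`, and `encode` would be learned models;
# here they are small hand-written tables so the example is self-contained.

def encode(observation, num_states):
    """Placeholder encoder: map an integer observation to an abstract state."""
    return observation % num_states

def plan(state, transition, value, depth):
    """Best achievable value from `state` using only the abstract model."""
    if depth == 0:
        return value[state]
    return max(
        plan(next_state, transition, value, depth - 1)
        for next_state in transition[state]
    )

def choose_abstract_action(observation, transition, value, depth=1):
    """Pick the abstract action whose subtree has the highest planned value."""
    root = encode(observation, len(transition))
    scores = [
        plan(next_state, transition, value, depth - 1)
        for next_state in transition[root]
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# transition[s][a] is the abstract state reached from s by abstract action a.
transition = [[1, 2], [0, 2], [2, 2]]
# value[s] is the assumed learned value estimate of abstract state s.
value = [0.0, 1.0, 5.0]

best = choose_abstract_action(observation=0, transition=transition, value=value)
```

The search only queries the abstract tables, so no environment simulator is called while planning. A single abstract action here could stand for an arbitrarily long sequence of real actions, which is the sense in which such planning is decoupled from the environment's timescale.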
Findings
Outperforms prior methods in multiple domains
Handles continuous action spaces, combinatorial action spaces, and partial observability
Does not require environment simulators at execution time
Abstract
Search and planning algorithms have been a cornerstone of artificial intelligence since the field's inception. Giving reinforcement learning agents the ability to plan during execution time has resulted in significant performance improvements in various domains. However, in real-world environments, the model with respect to which the agent plans has been constrained to be grounded in the real environment itself, as opposed to a more abstract model which allows for planning over compound actions and behaviors. We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training, which is completely decoupled from the real environment. Unlike prior approaches, this enables the agent to perform high-level planning at arbitrary timescales and reason in terms of compound or temporally-extended actions, which can be…
Taxonomy
Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Machine Learning and Algorithms
