Learning Classical Planning Strategies with Policy Gradient
Pawel Gomoluch, Dalal Alrajeh, Alessandra Russo

TL;DR
This paper introduces a trainable framework that learns to select among multiple classical planning search strategies using policy gradient, resulting in improved planning performance tailored to specific problem distributions.
Contribution
The paper presents a search framework in which a learned stochastic policy chooses among several forward search approaches while a single planning problem is being solved, rather than committing to one fixed best-first search.
Findings
Learns domain-specific search strategies
Outperforms fixed best-first search
Improves IPC score across domains
Abstract
A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed using a trainable stochastic policy, mapping the state of the search to a probability distribution over the approaches. This enables using policy gradient to learn search strategies tailored to specific distributions of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner's state. Then, we train the system on randomly generated problems from…
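The mechanism described in the abstract, a stochastic policy over search approaches trained with policy gradient, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the linear softmax parameterization, the plain REINFORCE update without a baseline, and the names `policy`, `sample_approach`, and `reinforce_update` are all hypothetical. Only the sizes (five approaches, two state features) come from the abstract.

```python
import math
import random

N_APPROACHES = 5  # the abstract mentions five search approaches
N_FEATURES = 2    # and a two-dimensional representation of the planner's state


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, features):
    """Map search-state features to a distribution over approaches.

    Assumption: a linear softmax policy, one weight row per approach.
    """
    logits = [sum(w * f for w, f in zip(row, features)) for row in theta]
    return softmax(logits)


def sample_approach(theta, features, rng):
    """Draw the index of the next search approach from the policy."""
    probs = policy(theta, features)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(theta, episode, reward, lr=0.1):
    """One REINFORCE step (no baseline), returning updated weights.

    episode: list of (features, chosen_approach) pairs from one planner run.
    reward:  scalar score of the whole run (hypothetically, an IPC-style score).
    """
    new_theta = [row[:] for row in theta]
    for features, action in episode:
        probs = policy(theta, features)
        for a in range(len(theta)):
            # Gradient of log softmax w.r.t. row a: (1[a == action] - pi(a)) * features
            coeff = (1.0 if a == action else 0.0) - probs[a]
            for j, f in enumerate(features):
                new_theta[a][j] += lr * reward * coeff * f
    return new_theta
```

With all-zero weights the policy is uniform over the five approaches. After a single update that rewards choosing approach 2 in a given state, the probability of approach 2 in that state rises above the uniform value, which is the behaviour policy gradient relies on to shift the planner toward approaches that pay off for the training distribution.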
