Learning Classical Planning Strategies with Policy Gradient

Pawel Gomoluch; Dalal Alrajeh; Alessandra Russo

arXiv:1810.09923·cs.AI·April 12, 2019

TL;DR

This paper introduces a search framework that alternates between several forward search approaches under a stochastic policy trained with policy gradient, so that the search strategy can be tailored to a specific distribution of planning problems and a chosen performance metric.

Contribution

The paper presents a search framework in which a learned stochastic policy chooses among several forward search approaches while a single planning problem is being solved, instead of relying on one fixed best-first search.

Findings

01. Learns domain-specific search strategies

02. Outperforms fixed best-first search

03. Improves IPC score across domains

Abstract

A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed using a trainable stochastic policy, mapping the state of the search to a probability distribution over the approaches. This enables using policy gradient to learn search strategies tailored to specific distributions of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner's state. Then, we train the system on randomly generated problems from…
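To make the idea of a trainable stochastic policy over search approaches more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how the policy is parametrized, so the linear-softmax form, the class name `LinearSoftmaxPolicy`, the learning rate, and the example state values are all assumptions chosen for illustration. Only the sizes (five approaches, a two-dimensional state) come from the abstract. The update shown is the standard REINFORCE rule, with a single scalar reward per episode standing in for a metric such as the IPC score.

```python
import math
import random

NUM_APPROACHES = 5  # the abstract mentions five search approaches
STATE_DIM = 2       # and a two-dimensional representation of the planner's state


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSoftmaxPolicy:
    """Hypothetical policy: the linear-softmax parametrization is an assumption."""

    def __init__(self, num_actions=NUM_APPROACHES, state_dim=STATE_DIM):
        self.weights = [[0.0] * state_dim for _ in range(num_actions)]
        self.biases = [0.0] * num_actions

    def probabilities(self, state):
        """Map a search-state vector to a distribution over approaches."""
        logits = [
            sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        return softmax(logits)

    def sample(self, state, rng):
        """Draw the index of the approach to run next."""
        probs = self.probabilities(state)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def reinforce_update(self, episode, reward, learning_rate=0.1):
        """One REINFORCE step on a list of (state, action) pairs.

        Gradients are accumulated at the current parameters first and
        applied afterwards, so every step uses the same policy.
        """
        n, d = len(self.biases), len(self.weights[0])
        grad_w = [[0.0] * d for _ in range(n)]
        grad_b = [0.0] * n
        for state, action in episode:
            probs = self.probabilities(state)
            for k in range(n):
                # d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s)
                coeff = (1.0 if k == action else 0.0) - probs[k]
                grad_b[k] += coeff
                for j in range(d):
                    grad_w[k][j] += coeff * state[j]
        for k in range(n):
            self.biases[k] += learning_rate * reward * grad_b[k]
            for j in range(d):
                self.weights[k][j] += learning_rate * reward * grad_w[k][j]


# Usage with made-up values: reward one episode in which approach 2 was chosen.
rng = random.Random(0)
policy = LinearSoftmaxPolicy()
state = [0.5, 1.0]  # hypothetical two-dimensional search state
before = policy.probabilities(state)
policy.reinforce_update([(state, 2)], reward=1.0)
after = policy.probabilities(state)
chosen = policy.sample(state, rng)
```

With zero-initialized parameters the policy starts out uniform over the five approaches. A positive reward for an episode raises the probability of the approaches chosen in it, and in a full system the reward would come from running the planner on sampled training problems.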
