Learning Heuristic Selection with Dynamic Algorithm Configuration

David Speck; André Biedenkapp; Frank Hutter; Robert Mattmüller; Marius Lindauer

arXiv:2006.08246 · cs.AI · April 13, 2021

TL;DR

This paper introduces a reinforcement learning-based method for dynamic heuristic selection in planning, leveraging internal search dynamics to outperform existing static and adaptive approaches.

Contribution

The paper presents a reinforcement learning approach to dynamic heuristic selection that takes the planner's internal search dynamics into account, generalizes previous heuristic selection methods, and improves performance over them.

Findings

01

Domain-wise learned policies outperform static heuristics.

02

The approach can provably yield an exponential improvement in the performance of heuristic search.

03

It generalizes over existing heuristic selection methods.

Abstract

A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most useful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection which takes into account the internal search dynamics of a planning…
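
To make the strategies the abstract contrasts concrete (max aggregation, alternation, and per-step selection), here is a minimal Python sketch of greedy best-first search with one open list per heuristic and a pluggable selection policy. It is an illustration under stated assumptions, not code from speckdavid/rl-plan: the toy line domain, the heuristics h_bad and h_good, the function names, and the min_h/size features are all hypothetical. A learned dynamic policy would take the place of `alternation`, mapping the features to a heuristic index at every expansion step.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

State = Hashable
Heuristic = Callable[[State], float]
# Hypothetical policy signature: (expansion step, per-open-list features) -> heuristic index.
Policy = Callable[[int, List[Dict[str, float]]], int]


def gbfs_multi(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[State]],
    heuristics: Sequence[Heuristic],
    policy: Policy,
    max_expansions: int = 10_000,
) -> Tuple[Optional[State], int]:
    """Greedy best-first search with one open list per heuristic.

    Each generated state is inserted into every open list; before each expansion
    the policy chooses which open list (i.e. which heuristic) to pop from.
    Returns (goal state or None, number of expansions).
    """
    tie = itertools.count()  # FIFO tie-breaking; states never need to be comparable
    open_lists: List[list] = [[] for _ in heuristics]
    for ol, h in zip(open_lists, heuristics):
        heapq.heappush(ol, (h(start), next(tie), start))
    closed = set()
    expansions = 0
    while expansions < max_expansions and any(open_lists):
        # Hypothetical search-dynamics features, one dict per open list.
        features = [
            {"min_h": ol[0][0] if ol else float("inf"), "size": float(len(ol))}
            for ol in open_lists
        ]
        i = policy(expansions, features)
        if not open_lists[i]:  # chosen list exhausted: fall back to any non-empty one
            i = next(j for j, ol in enumerate(open_lists) if ol)
        _, _, s = heapq.heappop(open_lists[i])
        if s in closed:  # stale entry, already expanded via another open list
            continue
        closed.add(s)
        expansions += 1
        if is_goal(s):
            return s, expansions
        for t in successors(s):
            if t not in closed:
                for ol, h in zip(open_lists, heuristics):
                    heapq.heappush(ol, (h(t), next(tie), t))
    return None, expansions


def always(i: int) -> Policy:
    """Static selection: always expand from open list i."""
    return lambda step, features: i


def alternation(step: int, features: List[Dict[str, float]]) -> int:
    """Round-robin over the open lists, ignoring the features."""
    return step % len(features)


# Hypothetical toy domain: states 0..20 on a line, start 10, goal 20.
succ = lambda s: [t for t in (s - 1, s + 1) if 0 <= t <= 20]
goal = lambda s: s == 20
h_bad = lambda s: float(s)                  # misleading: points toward state 0
h_good = lambda s: float(20 - s)            # exact distance to the goal
h_max = lambda s: max(h_bad(s), h_good(s))  # max aggregation of both

print(gbfs_multi(10, goal, succ, [h_bad], always(0)))            # (20, 21)
print(gbfs_multi(10, goal, succ, [h_good], always(0)))           # (20, 11)
print(gbfs_multi(10, goal, succ, [h_max], always(0)))            # (20, 21)
print(gbfs_multi(10, goal, succ, [h_bad, h_good], alternation))  # (20, 20)
```

In this toy, the single misleading estimate drags max aggregation down to the same 21 expansions as h_bad alone, and alternation needs 20, while always choosing h_good needs only 11. A policy that consistently chose h_good here would match that; learning when to make such choices from the search state is what dynamic selection aims at.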

Code & Models

Repositories

speckdavid/rl-plan
Official

Taxonomy

Methods: Dynamic Algorithm Configuration · Experience Replay · Dense Connections · Double Q-learning · Double DQN
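
The taxonomy lists experience replay and Double DQN among the methods. As a generic illustration of those two components, here is a minimal Python sketch; it is not taken from speckdavid/rl-plan, and the names ReplayBuffer and double_dqn_targets, the transition tuple layout, and the stand-in Q-functions are hypothetical. It assumes each action is the index of the heuristic to expand with next.

```python
import random
from collections import deque
from typing import Any, Callable, Deque, List, Sequence, Tuple

Transition = Tuple[Any, int, float, Any, bool]  # (state, action, reward, next_state, done)


class ReplayBuffer:
    """Fixed-capacity experience replay: the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def double_dqn_targets(
    batch: Sequence[Transition],
    q_online: Callable[[Any], List[float]],
    q_target: Callable[[Any], List[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Double DQN regression targets for a sampled minibatch.

    The online network picks the greedy next action; the target network scores it.
    Decoupling selection from evaluation reduces the overestimation of plain DQN.
    """
    targets = []
    for _, _, reward, next_state, done in batch:
        if done:  # terminal transition: no bootstrapped term
            targets.append(reward)
            continue
        q_next = q_online(next_state)
        best = max(range(len(q_next)), key=q_next.__getitem__)
        targets.append(reward + gamma * q_target(next_state)[best])
    return targets


# Stand-in Q-functions over two actions (e.g. two heuristics to choose between).
q_online = lambda s: [1.0, 3.0]
q_target = lambda s: [10.0, 2.0]
batch = [("s0", 0, -1.0, "s1", False), ("s1", 1, -1.0, "s2", True)]
print(double_dqn_targets(batch, q_online, q_target, gamma=0.5))  # [0.0, -1.0]
```

For the first transition, plain DQN would bootstrap from the target network's own maximum (10.0) and produce -1.0 + 0.5 * 10.0 = 4.0; Double DQN instead evaluates the online network's choice (action 1) and gets -1.0 + 0.5 * 2.0 = 0.0.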