Policy-Based Self-Competition for Planning Problems

Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm

arXiv:2306.04403 · cs.LG · June 8, 2023 · 2 cites

TL;DR

This paper introduces GAZ PTP, a novel planning algorithm that enhances single-player problem solving by incorporating self-competition with historical policies, leading to improved performance in combinatorial optimization tasks.

Contribution

The paper proposes GAZ PTP, a new self-competition method that integrates a historical policy directly into planning and outperforms single-player GAZ variants on combinatorial optimization problems.

Findings

1. GAZ PTP outperforms single-player GAZ variants with half the search budget.
2. It is effective on combinatorial optimization problems such as the Traveling Salesman Problem (TSP) and Job-Shop Scheduling.
3. It demonstrates the benefit of using historical policies in planning algorithms.

Abstract

AlphaZero-type algorithms may stop improving on single-player tasks if the value network guiding the tree search is unable to approximate the outcome of an episode sufficiently well. One technique to address this problem is transforming the single-player task through self-competition. The main idea is to compute a scalar baseline from the agent's historical performances and to reshape an episode's reward into a binary output, indicating whether the baseline has been exceeded or not. However, this baseline only carries limited information for the agent about strategies for how to improve. We leverage the idea of self-competition and directly incorporate a historical policy into the planning process instead of its scalar performance. Based on the recently introduced Gumbel AlphaZero (GAZ), we propose our algorithm GAZ 'Play-to-Plan' (GAZ PTP), in which the agent learns to find strong…
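The abstract contrasts two ways of turning a single-player episode into a binary outcome: comparing the return against a scalar baseline built from past performances, or comparing it against what a historical policy achieves. The following is a minimal, hypothetical sketch of that contrast on a toy TSP, not code from the grimmlab repository. The ranked-reward-style percentile baseline, the +1/-1 encoding, and every name below (`ScalarBaselineReshaper`, `tour_return`, `policy_based_reward`, `in_order`, `nearest_neighbor`) are assumptions for illustration. The policy-based part only shows how an episode could be judged against a past policy on the same instance. It does not reproduce how GAZ PTP uses that historical policy inside its Gumbel tree search.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical policy type: (cities, partial tour) -> index of next city.
Policy = Callable[[Sequence[Point], List[int]], int]


class ScalarBaselineReshaper:
    """Scalar-baseline self-competition (assumed ranked-reward style).

    Keeps a sliding window of past returns, uses a percentile of it as the
    baseline, and maps a new return to +1 (baseline exceeded) or -1.
    """

    def __init__(self, window: int = 100, percentile: float = 0.75) -> None:
        self.history: Deque[float] = deque(maxlen=window)
        self.percentile = percentile

    def baseline(self) -> float:
        if not self.history:
            return -math.inf  # no past performance to compete against yet
        ordered = sorted(self.history)
        idx = min(int(self.percentile * len(ordered)), len(ordered) - 1)
        return ordered[idx]

    def reshape(self, episode_return: float) -> int:
        reward = 1 if episode_return > self.baseline() else -1
        self.history.append(episode_return)  # record for future baselines
        return reward


def tour_return(cities: Sequence[Point], policy: Policy) -> float:
    """Roll out a policy from city 0; return the negative closed-tour length."""
    tour = [0]
    while len(tour) < len(cities):
        tour.append(policy(cities, tour))
    legs = zip(tour, tour[1:] + tour[:1])  # includes the edge back to the start
    return -sum(math.dist(cities[a], cities[b]) for a, b in legs)


def policy_based_reward(cities: Sequence[Point], agent: Policy,
                        past_self: Policy) -> int:
    """+1 if the agent beats the historical policy on the same instance, else -1."""
    return 1 if tour_return(cities, agent) > tour_return(cities, past_self) else -1


def in_order(cities: Sequence[Point], tour: List[int]) -> int:
    """Hypothetical 'past self': visit the lowest-index unvisited city."""
    return min(set(range(len(cities))) - set(tour))


def nearest_neighbor(cities: Sequence[Point], tour: List[int]) -> int:
    """Hypothetical 'current agent': greedily visit the closest unvisited city."""
    last = cities[tour[-1]]
    unvisited = set(range(len(cities))) - set(tour)
    return min(unvisited, key=lambda i: math.dist(last, cities[i]))
```

On the four collinear cities `[(0, 0), (2, 0), (1, 0), (3, 0)]`, the in-order tour has length 8 and the nearest-neighbor tour has length 6, so `policy_based_reward(cities, nearest_neighbor, in_order)` yields +1. The scalar reshaper only knows that a return beat a number. The policy-based comparison is tied to a concrete past strategy on the same instance, which is the extra information the abstract argues a scalar baseline lacks.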

Code & Models

Repositories

grimmlab/policy-based-self-competition (PyTorch, official)

Videos

Policy-Based Self-Competition for Planning Problems (SlidesLive)

Taxonomy

Topics: Auction Theory and Applications · Game Theory and Applications · Constraint Satisfaction and Optimization

Methods: AlphaZero