Learning Collaborative Policies to Solve NP-hard Routing Problems

Minsu Kim; Jinkyoo Park; Joungho Kim

arXiv:2110.13987 · cs.LG · October 28, 2021 · 43 cites

TL;DR

This paper introduces learning collaborative policies (LCP), a hierarchical framework that pairs two DRL policies, an exploration-oriented seeder and an exploitation-oriented reviser, to solve NP-hard routing problems such as TSP, PCTSP, and CVRP.

Contribution

The paper proposes a novel two-policy collaborative approach, integrating a diversified seeder and an exploitative reviser, to improve DRL performance on complex routing problems.
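
The seed-then-revise division of labor can be sketched in a few lines. This is only an illustration under loud assumptions: the seeder here is a random shuffle and the reviser is a plain 2-opt local search, both stand-ins for the learned DRL policies, and the names `seed_tours`, `revise`, and `solve` are hypothetical rather than taken from the paper or its repository.

```python
import math
import random


def tour_length(coords, tour):
    """Total length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def seed_tours(n_cities, n_seeds, rng):
    """Stand-in seeder (assumption): diverse candidates via random permutations."""
    seeds = []
    for _ in range(n_seeds):
        tour = list(range(n_cities))
        rng.shuffle(tour)
        seeds.append(tour)
    return seeds


def revise(coords, tour):
    """Stand-in reviser (assumption): 2-opt until no segment reversal helps."""
    best = tour[:]
    best_len = tour_length(coords, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j] and keep it if the tour shortens.
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(coords, cand)
                if cand_len < best_len - 1e-12:
                    best, best_len = cand, cand_len
                    improved = True
    return best


def solve(coords, n_seeds=8, seed=0):
    """Generate seeds, revise each one, and return the shortest revised tour."""
    rng = random.Random(seed)
    seeds = seed_tours(len(coords), n_seeds, rng)
    revised = [revise(coords, t) for t in seeds]
    return min(revised, key=lambda t: tour_length(coords, t))


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    best = solve(square)
    print(best, tour_length(square, best))
```

The point of the split is that the first stage only needs to cover the solution space broadly, while the second stage only needs to improve a given candidate, so each component can be specialized for its own job.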

Findings

1. LCP outperforms single-policy DRL on TSP, PCTSP, and CVRP.

2. The seeder's entropy regularization encourages diverse solution generation.

3. The reviser effectively refines candidate solutions to reduce total travel distance.

Abstract

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, leaving a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while exploring the full combinatorial action space (i.e., sequences of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward to encourage the…
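
As a rough illustration of what an entropy regularization reward can look like, the sketch below adds a weighted entropy bonus to the negative tour length. The exact reward used in the paper is not given in the truncated abstract, so the form `-length + beta * entropy`, the distance-based toy policy, and the names `sample_tour`, `regularized_reward`, `temperature`, and `beta` are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tour_length(coords, tour):
    """Total length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def sample_tour(coords, temperature, rng):
    """Sample a tour from a toy policy (assumption): closer cities are likelier.

    Returns the tour and the sum of per-step policy entropies.
    """
    tour = [0]
    unvisited = list(range(1, len(coords)))
    total_entropy = 0.0
    while unvisited:
        cur = coords[tour[-1]]
        logits = [-math.dist(cur, coords[j]) / temperature for j in unvisited]
        probs = softmax(logits)
        total_entropy += entropy(probs)
        nxt = rng.choices(unvisited, weights=probs, k=1)[0]
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour, total_entropy


def regularized_reward(coords, tour, total_entropy, beta):
    """Assumed reward shape: shorter tours and higher-entropy policies score higher."""
    return -tour_length(coords, tour) + beta * total_entropy


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
    tour, h = sample_tour(pts, temperature=1.0, rng=random.Random(0))
    print(tour, regularized_reward(pts, tour, h, beta=0.5))
```

With `beta = 0` this reduces to the usual negative tour length; a larger `beta` rewards policies that keep several next-city choices plausible, which is one way to push a sampler toward diverse seeds.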

Code & Models

Repositories

alstn12088/lcp (PyTorch, official)

Videos

Learning Collaborative Policies to Solve NP-hard Routing Problems · SlidesLive

Taxonomy

Topics: Vehicle Routing Optimization Methods · Transportation and Mobility Innovations · Metaheuristic Optimization Algorithms Research

Methods: Entropy Regularization