POMO: Policy Optimization with Multiple Optima for Reinforcement Learning

Yeong-Dae Kwon; Jinho Choo; Byoungjip Kim; Iljoo Yoon; Youngjune Gwon; Seungjai Min

arXiv:2010.16011·cs.LG·July 14, 2021·135 cites

Open Access 3 Repos 1 Video

TL;DR

POMO is a reinforcement learning method that efficiently finds near-optimal solutions for various NP-hard combinatorial optimization problems by exploiting symmetries and encouraging diverse solutions.

Contribution

The paper introduces POMO, a novel RL-based heuristic that improves training stability, solution diversity, and performance across multiple combinatorial optimization problems.

Findings

01. Achieves a 0.14% optimality gap on TSP100

02. Outperforms recent learned heuristics on TSP, CVRP, and KP

03. Reduces inference time significantly
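For readers unfamiliar with the metric in the first finding, an optimality gap is conventionally the relative excess cost of a solution over a reference (optimal or best-known) cost, expressed as a percentage. The sketch below shows that arithmetic; the function name and the example numbers are illustrative assumptions, not values taken from the paper.

```python
def optimality_gap_percent(cost, reference_cost):
    """Relative excess of `cost` over `reference_cost`, in percent.

    Assumes a minimization problem (e.g. tour length) and a positive
    reference cost. Both arguments are hypothetical inputs.
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return 100.0 * (cost - reference_cost) / reference_cost


# Made-up numbers for illustration only: a tour of length 100.14 against
# a reference tour of length 100.0 is about 0.14% above the reference.
gap = optimality_gap_percent(100.14, 100.0)
print(f"{gap:.2f}%")
```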

Abstract

In neural combinatorial optimization (CO), reinforcement learning (RL) can turn a deep neural net into a fast, powerful heuristic solver of NP-hard problems. This approach has a great potential in practical applications because it allows near-optimal solutions to be found without expert guides armed with substantial domain knowledge. We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a heuristic solver. POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution. POMO uses a modified REINFORCE algorithm that forces diverse rollouts towards all optimal solutions. Empirically, the low-variance baseline of POMO makes RL training fast and stable, and it is more resistant to local minima compared to previous approaches. We also introduce a new augmentation-based…
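The low-variance baseline mentioned in the abstract can be pictured with a small sketch. It assumes the commonly described POMO setup in which several rollouts are sampled for the same problem instance and the mean reward of those rollouts serves as a shared baseline for REINFORCE. The function names, the plain-list representation, and the example numbers are assumptions made for illustration; this is not the authors' implementation.

```python
def shared_baseline_advantages(rewards):
    """Subtract the mean rollout reward from each rollout's reward.

    rewards: one float per rollout of the SAME problem instance
    (for TSP this could be the negative tour length of each rollout).
    """
    baseline = sum(rewards) / len(rewards)  # shared across all rollouts
    return [r - baseline for r in rewards]


def reinforce_surrogate_loss(rewards, log_probs):
    """REINFORCE surrogate whose gradient is the policy-gradient estimate.

    log_probs: summed log-probability of the actions in each rollout.
    Here these are plain floats; in training they would be differentiable
    tensors and the advantages would be treated as constants.
    """
    advantages = shared_baseline_advantages(rewards)
    weighted = [a * lp for a, lp in zip(advantages, log_probs)]
    return -sum(weighted) / len(weighted)


# Hypothetical example: three rollouts with tour lengths 10, 12, and 11.
rewards = [-10.0, -12.0, -11.0]
log_probs = [-1.0, -2.0, -3.0]
print(shared_baseline_advantages(rewards))
print(reinforce_surrogate_loss(rewards, log_probs))
```

Because the advantages of one instance sum to zero, rollouts are rewarded only for beating their siblings on that instance, which is one way to see why such a baseline keeps the gradient estimate's variance low.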

Code & Models

Videos

POMO: Policy Optimization with Multiple Optima for Reinforcement Learning · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Transportation and Mobility Innovations

Methods: POMO · REINFORCE