Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Nathan Grinsztajn; Daniel Furelos-Blanco; Shikha Surana; Clément Bonnet; Thomas D. Barrett

arXiv:2210.03475·cs.AI·November 15, 2023·6 cites

TL;DR

This paper introduces Poppy, a training method for populations of reinforcement learning policies that develop complementary specializations, achieving state-of-the-art RL results on four NP-hard combinatorial optimization problems.

Contribution

Poppy is a simple training procedure that induces diverse, complementary policies through unsupervised specialization. It relies on no predefined or hand-crafted notion of diversity and targets only the performance of the population as a whole.

Findings

01 Poppy produces a set of complementary policies.

02 Poppy achieves state-of-the-art RL results on four NP-hard problems.

03 Poppy outperforms existing RL methods for combinatorial optimization.

Abstract

Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic sampling and beam search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of…
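The inference-time idea in the abstract, rolling out every member of a population on the same instance and keeping the best solution, can be sketched on a toy travelling salesman instance. This is only an illustration under stated assumptions: the "policies" are hypothetical nearest-neighbour heuristics that differ in their start city, standing in for trained RL policies, and the helper names (`tour_length`, `nearest_neighbour_policy`, `population_rollout`) are invented here and are not taken from the Poppy codebase. The sketch does not implement Poppy's training procedure.

```python
import math
import random


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def nearest_neighbour_policy(start):
    """Hypothetical stand-in for a trained policy: greedy tour from a fixed start city."""
    def policy(points):
        first = start % len(points)
        order = [first]
        unvisited = set(range(len(points))) - {first}
        while unvisited:
            last = points[order[-1]]
            # Always move to the closest city that has not been visited yet.
            nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
            order.append(nxt)
            unvisited.remove(nxt)
        return order
    return policy


def population_rollout(policies, points):
    """Roll out every policy on one instance and keep the shortest tour."""
    tours = [policy(points) for policy in policies]
    lengths = [tour_length(points, tour) for tour in tours]
    best = min(range(len(policies)), key=lambda k: lengths[k])
    return best, tours[best], lengths


rng = random.Random(0)
instance = [(rng.random(), rng.random()) for _ in range(12)]
population = [nearest_neighbour_policy(s) for s in range(4)]

winner, best_tour, lengths = population_rollout(population, instance)
print(f"winner: policy {winner}, tour length {lengths[winner]:.3f}")
```

The quantity of interest is the best length across the population rather than the average: a member that performs poorly on most instances still contributes if it wins on instances where the others fail, which is the sense in which the policies are complementary.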

Code & Models

Repositories

instadeepai/poppy (JAX)

Videos

Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization · SlidesLive

Taxonomy

Topics: Vehicle Routing Optimization Methods · Metaheuristic Optimization Algorithms Research · Auction Theory and Applications