Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Alexandre Laterre; Yunguan Fu; Mohamed Khalil Jabri; Alain-Sam Cohen; David Kas; Karl Hajjar; Torbjorn S. Dahl; Amine Kerkeni; Karim Beguir

arXiv:1807.01672·cs.LG·December 10, 2018·39 cites

TL;DR

This paper introduces the Ranked Reward (R2) algorithm, which brings self-play reinforcement learning to single-player combinatorial optimization problems by ranking the rewards a single agent obtains over multiple games. On bin packing, R2 outperforms Monte Carlo tree search, heuristics, and integer programming.

Contribution

The paper proposes the R2 algorithm that adapts self-play reinforcement learning to single-player problems through reward ranking, a novel approach in this domain.

Findings

01. R2 outperforms Monte Carlo tree search, heuristics, and integer programming in bin packing.

02. Reward ranking improves learning efficiency and solution quality.

03. Analysis shows effectiveness varies with problem difficulty and ranking thresholds.

Abstract

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm…
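The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `RankedRewardBuffer`, the buffer capacity, the 75th-percentile threshold, the nearest-rank percentile rule, and the random tie-breaking are all assumptions chosen for illustration. The sketch keeps a window of recent raw rewards and converts each new reward into +1 or -1 depending on whether it beats the agent's own recent performance, which gives a single-player game the win/loss signal that self-play provides in two-player games.

```python
import math
import random
from collections import deque


class RankedRewardBuffer:
    """Hypothetical helper: maps raw rewards to +1/-1 relative to recent games."""

    def __init__(self, capacity=250, percentile=75.0, seed=None):
        # capacity and percentile are illustrative defaults, not values from the source.
        self.rewards = deque(maxlen=capacity)
        self.percentile = percentile
        self.rng = random.Random(seed)

    def threshold(self):
        """Nearest-rank percentile of the buffered rewards, or None if empty."""
        if not self.rewards:
            return None
        ordered = sorted(self.rewards)
        rank = max(1, math.ceil(self.percentile / 100 * len(ordered)))
        return ordered[rank - 1]

    def rank(self, reward):
        """Return +1 or -1 for `reward`, then store it in the buffer."""
        threshold = self.threshold()  # computed before the new reward is added
        self.rewards.append(reward)
        if threshold is None or reward == threshold:
            # Assumption: an empty buffer or an exact tie is resolved at random.
            return self.rng.choice((-1, 1))
        return 1 if reward > threshold else -1
```

Because the threshold is recomputed from the agent's own recent games, it rises as the agent improves, so the agent must keep beating its past self to receive a positive signal.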

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research

Methods: AlphaZero