Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization
Alexandre Laterre, Yunguan Fu, Mohamed Khalil Jabri, Alain-Sam Cohen, David Kas, Karl Hajjar, Torbjorn S. Dahl, Amine Kerkeni, Karim Beguir

TL;DR
This paper introduces the Ranked Reward (R2) algorithm, which enables self-play reinforcement learning for single-player combinatorial optimization problems by ranking the rewards a single agent obtains over multiple games. On bin packing, R2 outperforms Monte Carlo tree search, heuristics, and integer programming.
Contribution
The paper proposes the R2 algorithm that adapts self-play reinforcement learning to single-player problems through reward ranking, a novel approach in this domain.
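The reward-ranking idea can be illustrated with a minimal sketch: keep a buffer of the agent's recent raw rewards, take a percentile of that buffer as a threshold, and convert each new reward into a win (+1) or loss (-1) relative to the agent's own recent performance. The class name `RankedRewardBuffer`, the buffer capacity, the percentile value, the nearest-rank percentile rule, and the random tie-breaking are all assumptions made for illustration; this summary does not specify them, and the sketch is not the authors' implementation.

```python
import random
from collections import deque


class RankedRewardBuffer:
    """Hypothetical helper: maps raw episode rewards to +1/-1 signals."""

    def __init__(self, capacity=250, percentile=75.0, seed=None):
        # capacity and percentile are illustrative defaults, not values from the paper
        self.rewards = deque(maxlen=capacity)  # most recent raw rewards
        self.percentile = percentile           # ranking threshold, in [0, 100]
        self.rng = random.Random(seed)

    def threshold(self):
        """Nearest-rank percentile of the buffered rewards."""
        ordered = sorted(self.rewards)
        index = int(round(self.percentile / 100.0 * (len(ordered) - 1)))
        return ordered[index]

    def reshape(self, reward):
        """Rank `reward` against recent games, then record it in the buffer."""
        if not self.rewards:
            # Nothing to compare against yet: assumed random outcome.
            signal = self.rng.choice([-1, 1])
        else:
            cutoff = self.threshold()
            if reward > cutoff:
                signal = 1    # beat the agent's own recent standard
            elif reward < cutoff:
                signal = -1   # fell short of it
            else:
                signal = self.rng.choice([-1, 1])  # assumed tie-break
        self.rewards.append(reward)
        return signal


if __name__ == "__main__":
    buffer = RankedRewardBuffer(capacity=5, percentile=50.0, seed=0)
    for raw in [0.2, 0.4, 0.6, 0.8, 1.0]:
        buffer.rewards.append(raw)
    print(buffer.reshape(0.9))  # above the median of recent rewards
    print(buffer.reshape(0.3))  # below the updated median
```

Because the threshold moves with the buffer, the same raw reward can count as a win early in training and as a loss later, which is what gives a single agent a self-play-like, steadily rising target.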
Findings
R2 outperforms Monte Carlo tree search, heuristics, and integer programming in bin packing.
Reward ranking improves learning efficiency and solution quality.
Analysis shows effectiveness varies with problem difficulty and ranking thresholds.
Abstract
Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm…
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research
Methods: AlphaZero
