TSS GAZ PTP: Towards Improving Gumbel AlphaZero with Two-stage Self-play for Multi-constrained Electric Vehicle Routing Problems
Hui Wang, Xufeng Zhang, Xiaoyu Zhang, Zhenhuan Ding, and Chaoxu Mu

TL;DR
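This paper introduces TSS GAZ PTP, a two-stage self-play strategy that enhances Gumbel AlphaZero (GAZ) for complex combinatorial optimization (CO) problems such as TSP and multi-constrained EVRP. It reports better results than the original GAZ on TSP, than state-of-the-art DRL approaches on EVRP, and than traditional optimization solvers on large-scale instances.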
Contribution
The paper proposes a novel two-stage self-play approach that improves Gumbel AlphaZero's performance on complex CO problems, including multi-constrained EVRP.
Findings
TSS GAZ PTP outperforms original GAZ on TSP.
The method surpasses state-of-the-art DRL approaches on EVRP.
It exceeds traditional optimization solvers on large-scale instances.
Abstract
Recently, Gumbel AlphaZero (GAZ) was proposed to solve classic combinatorial optimization problems such as TSP and JSSP by creating a carefully designed competition model (consisting of a learning player and a competitor player), which leverages the idea of self-play. However, if the competitor is too strong or too weak, the effectiveness of self-play training can be reduced, particularly in complex CO problems. To address this problem, we further propose a two-stage self-play strategy to improve the GAZ method (named TSS GAZ PTP). In the first stage, the learning player uses the enhanced policy network based on the Gumbel Monte Carlo Tree Search (MCTS), and the competitor uses the historical best trained policy network (acts as a greedy player). In the second stage, we employ Gumbel MCTS for both players, which makes the competition fiercer so that both players can continuously learn…
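The two-stage competitor schedule described in the abstract can be illustrated with a minimal Python sketch on a toy TSP instance. This is not the authors' implementation. Everything in it is an assumption made for illustration: nearest-neighbour construction stands in for a greedy policy rollout, a best-of-several-starts heuristic stands in for search-enhanced play (it is not Gumbel MCTS), the binary win/loss reward is assumed, and the names `stage_for`, `play_episode`, and the `switch_episode` parameter are hypothetical, since the abstract does not state when training moves from stage one to stage two.

```python
import math
import random


def tour_length(points, tour):
    """Total length of a closed tour over 2-D points."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def greedy_tour(points, start=0):
    """Stand-in for a greedy policy rollout: nearest-neighbour construction."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def search_tour(points, rng, n_samples=16):
    """Stand-in for search-enhanced play (NOT Gumbel MCTS).

    Keeps the shortest nearest-neighbour tour over several random start
    nodes, so it is never worse than greedy_tour(points, 0).
    """
    best = greedy_tour(points, 0)
    for _ in range(n_samples):
        cand = greedy_tour(points, rng.randrange(len(points)))
        if tour_length(points, cand) < tour_length(points, best):
            best = cand
    return best


def stage_for(episode, switch_episode):
    """Hypothetical schedule: stage 1 before switch_episode, stage 2 after."""
    return 1 if episode < switch_episode else 2


def play_episode(points, stage, rng):
    """One learner-vs-competitor episode on a single instance."""
    learner = search_tour(points, rng)          # learner always uses search
    if stage == 1:
        competitor = greedy_tour(points)        # greedy competitor
    elif stage == 2:
        competitor = search_tour(points, rng)   # competitor also uses search
    else:
        raise ValueError(f"unknown stage: {stage}")
    learner_len = tour_length(points, learner)
    competitor_len = tour_length(points, competitor)
    # Assumed binary outcome: the learner wins only with a strictly shorter tour.
    reward = 1 if learner_len < competitor_len else -1
    return reward, learner_len, competitor_len


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(15)]
    for episode in range(4):
        stage = stage_for(episode, switch_episode=2)
        reward, l_len, c_len = play_episode(points, stage, rng)
        print(f"episode={episode} stage={stage} reward={reward} "
              f"learner={l_len:.3f} competitor={c_len:.3f}")
```

In this sketch the stage-one learner can never lose by tour length, because the competitor's greedy tour is its starting candidate, which mirrors the concern that a competitor that is too weak gives a poor training signal. In stage two both sides use the same search stand-in, so the outcome is no longer fixed in advance.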
Taxonomy
Topics: Vehicle Routing Optimization Methods · Transportation and Mobility Innovations · Advanced Manufacturing and Logistics Optimization
Methods: Electric
