Leader Reward for POMO-Based Neural Combinatorial Optimization
Chaoyang Wang, Pengzhi Cheng, Jingze Li, Weiwei Sun

TL;DR
This paper introduces Leader Reward, a training enhancement for POMO-based neural models that significantly improves their ability to find optimal or near-optimal solutions to a range of combinatorial optimization problems at minimal extra computational cost.
Contribution
The paper proposes Leader Reward, a new training method for POMO models that substantially improves their solution quality across multiple combinatorial optimization problems.
Findings
Reduces POMO's optimality gap on TSP100 by a factor of more than 100.
Applicable to a range of CO problems, including TSP, CVRP, and FFSP.
Works with different POMO-based models and inference strategies.
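To make the "optimality gap" finding concrete, here is a minimal sketch of how such a gap and its reduction factor are commonly computed. The function names and the numbers in the usage note are illustrative assumptions, not values or code from the paper.

```python
def optimality_gap(model_cost: float, reference_cost: float) -> float:
    """Relative gap between a model's solution cost and a reference cost.

    The reference is assumed to be the optimal or best-known cost for a
    minimization problem (e.g. tour length for TSP).
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return (model_cost - reference_cost) / reference_cost


def gap_reduction_factor(baseline_gap: float, improved_gap: float) -> float:
    """How many times smaller the improved gap is than the baseline gap."""
    if improved_gap <= 0:
        raise ValueError("improved_gap must be positive")
    return baseline_gap / improved_gap
```

With made-up numbers, a baseline gap of 1% and an improved gap of 0.01% would give a reduction factor of about 100, which is the sense in which a gap is reduced "by a factor of more than 100".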
Abstract
Deep neural networks based on reinforcement learning (RL) for solving combinatorial optimization (CO) problems are developing rapidly and have shown a tendency to approach or even outperform traditional solvers. However, existing methods overlook an important distinction: CO problems differ from other traditional problems in that they focus solely on the optimal solution provided by the model within a specific length of time, rather than considering the overall quality of all solutions generated by the model. In this paper, we propose Leader Reward and apply it during two different training phases of the Policy Optimization with Multiple Optima (POMO) model to enhance the model's ability to generate optimal solutions. This approach is applicable to a variety of CO problems, such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible…
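The abstract's central idea, rewarding the best solution rather than the average quality of all solutions, can be illustrated with a small sketch. POMO samples several rollouts per instance and uses their mean reward as a shared baseline. The leader weighting below, where the best rollout's advantage is multiplied by a hypothetical factor `alpha`, is an assumption made for illustration. This summary does not state the paper's exact formula, so this should not be read as the authors' implementation.

```python
from statistics import mean


def pomo_advantages(rewards: list[float]) -> list[float]:
    """POMO-style advantages: each rollout's reward minus the shared mean baseline."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def leader_weighted_advantages(rewards: list[float], alpha: float = 2.0) -> list[float]:
    """Amplify the advantage of the best rollout (the "leader").

    ASSUMPTION: multiplying the leader's advantage by `alpha` is a
    hypothetical weighting scheme chosen for this sketch only.
    """
    advantages = pomo_advantages(rewards)
    # Rewards are assumed to be "higher is better", e.g. negative tour length.
    leader = max(range(len(rewards)), key=rewards.__getitem__)
    advantages[leader] *= alpha
    return advantages
```

For example, with rewards `[-10.0, -12.0, -8.0, -10.0]` (negative tour lengths) and `alpha=3.0`, the plain advantages are `[0.0, -2.0, 2.0, 0.0]` and the leader-weighted ones are `[0.0, -2.0, 6.0, 0.0]`, so the policy gradient would put more emphasis on the best rollout.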
Taxonomy
Topics: Neural Networks and Applications · Industrial Technology and Control Systems · Metaheuristic Optimization Algorithms Research
