TL;DR
This paper introduces a sequence decoding method for neural combinatorial optimization (NCO) that samples solutions without replacement. This increases the diversity and quality of the solutions the policy is trained on and improves performance on several combinatorial optimization problems.
Contribution
It proposes a simple, problem-independent sequence decoding approach for self-improved learning that outperforms previous methods across several combinatorial optimization problems.
Findings
Strong performance on the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP)
Outperforms previous NCO methods on the Job Shop Scheduling Problem
Increases solution diversity through sampling without replacement
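At a single decoding step, sampling without replacement can be sketched with the Gumbel-top-k trick. Each log-probability is perturbed with independent Gumbel(0, 1) noise, and the indices of the k largest perturbed scores are kept. The result is k distinct actions, distributed as if they had been drawn one at a time without replacement. This is a standard building block, used for example in stochastic beam search, and is shown only for illustration. The paper's decoder works over full sequences and is not reproduced here. The helper name `gumbel_top_k` and the use of `-inf` to mask infeasible actions (such as already-visited nodes) are assumptions.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=None):
    """Return k distinct indices sampled without replacement (hypothetical helper).

    log_probs may contain -math.inf for masked (infeasible) actions;
    k must not exceed the number of finite entries.
    """
    if rng is None:
        rng = random.Random()
    feasible = sum(1 for lp in log_probs if lp != -math.inf)
    if k > feasible:
        raise ValueError("k exceeds the number of feasible actions")
    keys = []
    for i, lp in enumerate(log_probs):
        # Gumbel(0, 1) noise as -log(E) with E ~ Exp(1); the clamp avoids log(0).
        g = -math.log(max(rng.expovariate(1.0), 1e-300))
        keys.append((lp + g, i))  # masked actions stay at -inf
    # The k largest perturbed scores are k distinct samples.
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


log_probs = [math.log(0.5), -math.inf, math.log(0.3), math.log(0.2)]
print(gumbel_top_k(log_probs, 3, random.Random(0)))  # some ordering of 0, 2, 3
```

Because every draw is distinct, a batch of k samples never wastes compute on duplicates. This is the same diversity argument the findings above make for whole solutions.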
Abstract
The constructive approach within Neural Combinatorial Optimization (NCO) treats a combinatorial optimization problem as a finite Markov decision process, where solutions are built incrementally through a sequence of decisions guided by a neural policy network. To train the policy, recent research is shifting toward a 'self-improved' learning methodology that addresses the limitations of reinforcement learning and supervised approaches. Here, the policy is iteratively trained in a supervised manner, with solutions derived from the current policy serving as pseudo-labels. The way these solutions are obtained from the policy determines the quality of the pseudo-labels. In this paper, we present a simple and problem-independent sequence decoding method for self-improved learning based on sampling sequences without replacement. We incrementally follow the best solution found and repeat the…
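The self-improvement loop described in the abstract can be illustrated on a toy four-city TSP. This is a minimal sketch under explicit assumptions. A table of next-city logits `theta` stands in for the neural policy network, and the helper names (`sample_tour`, `imitation_step`, `self_improve`) are hypothetical. Distinct candidates come from plain i.i.d. sampling with duplicates discarded, not from the paper's sampling-without-replacement decoder. In each round, the best sampled tour serves as the pseudo-label for one supervised (maximum-likelihood) update.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def next_city_probs(theta, current, unvisited):
    """Softmax over unvisited cities using the tabular logits theta[current][j]."""
    logits = [theta[current][j] for j in unvisited]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_tour(theta, n, rng):
    """Build a tour one decision at a time, starting at city 0 (constructive MDP)."""
    tour, unvisited = [0], list(range(1, n))
    while unvisited:
        probs = next_city_probs(theta, tour[-1], unvisited)
        nxt = rng.choices(unvisited, weights=probs, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def imitation_step(theta, label, lr):
    """One gradient-ascent step on the log-likelihood of the pseudo-label tour."""
    unvisited = list(range(1, len(label)))
    for t in range(1, len(label)):
        current, target = label[t - 1], label[t]
        probs = next_city_probs(theta, current, unvisited)
        for j, p in zip(unvisited, probs):
            # d log softmax(target) / d logit_j = 1[j == target] - p_j
            theta[current][j] += lr * ((1.0 if j == target else 0.0) - p)
        unvisited.remove(target)


def self_improve(coords, rounds=30, num_samples=16, lr=0.5, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    theta = [[0.0] * n for _ in range(n)]  # uniform initial policy
    best = None
    for _ in range(rounds):
        # Assumption: de-duplicated i.i.d. samples stand in for sampling without replacement.
        candidates = {tuple(sample_tour(theta, n, rng)) for _ in range(num_samples)}
        label = min(candidates, key=lambda c: tour_length(c, coords))
        if best is None or tour_length(label, coords) < tour_length(best, coords):
            best = label
        # Supervised update toward this round's best solution (the pseudo-label).
        imitation_step(theta, list(label), lr)
    return list(best), theta


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour, _ = self_improve(square)
print(tour, tour_length(tour, square))
```

Replacing the de-duplicated sampler with a without-replacement decoder would change only how `candidates` is built. The supervised update on the pseudo-label would stay the same, which is why the quality of the decoding method directly determines the quality of the training signal.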
