The Transformer Network for the Traveling Salesman Problem
Xavier Bresson, Thomas Laurent

TL;DR
This paper adapts the Transformer neural network architecture for solving the Traveling Salesman Problem using reinforcement learning, achieving near-optimal solutions that outperform recent learned heuristics.
Contribution
It introduces a novel application of Transformer models to TSP, demonstrating improved heuristic performance through reinforcement learning and beam search decoding.
Findings
Achieved an optimality gap of 0.004% for TSP50.
Achieved an optimality gap of 0.39% for TSP100.
Outperformed recent learned heuristics.
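For readers unfamiliar with the metric, the gap figures above can be read as the relative excess length of a heuristic tour over the optimal tour, in percent. The following is a minimal sketch under that assumption; the function names `tour_length` and `optimality_gap` and the toy instance are illustrative and do not come from the paper, and this summary does not state how the paper averages the gap over test instances.

```python
import math

def tour_length(coords, tour):
    """Total Euclidean length of a closed tour over 2D points."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )

def optimality_gap(heuristic_length, optimal_length):
    """Relative excess of a heuristic tour over the optimal one, in percent."""
    return 100.0 * (heuristic_length - optimal_length) / optimal_length

# Toy instance (hypothetical): the four corners of the unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
optimal = tour_length(coords, [0, 1, 2, 3])   # walks the perimeter
crossing = tour_length(coords, [0, 2, 1, 3])  # uses both diagonals
gap = optimality_gap(crossing, optimal)
print(f"optimal={optimal:.3f} heuristic={crossing:.3f} gap={gap:.2f}%")
```

Under this definition a gap of 0% means the heuristic tour is as short as the optimal one, so smaller values are better.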
Abstract
The Traveling Salesman Problem (TSP) is the most popular and most studied combinatorial problem, starting with von Neumann in 1951. It has driven the discovery of several optimization techniques such as cutting planes, branch-and-bound, local search, Lagrangian relaxation, and simulated annealing. The last five years have seen the emergence of promising techniques where (graph) neural networks have been able to learn new combinatorial algorithms. The main question is whether deep learning can learn better heuristics from data, i.e. replace human-engineered heuristics. This is appealing because developing algorithms to tackle NP-hard problems efficiently may require years of research, and many industry problems are combinatorial by nature. In this work, we propose to adapt the recent successful Transformer architecture originally developed for natural language processing to the…
Taxonomy
Topics: Natural Language Processing Techniques · Metaheuristic Optimization Algorithms Research · Vehicle Routing Optimization Methods
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dropout · Adam · Label Smoothing · Multi-Head Attention
