Continuous Relaxations for the Traveling Salesman Problem

Tuhin Sahai; Adrian Ziessler; Stefan Klus; Michael Dellnitz

arXiv:1702.05224·cs.DM·August 14, 2019


TL;DR

This paper introduces a novel approach to the TSP by embedding it into manifolds using dynamical systems, providing a biasing method for heuristics that often reduces the number of local search steps needed.

Contribution

It develops a new relaxation technique for the TSP on the manifold of orthogonal matrices and integrates it with the Lin–Kernighan heuristic to improve solution biasing.

Findings

01

Procrustes-based solutions often converge to undesirable equilibria.

02

The approach biases the Lin–Kernighan heuristic towards better solutions.

03

Fewer k-opt moves are needed compared to traditional methods.

Abstract

In this work, we aim to explore connections between dynamical systems techniques and combinatorial optimization problems. In particular, we construct heuristic approaches for the traveling salesman problem (TSP) based on embedding the relaxed discrete optimization problem into appropriate manifolds. We explore multiple embedding techniques – namely, the construction of new dynamical systems on the manifold of orthogonal matrices and associated Procrustes approximations of the TSP cost function. Using these dynamical systems, we analyze the local neighborhood around the optimal TSP solutions (which are equilibria) using computations to approximate the associated stable manifolds. We find that these flows frequently converge to undesirable equilibria. However, the solutions of the dynamical systems and the associated Procrustes approximation provide an interesting biasing approach…


Tables

Table 1: Comparison of $\alpha$ -nearness and $P$ -nearness based Lin–Kernighan heuristic on TSPLIB instances. The size of the candidate sets is set to $5$ per city and we stop the computations after $8n$ $k$ -opt moves in LKH, where $n$ is the number of cities. Out of the $22$ TSPLIB instances, $P$ -nearness computes a better solution in $18$ of the cases.

TSP	$\alpha$ -nearness	$P$ -nearness	improvement
d198	16540	16465	0.45 %
pcb442	50785	50832	-0.09 %
d493	36028	35023	2.79 %
u574	36984	36926	0.16 %
rat575	6796	6790	0.09 %
p654	35716	37039	-3.70 %
d657	49504	49158	0.70 %
u724	42295	41904	0.92 %
rat783	9054	8810	2.69 %
pr1002	261797	259810	0.76 %
u1060	224510	224552	-0.20 %
vm1084	244411	242573	0.75 %
pcb1173	56934	56915	0.03 %
d1291	53357	51610	3.27 %
rl1323	279810	275904	1.40 %
nrw1379	141510	67035	52.63 %
fl1400	21319	22775	-6.83 %
u1432	153213	153054	0.10 %
fl1577	28217	24357	13.68 %
d1655	95532	64837	32.13 %
u1817	58351	58213	0.24 %
rl1889	345475	340271	1.51 %

Equations

$$c=\sum_{i=1}^{n-1}d_{\sigma(i),\sigma(i+1)}+d_{\sigma(n),\sigma(1)},$$

$$\min_{P\in\mathcal{P}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP\right),$$

$$T_{\text{dir}}=\begin{pmatrix}0&&&&1\\1&0&&&\\&1&\ddots&&\\&&\ddots&0&\\&&&1&0\end{pmatrix}\quad\text{or}\quad T_{\text{undir}}=\begin{pmatrix}0&1&&&1\\1&0&\ddots&&\\&1&\ddots&1&\\&&\ddots&0&1\\1&&&1&0\end{pmatrix}.$$

$$\min_{P\in\mathcal{P}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP+P^{T}C\right),$$

$$\left\lVert A-P^{T}BP\right\rVert_{F}^{2}=\operatorname{tr}\left(A^{T}A\right)-2\operatorname{tr}\left(A^{T}P^{T}BP\right)+\operatorname{tr}\left(B^{T}B\right)=\left\lVert A\right\rVert_{F}^{2}-2\operatorname{tr}\left(A^{T}P^{T}BP\right)+\left\lVert B\right\rVert_{F}^{2}.$$

$$d_{ij}(\pi)=d_{ij}+\pi_{i}+\pi_{j}.$$

$$\min_{P\in\mathcal{O}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP\right).$$

$$\min_{P\in\mathcal{O}_{n}}\left\lVert A-P^{T}BP\right\rVert_{F}$$

$$P^{*}=V_{B}SV_{A}^{T},$$

$$\dot{P}=-\nabla F(P),$$

$$\nabla F(P)=F_{P}-PF_{P}^{T}P=P\{P,F_{P}\},$$

$$\nabla F(P)=P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right).$$

$$\nabla F(P)=2P\,[P^{T}BP,A].$$

$$\nabla G(P)=P\left((P\circ P)^{T}P-P^{T}(P\circ P)\right).$$

$$\dot{P}=-(1-k)\,P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right)-k\,P\left((P\circ P)^{T}P-P^{T}(P\circ P)\right),$$

$$\min_{P\in\mathcal{O}_{n}}$$

$$\text{s.t.}$$

$$\dot{P}$$

$$\dot{\lambda}$$

$$\dot{P}=-P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right).$$

$$\bigcup_{B\in\bm{L}}B=\bm{Q}\quad\text{and}\quad\operatorname{int}B\cap\operatorname{int}B^{\prime}=\emptyset\quad\text{for all }B,B^{\prime}\in\bm{L},\ B\neq B^{\prime}.$$

$$Q(c,r)=\left\{y\in\mathbb{R}^{n^{2}}:|y_{i}-c_{i}|\leq r_{i}\text{ for }i=1,\dots,n^{2}\right\},$$

$$C_{j+1}=\left\{B\in\bm{L}_{s}:\exists\,B^{\prime}\in C_{j}\text{ such that }B\cap\left(\operatorname{vec}\circ\,\Phi\left(\operatorname{vec}^{-1}(B^{\prime})\right)\right)\neq\emptyset\right\}.$$

$$P_{1}=\begin{pmatrix}1&0&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&1&0&0&0\\0&0&0&0&1\end{pmatrix}\quad\text{and}\quad P_{2}=\begin{pmatrix}0&0&0&0&1\\0&0&1&0&0\\0&1&0&0&0\\0&0&0&1&0\\1&0&0&0&0\end{pmatrix}.$$

$$\bar{P}=\begin{pmatrix}1&0&0&0&0\\0&0&0&1&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&0&1\end{pmatrix}.$$

$$\bar{P}_{1}=\begin{pmatrix}0&0&0&1&0\\0&0&0&0&1\\0&1&0&0&0\\0&0&1&0&0\\1&0&0&0&0\end{pmatrix}\quad\text{and}\quad\bar{P}_{2}=\begin{pmatrix}1&0&0&0&0\\0&0&0&0&1\\0&0&0&1&0\\0&0&1&0&0\\0&1&0&0&0\end{pmatrix}.$$

$$\min_{H\in\mathcal{T}_{n}}\operatorname{tr}\left(A^{T}H\right),$$

$$\dot{H}=-\left[H,\{H,F_{H}\}+\{H^{T},F_{H}^{T}\}\right].$$

$$\dot{H}=\dot{P}^{T}BP+P^{T}B\dot{P}=-\left[H,\{P,F_{P}\}\right].$$

$$\frac{\partial F}{\partial P_{ij}}$$

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Continuous Relaxations for the Traveling Salesman Problem

Tuhin Sahai

United Technologies Research Center, 2855 Telegraph Ave, Suite 410,

Berkeley, CA, 94705, USA.

Adrian Ziessler

Department of Mathematics, Paderborn University,

Warburger Straße 100, 33098 Paderborn, Germany.

Stefan Klus

Department of Mathematics and Computer Science, Freie Universität Berlin,

Arnimallee 9, 14195 Berlin, Germany.

Michael Dellnitz

Department of Mathematics, Paderborn University,

Warburger Straße 100, 33098 Paderborn, Germany.

Abstract

In this work, we aim to explore connections between dynamical systems techniques and combinatorial optimization problems. In particular, we construct heuristic approaches for the traveling salesman problem (TSP) based on embedding the relaxed discrete optimization problem into appropriate manifolds. We explore multiple embedding techniques – namely, the construction of new dynamical systems on the manifold of orthogonal matrices and associated Procrustes approximations of the TSP cost function. Using these dynamical systems, we analyze the local neighborhood around the optimal TSP solutions (which are equilibria) using computations to approximate the associated stable manifolds. We find that these flows frequently converge to undesirable equilibria. However, the solutions of the dynamical systems and the associated Procrustes approximation provide an interesting biasing approach for the popular Lin–Kernighan heuristic which yields fast convergence. The Lin–Kernighan heuristic is typically based on the computation of edges that have a “high probability” of being in the shortest tour, thereby effectively pruning the search space. Our new approach, instead, relies on a natural relaxation of the combinatorial optimization problem to the manifold of orthogonal matrices and the subsequent use of this solution to bias the Lin–Kernighan heuristic. Although the initial cost of computing these edges using the Procrustes solution is higher than existing methods, we find that the Procrustes solution, when coupled with a homotopy computation, contains valuable information regarding the optimal edges. We explore the Procrustes based approach on several TSP instances and find that our approach often requires fewer $k$ -opt moves than existing approaches. Broadly, we hope that this work initiates more work in the intersection of dynamical systems theory and combinatorial optimization.

1 Introduction

The use of dynamical systems based methods for analyzing optimization algorithms is a burgeoning area of interest. For example, it has been found that if one embeds sufficiently hard instances of the satisfiability problem [BHvM09] into a corresponding dynamical system, one observes transient chaos [ERT11]. In the continuous optimization setting, accelerated momentum methods were analyzed using dynamical systems and calculus of variation based approaches, providing intuitive insight into convergence properties [SBC14, WWJ16]. However, concrete examples of the direct application of dynamical systems and continuous processes to construct new state-of-the-art algorithms are limited. In this work, we aim to use dynamical systems and their associated manifolds of the traveling salesman problem (TSP) to extract computational and algorithmic insights.

The TSP is an iconic NP-hard problem that has received decades of interest [Coo11]. This combinatorial optimization problem arises in a wide variety of applications related to genome map construction [AAM*+*00], telescope management [Car97, KK07], and drilling circuit boards [GJR91]. The TSP also naturally arises in applications related to target tracking [ESC13], vehicle routing [LK81], and communication networks [GLO03] to name a few. Recently, a history dependent TSP was used to construct efficient techniques for learning the structure of Bayesian networks [SK15]. For further information about applications related to the TSP, we refer the reader to [Coo11].

In its basic form, the statement of the TSP is exceedingly simple. The task is to find the shortest Hamiltonian circuit through a list of cities, given their pairwise distances. Despite its simplistic appearance, the underlying problem is NP-hard [Kar10]. Several heuristics have been developed over the years to solve the problem including ant colony optimization [DG97], cutting plane methods [ABCC98, PR91], Christofides heuristic algorithm [Chr76], and the Lin–Kernighan heuristic [LK73].

In this work, we concentrate on exploring novel orthogonal relaxation and embedding based approximations to the TSP that are inspired from dynamical systems theory. In the first part, we construct a dynamical systems approach for computing locally optimal solutions of the TSP. This flow on the manifold of orthogonal matrices converges to a permutation matrix that minimizes the tour length. Although the method is interesting and elegant, the flow often converges to local minima. For TSP instances with more than $50$ cities, these minima are not competitive when compared to state-of-the-art heuristics [ABCC98, Coo11, LK73, PR91].

However, inspired by this continuous relaxation, we compute the solution to a two-sided orthogonal Procrustes problem [GD04] that relaxes the TSP to the manifold of orthogonal matrices. We find that this Procrustes approach can be combined with the Lin–Kernighan heuristic [LK73] for computing solutions of the TSP. The Lin–Kernighan heuristic is an extremely popular method for the TSP and has been credited with finding the best known solutions for several large instances [Hel98, Hel06]. It has been particularly successful in finding the best known solutions for several asymmetric TSPs [Hel98]. We provide a detailed description of the Lin–Kernighan heuristic in Section 2.1.

Helsgaun’s software package LKH [Hel98] is a highly successful software implementation of the approach. This implementation uses minimum spanning trees [HK70, HK71] to pre-compute candidate sets that contain edges that are likely to be a part of the optimal solution. This biasing methodology is found to reduce the number of $k$ -opt moves compared to baseline minimum tree based methods [Hel98]. In our work, the Procrustes solution is used to bias the Lin–Kernighan heuristic algorithm to pick edges that are more likely to be in the optimal tour. We remark that our approach is tightly connected to spectral methods for graphs [KS18]. The Procrustes based methodology has a higher overall computational cost, $O(n^{3})$ due to the required eigenvector computations, compared to $O(n^{2.2})$ in the case of the traditional Lin–Kernighan heuristic [LK73]. However, the Procrustes based Lin–Kernighan computation frequently converges faster (in fewer iterations) than the $1$ -tree based approach.

Our goals are twofold: First, we would like to demonstrate that the spectral structure of the associated matrices of these classes of problems contains valuable information that can be exploited for analysis and construction of novel heuristics. Second, we envision that by approximating the spectral structure [SSO17], one could potentially construct competitive methods for the TSP and the quadratic assignment problem (QAP). Moreover, we note that although eigenvalue and orthogonal approximations have been constructed for the TSP, they have traditionally been used for deriving bounds for the solutions [FBR87, HRW90]. To the best of the authors' knowledge, this work is the first attempt to use orthogonal relaxations and dynamical systems theory to construct computational methods for the TSP.

Our paper is organized as follows: We start with the mathematical formulation of the TSP in section 2. In sections 2.1–2.3, we describe the standard Lin–Kernighan heuristic along with techniques to limit the search space using $\alpha$ -nearness values (based on minimum spanning trees). In section 3, we construct dynamical systems on the manifold of orthogonal matrices that converge to Hamiltonian cycles. Using these dynamical systems, we analyze the stability and subsets of the stable manifold of the optimal TSP solutions using set-oriented numerical methods implemented in the software package GAIO [DFJ01]. We perform the computations in an effort to gain insights into the local dynamics of the flow in the neighborhood of the optimal solutions. In general, we find that the basin of attraction is typically quite small and, therefore, these dynamical systems converge to undesirable local minima. Inspired by these insights, we use a Procrustes-based approach for biasing the Lin–Kernighan heuristic based on “ $P$ -nearness values” in section 4. Numerical results are presented in section 5. Finally, we conclude with future work in section 6.

2 The traveling salesman problem

Given a list of $n$ cities $\{C_{1},C_{2},\dots,C_{n}\}$ and the associated distances between cities $C_{i}$ and $C_{j}$ , denoted by $d_{ij}$ , the TSP aims to find an ordering $\sigma$ of $\{1,2,\dots,n\}$ such that the tour cost, given by

$$c=\sum_{i=1}^{n-1}d_{\sigma(i),\sigma(i+1)}+d_{\sigma(n),\sigma(1)},\tag{1}$$

is minimized. For the Euclidean TSP, for instance, $d_{ij}=\left\lVert x_{i}-x_{j}\right\rVert_{2}$ , where $x_{i}\in\mathbb{R}^{d}$ is the position of $C_{i}$ . In general, however, the distance matrix $D=(d_{ij})$ does not have to be symmetric. The ordering $\sigma$ can be represented as a unique permutation matrix $P$ . Note, however, that due to the underlying cyclic symmetry, multiple orderings – corresponding to different permutation matrices – have the same cost.

There are several equivalent ways to define the cost function of the TSP. We restrict ourselves to the trace formulation proposed in [Won95], where the trace of a matrix $A\in\mathbb{R}^{n\times n}$ is defined to be the sum of all diagonal entries, i.e., $\operatorname{tr}(A)=\sum_{i=1}^{n}a_{ii}$ . Let $\mathcal{P}_{n}$ denote the set of all $n\times n$ permutation matrices, then the TSP can be written as a combinatorial optimization problem of the form

$$\min_{P\in\mathcal{P}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP\right),\tag{2}$$

where $A=D$ and $B=T$ . Here, $T$ is defined to be the adjacency matrix of the cycle graph of length $n$ . In what follows, we use the undirected cycle graph adjacency matrix for symmetric TSPs and the one corresponding to the directed cycle graphs for asymmetric TSPs. The matrices are defined as,

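$$T_{\text{dir}}=\begin{pmatrix}0&&&&1\\1&0&&&\\&1&\ddots&&\\&&\ddots&0&\\&&&1&0\end{pmatrix}\quad\text{or}\quad T_{\text{undir}}=\begin{pmatrix}0&1&&&1\\1&0&\ddots&&\\&1&\ddots&1&\\&&\ddots&0&1\\1&&&1&0\end{pmatrix}.$$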

The equivalence of (1) and (2) can be derived easily using the observation that $\tilde{T}\coloneqq P^{T}TP$ is a permuted tour matrix, i.e., the $(i,j)$ entry is $1$ if the tour goes from city $C_{i}$ to city $C_{j}$ . Thus, for any permutation, $\operatorname{tr}(D^{T}\tilde{T})=\sum_{i,j=1}^{n}d_{ij}\tilde{t}_{ij}$ is simply the sum of the distances associated with each edge. In what follows, we restrict our work to symmetric matrices; thus, we simply consider tour matrix $T=T_{\text{undir}}$ for undirected graphs.
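The equivalence of the tour-cost sum and the trace formulation can be checked numerically. The following is a minimal sketch, assuming NumPy and a symmetric Euclidean distance matrix; the helper names (`cycle_adjacency`, `permutation_matrix`, `tour_cost`) and the row convention for $P$ are choices made for this illustration and are not taken from the paper.

```python
import numpy as np

def cycle_adjacency(n, directed=False):
    """Adjacency matrix of the cycle graph on n >= 3 nodes (T_dir or T_undir)."""
    T = np.zeros((n, n))
    idx = np.arange(n)
    T[(idx + 1) % n, idx] = 1.0  # ones below the diagonal plus the top-right corner
    return T if directed else T + T.T

def permutation_matrix(sigma):
    """P[k, sigma[k]] = 1, i.e. the k-th stop of the tour is city sigma[k] (assumed convention)."""
    n = len(sigma)
    P = np.zeros((n, n))
    P[np.arange(n), sigma] = 1.0
    return P

def tour_cost(D, sigma):
    """Sum of the distances along the closed tour sigma."""
    n = len(sigma)
    return sum(D[sigma[i], sigma[(i + 1) % n]] for i in range(n))

rng = np.random.default_rng(0)
n = 7
X = rng.random((n, 2))                                      # random city positions
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # symmetric distances
sigma = rng.permutation(n)
P = permutation_matrix(sigma)

direct = tour_cost(D, sigma)
via_dir = np.trace(D.T @ P.T @ cycle_adjacency(n, directed=True) @ P)
via_undir = np.trace(D.T @ P.T @ cycle_adjacency(n) @ P)
print(direct, via_dir, via_undir)
```

In this sketch the directed tour matrix reproduces the tour length, while the undirected one counts every edge in both directions and therefore returns twice the tour length for symmetric $D$; the constant factor does not change the minimizer.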

The TSP can also be regarded as a special case of the general QAP [BcPP98, KB57], given by

$$\min_{P\in\mathcal{P}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP+P^{T}C\right),$$

or, alternatively, as a special case of the graph matching problem. The relationship between various combinatorial optimization problems is explored in Figure 1. In order to convert the minimization problem into an equivalent maximization problem, note that

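$$\left\lVert A-P^{T}BP\right\rVert_{F}^{2}=\operatorname{tr}\left(A^{T}A\right)-2\operatorname{tr}\left(A^{T}P^{T}BP\right)+\operatorname{tr}\left(B^{T}B\right)=\left\lVert A\right\rVert_{F}^{2}-2\operatorname{tr}\left(A^{T}P^{T}BP\right)+\left\lVert B\right\rVert_{F}^{2}.\tag{4}$$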

Thus, the norm is minimized if the trace is maximized and vice versa. We note that similar trace formulations for the QAP were derived in [FBR87].
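The identity relating the Frobenius norm to the trace term holds for every orthogonal $P$, not only for permutation matrices. A short numerical check, assuming NumPy and randomly generated symmetric test matrices (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.random((n, n))
A = A + A.T                                       # symmetric test matrix
B = rng.random((n, n))
B = B + B.T
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix

lhs = np.linalg.norm(A - Q.T @ B @ Q, "fro") ** 2
trace_term = np.trace(A.T @ Q.T @ B @ Q)
rhs = np.linalg.norm(A, "fro") ** 2 - 2 * trace_term + np.linalg.norm(B, "fro") ** 2
print(lhs, rhs)
```

Since the two norm terms do not depend on $P$, only the trace term varies, which is why maximizing the trace and minimizing the norm select the same matrix.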

Over the last few decades, a plethora of heuristics has been developed to solve the TSP efficiently. In order to find a good approximation of the optimal tour, typically different global and local heuristics are combined. A very efficient and powerful methodology is to construct an initial solution with the aid of greedy algorithms, for instance the nearest neighbor heuristic, and to improve the solution successively using local heuristics such as $k$ -opt move based methods (as described in section 2.1). As noted previously, one of the best available TSP solvers is Helsgaun’s LKH software [Hel98]. We now provide more details on LKH and its implementation.

2.1 The Lin–Kernighan heuristic

The Lin–Kernighan heuristic is a popular heuristic for the TSP introduced in [LK73]. Starting from an initial tour, the approach progresses by extracting edges from the tour and replacing them with new edges, while maintaining the Hamiltonian cycle constraint. If $k$ edges in the tour are simultaneously replaced, this is known as the $k$ -opt move [Hel09]. To prune the search space, the algorithm relies on minimum spanning trees [HK70, HK71] to identify edges that are more likely to be in the tour. This “importance” metric for edges is called $\alpha$ -nearness and described subsequently. The algorithm has found great success on large instances of the TSP [Hel98, Hel06]. Note that this algorithm has been extended to generalized TSPs [Hel14b] and clustered TSPs [Hel14a].

The LKH package [Hel98, Hel06] offers different heuristics to compute an initial tour. The standard method is to choose one node at random and to iteratively add edges based on computed $\alpha$ -nearness values and related candidate sets until a tour is found. When an initial tour has been found, LKH improves it using local heuristics. A very popular and efficient local heuristic is the $k$ -opt move. The simplest version, $2$ -opt, removes two edges of the tour and reconnects the subtours as shown in Figure 2. If the resulting tour is shorter than the original tour, the step is accepted; otherwise, it is rejected. Similarly, $3$ -opt removes three edges of the current tour, reconnects the subtours and picks the shortest tour.
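The $2$ -opt move described above can be sketched in a few lines. This is a plain textbook variant for symmetric distance matrices, written with NumPy for illustration; it is not the LKH implementation, and the function names are hypothetical.

```python
import numpy as np

def tour_cost(D, tour):
    """Length of the closed tour."""
    n = len(tour)
    return sum(D[tour[i], tour[(i + 1) % n]] for i in range(n))

def two_opt(D, tour):
    """Apply improving 2-opt moves until none is left (symmetric D assumed)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city, nothing to exchange
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a, b) and (c, d) by (a, c) and (b, d).
                delta = D[a, c] + D[b, d] - D[a, b] - D[c, d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

rng = np.random.default_rng(2)
n = 12
X = rng.random((n, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
start = list(range(n))
result = two_opt(D, start)
print(tour_cost(D, start), tour_cost(D, result))
```

Reversing the segment between the two removed edges is what reconnects the subtours; every accepted move strictly shortens the tour, so the loop terminates.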

2.2 Candidate sets and $\alpha$ -nearness

LKH uses $k$ -opt with varying $k$ . The basic move is a sequential $5$ -opt step. In order to limit the search space and to increase the efficiency of $k$ -opt moves, candidate sets that contain promising edges are computed for all cities. Methods to construct candidate sets for large TSPs have to be efficient in terms of both CPU time and memory usage. As mentioned previously, the standard approach implemented in LKH, called $\alpha$ -nearness, is based on minimum spanning trees or, to be more precise, on 1-trees (a slight variant of minimum spanning trees).

Definition 2.1.

Let $G=(V,E)$ be a graph with vertices $V$ and edges $E$ . A 1-tree for $G$ is defined to be a spanning tree for the vertices $V\setminus\{v_{1}\}$ plus two additional edges $e\in E$ incident to vertex $v_{1}$ .

A 1-tree with minimum weight is called a minimum 1-tree. Note that every tour is a 1-tree with the additional property that the degree of each vertex is two. It has been found that the minimum 1-tree typically contains several edges that lie in the optimal tour. The definition of $\alpha$ -nearness is based on a sensitivity analysis using 1-trees. The $\alpha$ -nearness value for the cities $C_{i}$ and $C_{j}$ is, roughly speaking, the difference between the length of a minimum 1-tree and the length of a minimum 1-tree that is required to contain edge $(v_{i},v_{j})$ . That is, if an edge belongs to the minimum 1-tree, then the $\alpha$ -nearness value is $0$ and the edge is assigned a high probability of being part of the shortest tour.

For each city, the candidate set is then defined to be the set of the $m$ incident edges with the lowest $\alpha$ -nearness values. The candidate sets are used to limit and direct the search. Candidate sets based solely on the distance between cities are typically not connected (for instance, see Figure 11) and convergence to a good solution of the TSP is expected to be slow. It was shown by Stewart [Ste87] that minimum spanning trees, which are by definition always connected, can be used to increase the efficiency of local heuristics.
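The 1-tree of Definition 2.1 can be illustrated with a small sketch: a minimum spanning tree on all vertices except $v_{1}$ (computed here with a simple Prim-style loop) plus the two shortest edges incident to $v_{1}$. The function names and the brute-force comparison are assumptions of this example and are not how LKH computes 1-trees or $\alpha$ -nearness values.

```python
import itertools
import numpy as np

def spanning_tree_edges(D, nodes):
    """Prim's algorithm restricted to `nodes`; returns the list of tree edges."""
    nodes = list(nodes)
    in_tree = [nodes[0]]
    remaining = set(nodes[1:])
    edges = []
    while remaining:
        u, v = min(((a, b) for a in in_tree for b in remaining),
                   key=lambda e: D[e[0], e[1]])
        edges.append((u, v))
        in_tree.append(v)
        remaining.remove(v)
    return edges

def minimum_one_tree(D, special=0):
    """Minimum spanning tree on V minus {special} plus the two cheapest edges at `special`."""
    others = [v for v in range(D.shape[0]) if v != special]
    edges = spanning_tree_edges(D, others)
    nearest = sorted(others, key=lambda v: D[special, v])[:2]
    return edges + [(special, v) for v in nearest]

def tour_cost(D, tour):
    n = len(tour)
    return sum(D[tour[i], tour[(i + 1) % n]] for i in range(n))

rng = np.random.default_rng(5)
n = 7
X = rng.random((n, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

edges = minimum_one_tree(D)
one_tree_weight = sum(D[u, v] for u, v in edges)
# Brute-force optimal tour (feasible only for very small n).
best_tour = min(tour_cost(D, (0,) + p) for p in itertools.permutations(range(1, n)))
print(one_tree_weight, best_tour)
```

Because every tour is itself a 1-tree, the weight of the minimum 1-tree is a lower bound on the optimal tour length, which the brute-force comparison illustrates on this small instance.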

2.3 Subgradient optimization

The $\alpha$ -nearness values typically do not give rise to optimal tours when coupled with $k$ -opt moves. In order to improve the $\alpha$ -nearness values, a subgradient optimization method is often used. This method modifies the original distance matrix $D$ in such a way that the degree of almost all vertices of the optimized 1-tree converges to a value of $2$ . The entries of the new distance matrix, $\widetilde{D}(\pi)$ , are computed as

$$d_{ij}(\pi)=d_{ij}+\pi_{i}+\pi_{j}.$$

The $\pi$ values, sometimes called penalties, change the distances between the cities. The basic idea is to make edges incident to vertices with a low degree shorter and edges incident to vertices with a high degree longer so that the resulting 1-tree is close to a tour. This transformation of the distance matrix does not change the shortest tour and leads to significantly improved $\alpha$ -nearness values [Hel98]. This method can also be used to compute a lower bound which is in general very close to the optimal tour length [HK70, HK71]. Figure 3 shows the impact of the subgradient optimization. In the example, most of the edges of the optimal tour are already present in the optimized 1-tree. For a more detailed description of $\alpha$ -nearness, 1-trees, and the subgradient optimization scheme, we refer to [HK70, HK71, Hel98].
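The claim that the penalties leave the shortest tour unchanged follows from the fact that every tour visits each city exactly once, so each $\pi_{i}$ enters the tour length exactly twice. A minimal numerical illustration, assuming NumPy and arbitrary random penalties (the variable names are illustrative):

```python
import numpy as np

def tour_cost(D, tour):
    n = len(tour)
    return sum(D[tour[i], tour[(i + 1) % n]] for i in range(n))

rng = np.random.default_rng(3)
n = 8
X = rng.random((n, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

pi = rng.standard_normal(n)                # arbitrary penalties
D_pi = D + pi[:, None] + pi[None, :]       # d_ij(pi) = d_ij + pi_i + pi_j

tour_a = rng.permutation(n)
tour_b = rng.permutation(n)
shift_a = tour_cost(D_pi, tour_a) - tour_cost(D, tour_a)
shift_b = tour_cost(D_pi, tour_b) - tour_cost(D, tour_b)
print(shift_a, shift_b, 2 * pi.sum())
```

Both tours are shifted by the same constant $2\sum_{i}\pi_{i}$, so the ranking of tours is preserved, whereas the minimum 1-tree, whose vertex degrees are not all equal to two, generally does change.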

3 Dynamical systems approach

In this section, we construct a dynamical systems approach for computing optimal tours for the TSP. In particular, we use matrix differential equations defined on the manifold of orthogonal matrices. As mentioned in previous sections, solutions of the TSP can be represented as permutation matrices. It is well known that permutation matrices lie on the manifold of orthogonal matrices. Our goal is to construct flows that minimize the TSP cost as they evolve. Note that gradient flow methods were first used by Brockett to compute eigenvalues and to solve linear programming or least squares matching problems [Bro89, Bro91]. This approach was subsequently also used for combinatorial optimization problems [WY10, Won95, ZP08]. We will now formulate multiple cost functions for constructing gradient flows for the TSP. In what follows, let us denote by ${\mathcal{O}_{n}=\left\{P\in\mathbb{R}^{n\times n}\mid P^{T}P=I\right\}\supset\mathcal{P}_{n}}$ the set of all $n\times n$ orthogonal matrices. In this section, we consider the following orthogonal relaxation of the combinatorial optimization problem (2),

$$\min_{P\in\mathcal{O}_{n}}\operatorname{tr}\left(A^{T}P^{T}BP\right).\tag{5}$$

Given that we will use the solution of the Procrustes problem in both the dynamical systems approach and the Lin–Kernighan heuristic, we start by defining the problem and its solution.

3.1 The two-sided orthogonal Procrustes problem

We start by considering the standard formulation of the Procrustes problem. We will then modify it to the TSP setting. Let $A$ and $B$ be two symmetric $n\times n$ matrices. Then

$$\min_{P\in\mathcal{O}_{n}}\left\lVert A-P^{T}BP\right\rVert_{F}\tag{6}$$

is called the two-sided orthogonal Procrustes problem. As shown in (4), cost function (5) is minimized if the cost function $\left\lVert A-P^{T}BP\right\rVert_{F}$ of the Procrustes problem is maximized and vice versa. Since $\mathcal{P}_{n}\subset\mathcal{O}_{n}$ , the cost of the orthogonal matrix is always lower than (or equal to if the matrices $A$ and $B$ are permutation-similar) the cost of the permutation matrix.

Theorem 3.1.

Given two symmetric matrices $A$ and $B$ , whose eigenvalues are distinct, let ${A=V_{A}\Lambda_{A}V_{A}^{T}}$ and $B=V_{B}\Lambda_{B}V_{B}^{T}$ be eigendecompositions, with $\Lambda_{A}=\operatorname{diag}\left(\lambda_{A}^{(1)},\dots,\lambda_{A}^{(n)}\right)$ , $\Lambda_{B}=\operatorname{diag}\left(\lambda_{B}^{(1)},\dots,\lambda_{B}^{(n)}\right)$ , and $\lambda_{A}^{(1)}\geq\dots\geq\lambda_{A}^{(n)}$ as well as $\lambda_{B}^{(1)}\geq\dots\geq\lambda_{B}^{(n)}$ . Then every orthogonal matrix $P^{*}$ which minimizes (6) has the form

$$P^{*}=V_{B}SV_{A}^{T},$$

where $S=\operatorname{diag}(\pm 1,\dots,\pm 1)$ .

A proof of this theorem can be found in [Sch68], for example. If the eigenvalues of $A$ and $B$ are distinct, then there exist $2^{n}$ different solutions with the same cost. If one or both of the matrices possess repeated eigenvalues, then the eigenvectors in the matrices $V_{A}$ and $V_{B}$ are determined only up to basis rotations, which further increases the solution space.

We note that, as shown in (4), the minimization of the TSP cost corresponds to the maximization of the cost in (6) and not the standard minimization found in the literature. The theorem states that in order to minimize the cost function in (6), the eigenvalues and corresponding eigenvectors of both matrices have to be sorted in the same order, either increasing or decreasing. On the other hand, from the proof of the theorem it can be seen that in order to compute the solution of (5), the eigenvalues and eigenvectors of $A$ and $B$ , respectively, have to be sorted in opposite order (which corresponds to the maximization of the cost in (6)). A similar condition was noted in [AW00].
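The role of the eigenvalue ordering can be made concrete with a small sketch, assuming NumPy, symmetric $A$ and $B$, and the sign matrix $S=I$. The function `procrustes_solution` and its `opposite_order` flag are hypothetical names introduced for this illustration; `numpy.linalg.eigh` returns eigenvalues in ascending order, so reversing the columns of $V_{B}$ gives the opposite ordering.

```python
import numpy as np

def procrustes_solution(A, B, opposite_order=False):
    """Return V_B S V_A^T with S = I for symmetric A and B."""
    _, V_A = np.linalg.eigh(A)   # eigenvalues in ascending order
    _, V_B = np.linalg.eigh(B)
    if opposite_order:
        V_B = V_B[:, ::-1]       # pair the largest of A with the smallest of B
    return V_B @ V_A.T

def trace_cost(A, B, P):
    return np.trace(A.T @ P.T @ B @ P)

rng = np.random.default_rng(4)
n = 8
X = rng.random((n, 2))
A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # A = D
idx = np.arange(n)
B = np.zeros((n, n))
B[idx, (idx + 1) % n] = 1.0
B = B + B.T                                                 # B = T_undir

P_norm = procrustes_solution(A, B)                          # same order: minimizes the norm
P_trace = procrustes_solution(A, B, opposite_order=True)    # opposite order: minimizes the trace

lam_A = np.linalg.eigvalsh(A)
lam_B = np.linalg.eigvalsh(B)
bound = float(np.sum(lam_A * lam_B[::-1]))                  # sum of oppositely ordered products
random_costs = [trace_cost(A, B, np.linalg.qr(rng.standard_normal((n, n)))[0])
                for _ in range(200)]
print(trace_cost(A, B, P_trace), bound, min(random_costs))
```

In this sketch the oppositely ordered solution attains the sum of products of oppositely sorted eigenvalues, and none of the randomly sampled orthogonal matrices falls below that value, in line with the discussion above.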

Let us now consider gradient flows for orthogonal and tour matrices. We note that the Procrustes problem is relevant for the resulting dynamical systems as demonstrated below.

3.2 Gradient flows for orthogonal matrices

The orthogonal relaxation of the combinatorial optimization problem (2), given by (5), can be solved using a steepest descent method on the manifold of orthogonal matrices. Given a cost function $F$ , the gradient flow is defined as

$$\dot{P}=-\nabla F(P),$$

which is a matrix differential equation evolving on the manifold of orthogonal matrices. That is, starting with an orthogonal matrix $P$ , the trajectory remains for all time in $\mathcal{O}_{n}$ . Let $[A,B]=AB-BA$ be the standard Lie bracket and $\{A,B\}=A^{T}B-B^{T}A$ the generalized Lie bracket. The gradient of a function $F$ defined on the manifold of orthogonal matrices is

$$\nabla F(P)=F_{P}-PF_{P}^{T}P=P\{P,F_{P}\},\tag{8}$$

where $F_{P}$ is the matrix of partial derivatives [EAS98], i.e., $(F_{P})_{ij}=\frac{\partial F}{\partial P_{ij}}$ .

Lemma 3.2.

For the cost function $F(P)=\operatorname{tr}\left(A^{T}P^{T}BP\right)$ , we obtain

$$\nabla F(P)=P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right).$$

Proof.

Since $F_{P}=BPA^{T}+B^{T}PA$ , see [PP08], using (8) it directly follows that $\nabla{F}(P)=P\{P,BPA^{T}+B^{T}PA\}$ , which can be rewritten as above. ∎

This is our generalization of the matrix flow defined in [ZP08] for the symmetric graph matching problem. If $A$ and $B$ are symmetric, this can be simplified to

$$\nabla F(P)=2P\,[P^{T}BP,A].$$
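The three expressions for the gradient of $F$ can be compared numerically. The sketch below assumes NumPy and random test matrices; the function names are illustrative. It checks that the general formula (8) with $F_{P}=BPA^{T}+B^{T}PA$ agrees with the expression in Lemma 3.2, that both reduce to the commutator form for symmetric $A$ and $B$, and that $P^{T}\nabla F(P)$ is skew-symmetric, i.e., the gradient is tangent to the manifold of orthogonal matrices.

```python
import numpy as np

def lie_bracket(X, Y):
    return X @ Y - Y @ X

def generalized_bracket(X, Y):
    return X.T @ Y - Y.T @ X

def grad_general(P, A, B):
    """P {P, F_P} with F_P = B P A^T + B^T P A."""
    F_P = B @ P @ A.T + B.T @ P @ A
    return P @ generalized_bracket(P, F_P)

def grad_lemma(P, A, B):
    """Expression of Lemma 3.2."""
    return P @ (generalized_bracket(P.T @ B @ P, A)
                + generalized_bracket(P.T @ B.T @ P, A.T))

def grad_symmetric(P, A, B):
    """Simplified form for symmetric A and B."""
    return 2.0 * P @ lie_bracket(P.T @ B @ P, A)

rng = np.random.default_rng(6)
n = 5
P, _ = np.linalg.qr(rng.standard_normal((n, n)))     # random orthogonal matrix
A = rng.standard_normal((n, n))                      # general (nonsymmetric) matrices
B = rng.standard_normal((n, n))
A_sym, B_sym = A + A.T, B + B.T

general_matches_lemma = np.allclose(grad_general(P, A, B), grad_lemma(P, A, B))
symmetric_matches = np.allclose(grad_general(P, A_sym, B_sym),
                                grad_symmetric(P, A_sym, B_sym))
M = P.T @ grad_general(P, A, B)
is_tangent = np.allclose(M + M.T, 0.0)
print(general_matches_lemma, symmetric_matches, is_tangent)
```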

Since the optimal solution of this optimization problem is in general not a permutation matrix, Zavlanos and Pappas [ZP08] use a second term for solving the graph matching problem, which penalizes negative entries. Note that the set of permutation matrices is the intersection of the sets of orthogonal and nonnegative matrices. In order to force the gradient flow to converge to a permutation matrix, a cubic penalty function is used.

Lemma 3.3.

Let $\circ$ denote the Hadamard or element-wise product of two matrices. For the penalty function $G(P)=\tfrac{1}{3}\operatorname{tr}\left(P^{T}\left(P-(P\circ P)\right)\right)=\tfrac{1}{3}n-\tfrac{1}{3}\sum_{i,j=1}^{n}p_{ij}^{3}$ , the gradient is given by

$$\nabla G(P)=P\left((P\circ P)^{T}P-P^{T}(P\circ P)\right).$$

Proof.

Note that $G_{P}=-(P\circ P)$ . Using (8) results in the above gradient. ∎

By combining the two functions $F$ and $G$ , it is possible to compute a permutation matrix which is “close” to the optimal orthogonal solution. In [ZP08], the steady state solution of the superimposed gradient flows for $F$ and $G$ , given by

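$$\dot{P}=-(1-k)\,P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right)-k\,P\left((P\circ P)^{T}P-P^{T}(P\circ P)\right),$$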

is computed for $k=0$ . Then the parameter $k$ is set to a value sufficiently close to $1$ so that the flow converges to a permutation matrix. Another approach is to apply a homotopy-based scheme, where $k$ is the continuation parameter which is gradually increased until the solution is close to a permutation matrix.
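The superimposed flow can be integrated numerically in many ways. The following is a minimal sketch, assuming NumPy, symmetric $A=D$ and $B=T_{\text{undir}}$, and an explicit Euler step followed by a projection onto the orthogonal matrices via the polar factor (computed from an SVD). The integrator, the step size, the number of steps, and the two-stage schedule for $k$ are choices made for this illustration and are not the scheme used in the paper or in [ZP08].

```python
import numpy as np

def grad_F(P, A, B):
    """Gradient of tr(A^T P^T B P) for symmetric A and B."""
    H = P.T @ B @ P
    return 2.0 * P @ (H @ A - A @ H)

def grad_G(P):
    """Gradient of the cubic penalty G."""
    PP = P * P                        # Hadamard product
    return P @ (PP.T @ P - P.T @ PP)

def retract(M):
    """Nearest orthogonal matrix (orthogonal polar factor) of M."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def flow(P, A, B, k, dt=1e-3, steps=2000):
    """Euler steps of the superimposed flow, re-orthogonalized after every step."""
    for _ in range(steps):
        P = retract(P - dt * ((1.0 - k) * grad_F(P, A, B) + k * grad_G(P)))
    return P

def cost(P, A, B):
    return np.trace(A.T @ P.T @ B @ P)

rng = np.random.default_rng(7)
n = 5
X = rng.random((n, 2))
A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
idx = np.arange(n)
B = np.zeros((n, n))
B[idx, (idx + 1) % n] = 1.0
B = B + B.T

P0, _ = np.linalg.qr(rng.standard_normal((n, n)))
P_relaxed = flow(P0, A, B, k=0.0)          # pure gradient flow of F
P_biased = flow(P_relaxed, A, B, k=0.9)    # penalty switched on afterwards

bound = float(np.sum(np.linalg.eigvalsh(A) * np.linalg.eigvalsh(B)[::-1]))
print(cost(P0, A, B), cost(P_relaxed, A, B), bound)
print(np.round(P_biased, 2))
```

Whether the second stage ends close to a permutation matrix depends on the instance, the initial condition, and the schedule for $k$; the sketch makes no claim in that respect and only shows how the two gradient terms are combined.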

In what follows, we will consider the TSP as a constrained optimization problem of the form,

[TABLE]

This formulation gives rise to the following set of equations,

[TABLE]

The above set of equations is obtained by using gradient descent on the Lagrangian cost function, as described in [PB88], for general constrained optimization problems. Note that $\lambda$ in (12) is the Lagrange multiplier.

Example 3.4.

In order to illustrate the gradient flow approach, let us consider a simple TSP with 10 cities. Using (12), we obtain the results shown in Figure 4. In this example, the dynamical system converges to the optimal tour.

Next, we will perform a detailed numerical study of the gradient flow (12) for a simple TSP with five cities. We first consider (12) without the equality constraints, i.e.,

$$\dot{P}=-P\left(\{P^{T}BP,A\}+\{P^{T}B^{T}P,A^{T}\}\right).\tag{13}$$

As mentioned previously, for the symmetric TSP, we set $A=D$ and $B=T_{\text{undir}}$ . Solving (13) forward in time yields a solution that minimizes the cost function $F(P)$ in $\mathcal{O}_{n}$ . This solution can also be computed by solving the two-sided orthogonal Procrustes problem (6). However, since the matrix $T_{\text{undir}}$ possesses repeated eigenvalues, the Procrustes problem has infinitely many solutions due to rotational degeneracy. An illustration of the Procrustes sets is shown in Figure 5. We note that the structure of a Procrustes set becomes more and more complex for larger $n\in\mathbb{N}$ since the number of repeated eigenvalues also increases with $n$ .

To shed light on the stability and local dynamics around the optimal TSP solutions, we approximate subsets of the stable manifold of the Procrustes solutions such that two permutation matrices are inside these sets. Note that the two matrices are explicitly shown in (14). This numerical study will help us understand whether the Procrustes solutions are robust under small perturbations of the initial permutation matrix, whether the Procrustes solution is, in general, a viable approach to constructing useful initial conditions, and how close the Procrustes solution is to the optimal permutation matrix. In order to compute the sets of interest, we will use a set-oriented continuation technique developed in [DH96], which recently has been extended to the computation of unstable manifolds for infinite dimensional dynamical systems [ZDG18].

Let $\bm{Q}\subset\mathbb{R}^{n^{2}}$ be a compact set within which we want to approximate the subset of the stable manifold of a permutation matrix $\bar{P}$ . We define a partition $\bm{L}$ of $\bm{Q}$ to be a finite family of compact subsets of $\bm{Q}$ such that

[TABLE]

Moreover, we denote by $\bm{L}(x)\in\bm{L}$ the element of $\bm{L}$ containing $x\in\bm{Q}$ . In our context, $x$ is a reordering of an (orthogonal) $n\times n$ matrix $\widetilde{P}$ into a vector, i.e., ${x=\operatorname{vec}(\widetilde{P})}$ (cf. [PP08]). We consider a nested sequence $\bm{L}_{s},\ s\in\mathbb{N}$ , of successively finer partitions of $\bm{Q}$ , requiring that for all $B\in\bm{L}_{s}$ there exist cells $B_{1},\ldots,B_{m}\in\bm{L}_{s+1}$ such that ${B=\bigcup_{i}B_{i}}$ and ${\mbox{volume}(B_{i})=\tfrac{1}{2}\,\mbox{volume}(B)}$ . A set $B\in\bm{L}_{s}$ is said to be of level $s$ . The partition $\bm{L}_{s+1}$ is computed by a subdivision procedure, where we subdivide each cell of the previous partition $\bm{L}_{s}$ with respect to the $i$ -th coordinate, where $i$ is varied cyclically (for more details see [DH97]). However, these subdivision steps are only done virtually and we do not store the cells of the partition $\bm{L}_{s+1}$ .
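The virtual subdivision described above can be illustrated with a short sketch. It is a minimal illustration in Python/NumPy, not the implementation used for the computations in this paper: the function name `cell_bounds` is hypothetical, and the rule "bisect coordinate $i = \text{step} \bmod d$" is one straightforward reading of "varied cyclically". The cell $\bm{L}_{s}(x)$ is recovered directly from $x$, so no cell of the partition needs to be stored.

```python
import numpy as np

def cell_bounds(x, lower, upper, level):
    """Return (lo, hi) of the level-`level` cell that contains x.

    The box [lower, upper] is bisected `level` times; the bisected
    coordinate cycles through 0, 1, ..., d-1, 0, 1, ... (assumed rule).
    """
    lo = np.array(lower, dtype=float)
    hi = np.array(upper, dtype=float)
    x = np.asarray(x, dtype=float)
    d = x.size
    for step in range(level):
        i = step % d                    # coordinate bisected in this step
        mid = 0.5 * (lo[i] + hi[i])
        if x[i] < mid:
            hi[i] = mid                 # x lies in the lower half
        else:
            lo[i] = mid                 # x lies in the upper half
    return lo, hi

# Values used later for the five-city example: Q = [-1, 1]^25 and s = 175.
d, s = 25, 175
x = np.zeros(d)
lo, hi = cell_bounds(x, -np.ones(d), np.ones(d), s)
radii = 0.5 * (hi - lo)                 # 175 / 25 = 7 bisections per coordinate
```

With $s=175$ and $25$ coordinates, each coordinate is bisected seven times, so every cell has radius $2^{-7}$ in each direction.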

We assume that $C=\bm{L}_{s}(\operatorname{vec}(\bar{P}))$ is the cell which contains the initial permutation matrix $\bar{P}$ in vector form. Furthermore, let us denote by $\Phi$ the time- $\tau$ -map of (13). Then the numerical realization of the continuation algorithm for the approximation of the subsets of the stable manifold can be described as follows:

Remark 3.5.

a)

In the application of Algorithm 1 we have to perform the continuation step

[TABLE]

Numerically this is realized as follows: First, $\Phi$ is evaluated for a large number of test matrices $\operatorname{vec}^{-1}(x),\ x\in B^{\prime}$, for each cell $B^{\prime}\in C_{j}$. Then a cell $B\in\bm{L}_{s}$ is added to the collection $C_{j+1}$ if there is at least one $x\in B^{\prime}$ such that ${\left(\operatorname{vec}\circ\Phi\left(\operatorname{vec}^{-1}(x)\right)\right)\in B}$. In practice, the components $x\in B^{\prime}$ of the test matrices can be chosen according to several different strategies: For low-dimensional problems, one can choose them from a regular grid within each cell $B^{\prime}$. Alternatively, one can select them via Monte Carlo sampling. Observe, however, that in general $\operatorname{vec}^{-1}(x)\notin\mathcal{O}_{n}$ for $x\in B^{\prime}$. Hence, in order to construct an orthogonal matrix, we compute a polar decomposition of $\operatorname{vec}^{-1}(x)$ (cf. [Hig86]). The polar decomposition yields a small perturbation of the permutation matrix $\bar{P}$.

b)

The number of continuation steps $j$ crucially depends on the choice of the integration time $\tau$ . In general, the smaller we choose $\tau$ the more continuation steps are done which leads to a higher computational cost. However, to prevent isolated cells in the covering, $\tau$ has to be chosen sufficiently small.

c)

The choice of the integration time $\tau$ can be relaxed by choosing a finite time grid $\{t_{0},\ldots,t_{N}\}$ , for $t_{N}=\tau$ , where we mark all cells in $\bm{L}_{s}$ which are hit in each time step. This allows us to increase the integration time without decreasing the quality of the covering obtained by Algorithm 1.
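The projection onto $\mathcal{O}_{n}$ mentioned in Remark 3.5 a) can be sketched as follows. This is a minimal illustration, assuming the orthogonal polar factor is obtained from a singular value decomposition (for $M=U\Sigma V^{T}$ the factor is $UV^{T}$); the function name `orthogonal_polar_factor`, the perturbation size, and the random seed are illustrative choices and are not taken from the paper.

```python
import numpy as np

def orthogonal_polar_factor(M):
    """Orthogonal factor of the polar decomposition M = Q S (Q orthogonal)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

rng = np.random.default_rng(0)
n = 5
P_bar = np.eye(n)[rng.permutation(n)]            # a permutation matrix
M = P_bar + 1e-3 * rng.standard_normal((n, n))   # test point, generally not orthogonal
P_tilde = orthogonal_polar_factor(M)             # nearby orthogonal matrix

orthogonality_error = np.linalg.norm(P_tilde.T @ P_tilde - np.eye(n))
distance_to_P_bar = np.linalg.norm(P_tilde - P_bar)
```

The polar factor is the orthogonal matrix closest to $M$ in the Frobenius norm, which is why a test point near $\bar{P}$ is mapped to a small orthogonal perturbation of $\bar{P}$.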

For the five-city example, we choose $\bm{Q}=[-1,1]^{25}$, $\tau=10000$ and a fine partition $\bm{L}_{s}$ of $\bm{Q}$ for $s=175$. In particular, this means that $\bm{Q}$ is subdivided into $2^{175}$ cells of radius $r_{i}=\frac{1}{2^{7}}$, for $i=1,\ldots,25$, since each of the $25$ coordinates is bisected $175/25=7$ times. Moreover, we use the strategy described in Remark 3.5 c). In Figure 6, we illustrate two subsets of the stable manifold containing the permutation matrices

[TABLE]

Small perturbations of these matrices result in different Procrustes solutions on the corresponding Procrustes set.

Note that by using (13) we will always obtain a Procrustes solution, which is in general not a permutation matrix. In fact, a stability analysis shows that the optimal permutation matrix is hyperbolic with four unstable directions. We expect that those directions become stable as we enable the equality constraint in (11) and use the gradient flow (12). To this end, let us consider the flow (12). Using a forward difference scheme, we approximate the Jacobian at a small perturbation $(\widetilde{P},\widetilde{\lambda})$ of the optimal permutation matrix and compute its eigenvalues. The unstable directions become stable, but there are still four eigenvalues with positive (but small) real part. Note that the Lagrange multiplier, unfortunately, maps the optimal solution of the TSP to saddle points. Hence, although the optimal permutation matrix is still not stable, we expect the trajectories to stay in a small neighborhood of the optimal permutation matrix since the positive (unstable) eigenvalues have a small real part. Then we can extract the optimal permutation matrix.
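The stability check described above can be sketched generically. The code below is a minimal illustration of a forward difference Jacobian followed by an eigenvalue count; the vector field `toy_field` is a hypothetical two-dimensional saddle with one weakly unstable direction and merely stands in for the right-hand side of (12), which is not reproduced here.

```python
import numpy as np

def forward_difference_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x column by column with forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    J = np.empty((f0.size, x.size))
    for k in range(x.size):
        xk = x.copy()
        xk[k] += h                                  # perturb the k-th coordinate
        J[:, k] = (np.asarray(f(xk), dtype=float) - f0) / h
    return J

def toy_field(z):
    """Hypothetical stand-in: strongly stable in z[0], weakly unstable in z[1]."""
    return np.array([-z[0] + z[1] ** 2, 0.01 * z[1]])

J = forward_difference_jacobian(toy_field, np.zeros(2))
eigs = np.linalg.eigvals(J)
n_unstable = int(np.sum(eigs.real > 0))             # eigenvalues with positive real part
```

For the flows considered here, the state would be the vectorized pair $(\widetilde{P},\widetilde{\lambda})$, and the number of eigenvalues with positive real part gives the number of unstable directions.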

Next we will numerically analyze how likely it is to obtain the optimal permutation matrix by using (12). Hence, we compute the basin of attraction (subsets of the stable manifold) of the optimal permutation matrix

[TABLE]

Again, we use an adaptation of the set-oriented continuation technique by Dellnitz et al., where we integrate (12) backward in time. Therefore, we only have to change the time-$\tau$-map $\Phi$ in Algorithm 1 to the one that corresponds to the new dynamical system shown in (12). We compute the small perturbations of the optimal permutation matrix by polar decomposition. We set $\bm{Q}=[-1,1]^{25}$ and choose $s=175$ and $\tau=10000$ as before. In Figure 7 (a)–(b) we show different three-dimensional projections of the basin of attraction of $(\widetilde{P},\widetilde{\lambda})$. The dark cells depict the stationary solutions of the gradient flow (12) backward in time. There are four different stationary solutions which are also permutation matrices, two of which are shown below,

[TABLE]

Observe that the Procrustes sets of the gradient flow (13) are not covered by the basin of attraction. In fact, this is clear since we start in a local minimizer and we solve the gradient flow backward in time. This is equivalent to maximizing the cost function (11) subject to the equality constraints. Since the Procrustes solution is the global minimizer of the cost function without the constraints, it is, in general, not possible to find this solution via the gradient flow backward in time. Hence, it cannot be in the basin of attraction of the optimal solution of the TSP when using the dynamical system given by (12). The box-counting dimension (see, e.g., [HK99]) of the covering of the basin of attraction is $d\approx 3.5$ (about $24$ million cells in the $25$ -dimensional space).
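To make the notion of box-counting dimension concrete, the following sketch estimates it for a point cloud by counting occupied boxes at several scales and fitting the slope of $\log N(\varepsilon)$ against $\log(1/\varepsilon)$. It is only a generic illustration: the function name `box_counting_dimension`, the dyadic scales, and the test set (a line segment in $\mathbb{R}^{3}$) are assumptions, and the estimate reported above was obtained from the box covering itself rather than from this code.

```python
import numpy as np

def box_counting_dimension(points, levels):
    """Slope of log N(eps) versus log(1/eps) for box sizes eps = 2**(-k)."""
    points = np.asarray(points, dtype=float)
    log_inv_eps, log_counts = [], []
    for k in levels:
        eps = 2.0 ** (-k)
        # Integer box index of every point; unique rows are the occupied boxes.
        boxes = np.unique(np.floor(points / eps).astype(np.int64), axis=0)
        log_inv_eps.append(k * np.log(2.0))
        log_counts.append(np.log(len(boxes)))
    slope, _ = np.polyfit(log_inv_eps, log_counts, 1)
    return slope

# Test set: the diagonal of the unit cube, a one-dimensional object in R^3.
t = np.linspace(0.0, 1.0, 4096, endpoint=False)
segment = np.column_stack([t, t, t])
dim = box_counting_dimension(segment, levels=range(2, 7))
```

For the diagonal segment the number of occupied boxes doubles with each level, so the fitted slope is one, as expected for a curve.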

In an effort to address the complications due to the infinite number of Procrustes solutions, we now reformulate the dynamical systems using tour matrix representations of the solutions.

3.3 Gradient flows for tour matrices

Instead of forcing the gradient flow to converge to a permutation matrix, an alternative approach is to define a cost function in such a way that the flow converges directly to a permutation of the initial tour matrix $T$ . For the symmetric case (9), Brockett [Bro91] introduces a change of variables, given by $H=P^{T}BP$ . The resulting double bracket flow, $\dot{H}=2\left[H,\left[H,A\right]\right]$ , then evolves in the space of symmetric matrices and is only quadratic in $H$ .

We now extend this to the nonsymmetric case. Assuming that the objective function $F(P)$ can be rewritten as a function $F(H)$ , we want to derive a gradient flow for the new variable $H$ . For the TSP, again $A=D$ and $B=T$ . Note that, as before, the $T$ matrix is replaced by $T_{\text{undir}}$ and $T_{\text{dir}}$ for symmetric and asymmetric TSP instances respectively. With the aforementioned transformation, the cost function for the relaxed TSP from Lemma 3.2 can be written as

[TABLE]

where $\mathcal{T}_{n}=\{P^{T}TP\mid P\in\mathcal{O}_{n}\}$ .

Remark 3.6.

For directed cycle graphs, i.e., $B=T_{\text{dir}}$ , the non-relaxed version of (15) is identical to the cost of the linear assignment problem (LAP) [KS18], with the difference that here the set of feasible solutions is constrained to a subset of $\mathcal{P}_{n}$ , which makes the problem NP-hard.

Theorem 3.7.

Let $F(H)$ be a given cost function, then the gradient flow is given by

[TABLE]

Proof.

Since $H=P^{T}BP$ , using (7) and (8) we obtain

[TABLE]

Applying the chain rule, this leads to

[TABLE]

where $J^{ij}\in\mathbb{R}^{n\times n}$ is a single-entry matrix [PP08], i.e., $(J^{ij})_{kl}=\delta_{ik}\delta_{jl}$ . It follows that

[TABLE]

Inserting this into the equation for $\dot{H}$ concludes the proof. ∎

For the cost function $F(H)=\operatorname{tr}(A^{T}H)$ , we simply obtain $F_{H}=A$ . With the aid of Theorem 3.7, we can then compute the corresponding gradient flow. In addition to the cost function, a penalty function has to be used to find an admissible solution.

One possibility would be to penalize negative entries as described in Section 3.2. However, for the tour matrix approach we use a combination of two penalty functions. The first penalty function is given by,

[TABLE]

which penalizes entries that are not zero or one. Furthermore, we also use a second penalty function

[TABLE]

where $A\in\mathbb{R}^{3n\times n^{2}}$ and $b\in\mathbb{R}^{3n}$ encode linear equality constraints that force the flow to converge to a matrix $H$ with row and column sums equal to two and diagonal entries equal to zero. Note that $A$ here denotes the constraint matrix and not the matrix $A=D$ appearing in the cost function. Given both penalty functions, we formulate the TSP as the following constrained optimization problem,

[TABLE]

In order to solve (18), we will solve the resulting system of differential equations

[TABLE]

where $G_{1,H}=2(H-H\circ H)\circ(E-2H)$ and $G_{2,H}=\operatorname{vec}^{-1}\left(2(A\cdot\operatorname{vec}(H)-b)^{T}A\right)$. Here, $E$ denotes the matrix of ones. Again, we perform a gradient descent for the cost function and a gradient ascent for the Lagrange multipliers.
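The two penalty gradients can be checked numerically. The sketch below is an illustration under explicit assumptions: it takes $G_{1}(H)=\sum_{i,j}(h_{ij}-h_{ij}^{2})^{2}$ and $G_{2}(H)=\lVert A\operatorname{vec}(H)-b\rVert^{2}$, which are the forms consistent with the gradients stated above, uses a column-stacking $\operatorname{vec}$, and builds $A$ and $b$ from Kronecker products. The helper names (`tour_matrix`, `constraints`, `numeric_grad`) are hypothetical.

```python
import numpy as np

def tour_matrix(n):
    """Adjacency matrix of the undirected cycle 0-1-...-(n-1)-0."""
    T = np.zeros((n, n))
    idx = np.arange(n)
    T[idx, (idx + 1) % n] = 1.0
    T[(idx + 1) % n, idx] = 1.0
    return T

def vec(H):
    return H.reshape(-1, order="F")               # stack the columns of H

def constraints(n):
    """A (3n x n^2) and b (3n): row sums 2, column sums 2, zero diagonal."""
    I, ones = np.eye(n), np.ones((1, n))
    row_sums = np.kron(ones, I)                   # (1^T kron I) vec(H) = H 1
    col_sums = np.kron(I, ones)                   # (I kron 1^T) vec(H) = H^T 1
    diag = np.zeros((n, n * n))
    diag[np.arange(n), np.arange(n) * (n + 1)] = 1.0
    A = np.vstack([row_sums, col_sums, diag])
    b = np.concatenate([2 * np.ones(2 * n), np.zeros(n)])
    return A, b

def G1(H):                                        # assumed form of the first penalty
    return np.sum((H - H * H) ** 2)

def G1_grad(H):                                   # 2 (H - H∘H) ∘ (E - 2H)
    return 2 * (H - H * H) * (np.ones_like(H) - 2 * H)

def G2(H, A, b):                                  # assumed form of the second penalty
    r = A @ vec(H) - b
    return r @ r

def G2_grad(H, A, b):                             # vec^{-1}(2 (A vec(H) - b)^T A)
    return (2 * (A @ vec(H) - b) @ A).reshape(H.shape, order="F")

def numeric_grad(f, H, h=1e-6):
    """Central difference gradient, entry by entry."""
    g = np.zeros_like(H)
    for i in range(H.shape[0]):
        for j in range(H.shape[1]):
            dH = np.zeros_like(H)
            dH[i, j] = h
            g[i, j] = (f(H + dH) - f(H - dH)) / (2 * h)
    return g

n = 5
T = tour_matrix(n)
A, b = constraints(n)
H = np.random.default_rng(0).random((n, n))       # arbitrary test matrix
err1 = np.max(np.abs(numeric_grad(G1, H) - G1_grad(H)))
err2 = np.max(np.abs(numeric_grad(lambda X: G2(X, A, b), H) - G2_grad(H, A, b)))
```

A tour matrix has entries in $\{0,1\}$, row and column sums equal to two, and a zero diagonal, so both penalties vanish there, while the finite difference errors `err1` and `err2` should be close to zero for any test matrix.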

Example 3.8.

Let us illustrate the gradient flow with the same TSP with 10 cities as in Example 3.4. Using (19), we obtain the results shown in Figure 8. We plot only the entries of the $H$ matrix that are greater than zero. The dynamical system converges to a tour that is slightly longer than the optimal tour.

Analogously to Section 3.2, we will numerically analyze the tour matrix flow (19) for the simple TSP with five cities. We first note that there exists only one Procrustes solution in $\mathcal{T}_{n}$ , i.e., $H^{*}=(P^{*})^{T}TP^{*}$ , where $T=T_{\text{undir}}$ (since the distances between cities are symmetric). We note that the advantage of this formulation is that it avoids the infinite number of Procrustes solutions that were described in the previous section. In other words, even though there exist an infinite number of $P^{*}$ solutions, $H^{*}$ is unique. Furthermore, the Procrustes solution in $\mathcal{T}_{n}$ is invariant under basis rotations due to repeated eigenvalues. Hence, the corresponding Procrustes set is entirely captured by $H^{*}$ . In order to analyze the stability of the optimal tour matrix, we take a small perturbation of the optimal tour and solve the tour matrix flow (19). This results in a matrix $\widetilde{H}$ with the corresponding Lagrange multipliers $\widetilde{\lambda}_{1}$ and $\widetilde{\lambda}_{2}$ . Observe that $\widetilde{H}$ only lies in a small neighborhood of the optimal tour. Now we are in a position to compute the Jacobian in $(\widetilde{H},\widetilde{\lambda}_{1},\widetilde{\lambda}_{2})$ using a forward difference scheme. There exist five eigenvalues with positive (but small) real part, thereby giving rise to saddle points. Thus, the optimal tour is again not stable, but we do expect that by using the tour flow (19) we will stay in a small neighborhood of the optimal tour.

Finally, we again compute the basin of attraction of the optimal tour matrix for the five cities example,

[TABLE]

Observe that a symmetric matrix $H\in\mbox{Symm}_{5}=\{A\in\mathbb{R}^{5\times 5}\,|\,A=A^{T}\}$ is fully described by the lower left or upper right part of the matrix. Moreover, as described previously, the tour matrix flow (19) is a matrix differential equation evolving on the manifold of $\mbox{Symm}_{5}$ . Thus, starting with a symmetric matrix $H\in\mbox{Symm}_{5}$ , the trajectory remains for all time in $\mbox{Symm}_{5}$ . This allows us to choose $\bm{Q}=[-2,2]^{15}$ and a fine partition $\bm{L}_{s}$ of $\bm{Q}$ for $s=105$ . Again, we set $\tau=10000$ and make use of Remark 3.5 c). Small perturbations of the optimal tour matrix $\bar{H}$ are computed as follows: Let $C\in\bm{L}_{105}$ be the cell which contains the components of the optimal tour matrix $\bar{H}$ . A point $x\in C$ defines the components of a lower triangular matrix. By taking the lower left part of $\widetilde{\operatorname{vec}}^{-1}(x)$ , we can directly create a symmetric matrix, which is a small perturbation of the optimal tour. Observe that $\widetilde{\operatorname{vec}}^{-1}$ has to be adapted accordingly.

In Figure 9 (a)–(b) we show different three-dimensional projections of the basin of attraction of $(\widetilde{H},\widetilde{\lambda}_{1},\widetilde{\lambda}_{2})$. The dark cells depict the stationary solutions of (19) backward in time. We note that there are five different tour matrices to which the gradient flow converges backward in time; two of the five unique matrices (each one depending on a permutation matrix with a unique cycle) are shown below,

[TABLE]

Furthermore, the box-counting dimension of the basin of attraction is about ${d\approx 3.28}$ .

Although the flows described above are interesting and display the complexity and challenges presented by NP-hard problems when viewed through a dynamical systems lens, we find that the solutions computed by the above methods are not competitive when compared to state-of-the-art methods. Our goal in the next section is to construct new variants of existing state-of-the-art methods that are inspired by our work above.

4 Procrustes-based Lin–Kernighan heuristic

In this section, we propose a method to compute candidate sets based on the relaxed problem (5) or (15), respectively. The solution, which is given by the solution of the Procrustes problem (see Section 3.1), can be computed analytically. Note that the solutions computed using this approach are optimal solutions of the flows (for $P$ and $H$) described in the previous sections.

Thus, the optimal solution which minimizes the relaxed TSP cost function (5) can be obtained using the solution of the Procrustes problem described in Section 3.1. Namely, the solution is given in terms of matrix $P^{*}=V_{T}V_{D}^{T}$ [AW00, HRW92], where the eigenvectors in the two matrices are sorted with respect to increasing eigenvalues of $T$ and decreasing eigenvalues of $D$ or vice versa. Define $T^{*}=P^{*T}TP^{*}=V_{D}\Lambda_{T}V_{D}^{T}$ to be the solution of the two-sided orthogonal Procrustes problem or the minimum of the tour flow. Note that $T^{*}$ is also the optimal solution of the gradient flow (19) without both equality constraints, i.e.,

[TABLE]
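The closed-form expression $T^{*}=V_{D}\Lambda_{T}V_{D}^{T}$ can be sketched in a few lines. The code below is a minimal illustration and not the MATLAB prototype used for the experiments: it assumes a symmetric instance with $T=T_{\text{undir}}$ taken as the adjacency matrix of an undirected cycle, uses the trace cost $\operatorname{tr}(D\,P^{T}TP)$ for comparison, and generates a small random Euclidean instance. All function and variable names are hypothetical.

```python
import numpy as np

def tour_matrix(n):
    """Adjacency matrix of the undirected cycle 0-1-...-(n-1)-0 (assumed T_undir)."""
    T = np.zeros((n, n))
    idx = np.arange(n)
    T[idx, (idx + 1) % n] = 1.0
    T[(idx + 1) % n, idx] = 1.0
    return T

def procrustes_tour_matrix(D, T):
    """T* = V_D diag(eig(T)) V_D^T with opposite eigenvalue orderings."""
    _, V_D = np.linalg.eigh(D)           # eigh returns ascending eigenvalues
    V_D = V_D[:, ::-1]                   # eigenvectors of D, now descending
    eig_T = np.linalg.eigvalsh(T)        # eigenvalues of T, ascending
    return V_D @ np.diag(eig_T) @ V_D.T

rng = np.random.default_rng(1)
n = 8
pts = rng.random((n, 2))                                   # random cities in the unit square
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
T = tour_matrix(n)

T_star = procrustes_tour_matrix(D, T)
relaxed_cost = np.trace(D @ T_star)                        # cost of the relaxed solution

P = np.eye(n)[rng.permutation(n)]                          # an arbitrary feasible tour
tour_cost = np.trace(D @ P.T @ T @ P)
```

Since every permutation matrix is orthogonal, `relaxed_cost` is a lower bound on `tour_cost` for any tour. Note also that $T^{*}$ depends only on the eigenvalues of $T$ and the eigenvectors of $D$, which reflects the uniqueness of $H^{*}$ discussed in Section 3.3.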

Roughly speaking, $T^{*}$ can be interpreted as a continuous solution of the relaxed TSP where the entry $t_{ij}^{*}$ describes the strength of edge $(i,j)$ . We would like to now use the entries of $T^{*}$ to help inform the Lin–Kernighan heuristic. In particular, we use the solution of the Procrustes problem to limit the search space and to improve the efficiency of local heuristics. The aim is to bias $k$ -opt moves in a manner such that edges with high edge strengths are included with high probability. We compute candidate sets based on the entries $t^{*}_{ij}$ of the matrix $T^{*}$ and call these matrix values $P$ -nearness. In our proposed approach, for each city, we pick the cities with the largest entries $t_{ij}^{*}$ .
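Selecting candidates by $P$-nearness amounts to a row-wise sort. The following sketch is an illustration only: the function name `candidate_sets`, the stable tie-breaking, and the small $4\times 4$ example matrix are assumptions made for demonstration, and the default of five candidates per city mirrors the setting reported in Table 1.

```python
import numpy as np

def candidate_sets(nearness, k=5):
    """For each city, return the k cities with the largest nearness values."""
    S = np.array(nearness, dtype=float)            # copy, so the input is not modified
    np.fill_diagonal(S, -np.inf)                   # a city is never its own candidate
    order = np.argsort(-S, axis=1, kind="stable")  # descending nearness per row
    return order[:, :k]

# Hypothetical symmetric edge strengths for four cities.
strength = np.array([[0.0, 0.9, 0.1, 0.5],
                     [0.9, 0.0, 0.4, 0.2],
                     [0.1, 0.4, 0.0, 0.8],
                     [0.5, 0.2, 0.8, 0.0]])
cands = candidate_sets(strength, k=2)
```

In this toy example the candidate sets of cities $0,\ldots,3$ are $\{1,3\}$, $\{0,2\}$, $\{3,1\}$, and $\{2,0\}$, listed in order of decreasing edge strength.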

Example 4.1.

Let us consider again the TSP from Examples 3.4 and 3.8. Figure 10 shows the difference between the optimal solution of the TSP and the optimal solution of the Procrustes problem. We use a linear interpolation between blue (large $t_{ij}^{*}$ value) and white (small $t_{ij}^{*}$ value). Note that this is the same matrix as in Figure 8d, with the difference that we are plotting a few more edges here to illustrate $P$ -nearness. Clearly, some edges of the optimal tour are already visible in Figure 10b, for example $(2,9)$ and $(5,9)$ , while other edges such as $(3,10)$ or $(2,5)$ have a much lower weight. For city $10$ , different choices exist, $(2,10)$ , $(6,10)$ , $(7,10)$ , and $(8,10)$ , for instance, have a high probability of being part of the shortest tour.

In Figure 11, we show the edges computed using the Procrustes solution. In particular, for two random TSP instances ( $50$ city and $100$ city examples), we show the shortest edges in the left-most column and the edges from the Procrustes solution in the right-most column. The optimal tour is plotted in the middle column. It is evident from the figure that the Procrustes solution $T^{*}$ tends to capture most of the edges in the optimal tour. Note that, while the $\alpha$ -nearness values can be computed in $\mathrm{O}(n^{2})$ [HK70, Hel98, Hel06], the computation of the $P$ -nearness values is $\mathrm{O}(n^{3})$ .

4.1 Improving the Procrustes solution

Analogous to the $\alpha$-nearness approach, we now describe our methodology to obtain better $P$-nearness values than those computed from the solution of the Procrustes problem alone. As described in Section 2.3, the $\alpha$-nearness values were improved using subgradient optimization. In the $P$-nearness setting, the principal idea is to construct a homotopy between the original TSP distance matrix $D$ and the solution of the Procrustes problem. Intuitively, one desires the candidate sets to include ‘several’ short edges and a ‘few’ long edges. Note that simply picking the shortest edges from $D$ gives rise to greedy solutions that are usually not competitive since they require the addition of long edges to complete the tours [Coo11].

We find that the Procrustes solution $T^{*}$ tends to select too many long edges (as shown in Figure 12a). If we use the entries of $T^{*}$ to bias the Lin–Kernighan heuristic, the solutions are found to be close, but less competitive than the standard approach. An effort to reduce the number of long edges is equivalent to making the computed solution “greedy” by picking edges based on the distance matrix $D$ . Thus, we construct a homotopy $\tilde{H}$ of the form,

[TABLE]

The candidate sets for varying $\lambda$ are shown for an example TSP instance in Figure 12. To find the optimal $\lambda$ we use ideas from graph clustering [SSB12], where one computes the existence of disconnected clusters in the graph. In particular, we increase $\lambda$ until the graph of candidate sets is almost disconnected (separated into clusters). This optimal $\lambda$ is found by either marching in $\lambda$ or using a bisection approach.

There are multiple ways that one can compute the connectedness of a graph. In particular, one can use a depth-first search based approach [Cor09] or perform computations on the graph Laplacian [Chu97, Fie73, Fie89]. The number of connected components of a graph with $n$ nodes equals the multiplicity of the zero eigenvalue of its graph Laplacian, i.e., $n$ minus the rank of the Laplacian [Chu97]. In our work, we pick the graph Laplacian approach for computing connected components in the graph (by looking at the multiplicity of the zero eigenvalue). Note that these computations can also be performed in the distributed setting [SSB10, SSB12]. If varying $\lambda$ does not give rise to a disconnected graph, we set $\lambda=1$. Alternatively, one can use the $D$ matrix (in place of $T^{*}-\lambda D$) to bias the $k$-opt moves in the Lin–Kernighan heuristic. If the candidate sets based on distance only are connected, this typically implies that the $k$-opt moves converge quickly to the shortest tour.
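The Laplacian-based connectivity test can be sketched as follows. This is a minimal illustration: the function name `num_components`, the symmetrization of the candidate relation into an undirected graph, and the numerical tolerance for "zero" eigenvalues are assumptions rather than details taken from our implementation.

```python
import numpy as np

def num_components(candidates, n, tol=1e-8):
    """Count connected components of the candidate graph via the graph Laplacian."""
    W = np.zeros((n, n))
    for i, row in enumerate(candidates):
        for j in row:
            W[i, j] = W[j, i] = 1.0          # treat candidate edges as undirected
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian: degree minus adjacency
    eigvals = np.linalg.eigvalsh(L)          # L is symmetric positive semidefinite
    return int(np.sum(eigvals < tol))        # multiplicity of the zero eigenvalue

# Two toy candidate graphs on four cities (one candidate per city).
two_pairs = [[1], [0], [3], [2]]             # edges 0-1 and 2-3: two clusters
ring = [[1], [2], [3], [0]]                  # edges 0-1-2-3-0: connected
c_pairs = num_components(two_pairs, 4)
c_ring = num_components(ring, 4)
```

In a marching or bisection scheme over $\lambda$, a function of this kind would be evaluated on the candidate graph for each trial value of $\lambda$ until the component count changes.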

5 Results

To compare different candidate sets or methods to bias $k$-opt, Helsgaun computes the “average rank” of the edges which form the shortest tour [Hel98]. The ranking is essentially an ordering on the set of edges for each node. This ordering captures the “likelihood” of an edge being in the optimal tour and is typically computed using the $\alpha$-nearness values. In our proposed approach, the $\alpha$-nearness values are replaced by $P$-nearness values. Since every city is incident to exactly two tour edges, the optimal average rank is $1.5$, which is attained when all edges belonging to the shortest tour have either rank $1$ or $2$. We found that the average rank is in general not a good metric for the quality of the nearness values or candidate sets. Although the average rank of the Procrustes solution is typically much higher than the average rank of the $\alpha$-nearness values, $k$-opt often converges faster to the shortest tour.
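The average rank can be computed as in the sketch below. It reflects one plausible reading of the metric (ranks are assigned per city in order of decreasing nearness, and each tour edge is counted from both of its endpoints); the function name `average_rank` and the tie-breaking rule are assumptions, so the numbers may differ in detail from those produced by LKH.

```python
import numpy as np

def average_rank(nearness, tour):
    """Average rank of the tour edges under a per-city nearness ordering."""
    S = np.array(nearness, dtype=float)
    n = S.shape[0]
    np.fill_diagonal(S, -np.inf)                        # self-edges are ranked last
    order = np.argsort(-S, axis=1, kind="stable")       # best neighbor first
    rank = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    rank[rows, order] = np.arange(1, n + 1)[None, :]    # rank[i, j] = position of j for city i
    total = 0
    for pos, city in enumerate(tour):
        nxt = tour[(pos + 1) % n]
        total += rank[city, nxt] + rank[nxt, city]      # count the edge from both endpoints
    return total / (2 * n)

# Sanity check: nearness values equal to the tour's own adjacency matrix.
n = 6
tour = list(range(n))
ideal = np.zeros((n, n))
for pos in range(n):
    a, b = tour[pos], tour[(pos + 1) % n]
    ideal[a, b] = ideal[b, a] = 1.0
best = average_rank(ideal, tour)
```

When the nearness values single out exactly the tour edges, every city ranks its two tour neighbors first and second, which gives the optimal value of $1.5$.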

In order to compare $\alpha$ -nearness and $P$ -nearness, we compute tours using Helsgaun’s LKH package. For each LKH run, we generate the candidate sets based on the $\alpha$ -nearness and $P$ -nearness values. Starting from initial tours computed using $\alpha$ -nearness and $P$ -nearness, respectively, we compare the resulting tour lengths after a fixed number of $k$ -opt steps.

In Table 1, we compare $22$ well-known instances of the TSP from the TSPLIB database [Rei91]. The size of the candidate sets in these computations is fixed: we compute $5$ candidates for each city using $\alpha$-nearness or $P$-nearness values, respectively. Starting from a random initial tour that is generated from the respective candidate sets, we perform a fixed number of $8n$ $k$-opt moves, where $n$ is the number of cities. We find that in this setting, the $P$-nearness based approach typically converges faster than $\alpha$-nearness. For example, after $8n$ steps, $P$-nearness based LKH converges to lower cost values in $18$ of the $22$ instances when compared to $\alpha$-nearness based LKH. Moreover, we ran $50$ random TSP instances with $1000$ cities each and found that $P$-nearness had lower tour costs after a fixed number of $k$-opt moves in $31$ of the instances, i.e., it produced better solutions in $62\%$ of the instances. Note that if we run both $\alpha$-nearness and $P$-nearness based LKH to convergence, both methods compute the best known optimal tours in these instances. Since the initial tours are constructed using the candidate sets, the starting costs may occasionally differ slightly when comparing $\alpha$-nearness with $P$-nearness.

We do not present runtime results comparing the two algorithms since our prototype code was implemented in MATLAB and is consequently unable to compete with LKH (implemented in C) in speed. Our MATLAB implementation also limits the size of the TSP instances that we can handle. In future work, we intend to re-implement our algorithm in C for greater scalability and performance. Moreover, our approach will have higher computational cost than $\alpha$-nearness based methods due to the $\mathrm{O}(n^{3})$ eigenvector computations. However, given that $P$-nearness requires fewer iterations of the Lin–Kernighan heuristic, we conjecture that by combining our approach with fast spectral methods [SSO17], one can construct a highly competitive TSP approach.

6 Conclusion and future work

In this work, we explored the use of continuous relaxations and dynamical systems theory for constructing algorithms for the TSP. Our approach aimed to exploit the observation that the solution of the TSP can be represented as a permutation matrix which lies on the manifold of orthogonal matrices. In the first part of this manuscript, we constructed a dynamical system on the manifold of orthogonal matrices that converges to solutions of the TSP. We also explored the construction of gradient flows for tour matrices in Section 3.3. We found that although the dynamical systems approach is elegant and sheds light on the structure and complexity of NP-hard problems, it often converges to local optima. We also used homotopy continuation methods to compute subsets of the stable manifolds of the optimal solutions.

Inspired by the dynamical systems approach, we then exploited a Procrustes based approximation that computes an orthogonal matrix that minimizes the TSP cost. Our approach was based on the computation of the solution of the two-sided orthogonal Procrustes problem which is based on the eigendecomposition of the corresponding tour and distance matrices of the TSP instance. We then constructed a homotopy of the Procrustes solution with the distance matrix that is then used to bias the popular Lin–Kernighan heuristic.

In certain TSP instances, the candidate sets constructed from the homotopy are found to give faster convergence than minimum spanning tree ( $1$ -tree) based approaches. Our algorithm was implemented in the LKH software framework and demonstrated on multiple TSPLIB and random TSP instances.

Future work includes the testing of the Procrustes approach on larger instances of the TSP by exploiting parallel eigenvector computation packages [BCC+97]. On the theoretical side, we aim to pursue the generalization of our proposed approach to the quadratic assignment problem (QAP). The aim is to develop efficient heuristics utilizing the results of the Procrustes problem and dynamical systems theory for solving strongly NP-hard problems. We are also exploring the use of subgradient optimization for improving the Procrustes solution which is expected to provide faster convergence rates for the TSP and related optimization problems. Additionally, we are exploring the use of fast spectral methods [SSO17] for accelerating the candidate set computations. Moreover, we hope that this work increases interest in the area at the intersection of dynamical systems theory and combinatorial optimization. There appear to be deep connections between the two areas [WWJ16] that may enable the construction of new optimization algorithms for a wide class of optimization problems.

Acknowledgments

The authors thank Prof. Keld Helsgaun for discussions related to the Lin–Kernighan heuristic and his software and also Dr. Mirko Hessel-von Molo and Steffen Ridderbusch for discussions related to the approach. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center, Pacific (SSC Pacific) under Contract No. N6600118C4031.

Bibliography

[AAM+00] R. Agarwala, D. L. Applegate, D. Maglott, G. D. Schuler, and A. A. Schäffer. A fast and scalable radiation hybrid map construction and integration strategy. Genome Research, 10(3):350–364, 2000.
[ABCC98] D. Applegate, R. Bixby, W. Cook, and V. Chvátal. On the solution of traveling salesman problems. Rheinische Friedrich-Wilhelms-Universität Bonn, 1998.
[AW00] K. Anstreicher and H. Wolkowicz. On Lagrangian relaxation of quadratic matrix constraints. SIAM Journal on Matrix Analysis and Applications, 22(1):41–55, 2000.
[BCC+97] L. Susan Blackford, Jaeyoung Choi, Andy Cleary, Eduardo D’Azevedo, James Demmel, Inderjit Dhillon, Jack Dongarra, Sven Hammarling, Greg Henry, Antoine Petitet, et al. ScaLAPACK users’ guide. SIAM, 1997.
[BcPP98] R. E. Burkard, E. Çela, P. M. Pardalos, and L. S. Pitsoulis. The quadratic assignment problem. In P. M. Pardalos and D.-Z. Du, editors, Handbook of Combinatorial Optimization, pages 1713–1809. Springer, 1998.
[BHvM09] Armin Biere, Marijn Heule, and Hans van Maaren. Handbook of satisfiability, volume 185. IOS Press, 2009.
[Bro89] R. W. Brockett. Least squares matching problems. Linear Algebra and its Applications, 122–124:761–777, 1989.
[Bro91] R. W. Brockett. Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. Linear Algebra and its Applications, 146:79–91, 1991.
