Engineering Kernelization for Maximum Cut

Damir Ferizovic, Demian Hespe, Sebastian Lamm, Matthias Mnich, Christian Schulz, and Darren Strash

arXiv:1905.10902·cs.DS·May 28, 2019


TL;DR

This paper develops and tests new kernelization data reduction rules for the Max Cut problem, significantly improving solver performance on benchmark instances and enabling solutions to previously unsolvable large networks.

Contribution

The authors engineer a comprehensive set of efficient kernelization rules for Max Cut and demonstrate their practical effectiveness on diverse benchmark datasets.

Findings

01

Speedups of up to multiple orders of magnitude in solver runtimes.

02

Successfully solved four previously unsolvable instances within a 10-hour limit.

03

Significant improvements on synthetic, VLSI, image segmentation, social, and biological network datasets.

Abstract

Kernelization is a general theoretical framework for preprocessing instances of NP-hard problems into (generally smaller) instances with bounded size, via the repeated application of data reduction rules. For the fundamental Max Cut problem, kernelization algorithms are theoretically highly efficient for various parameterizations. However, the efficacy of these reduction rules in practice---to aid solving highly challenging benchmark instances to optimality---remains entirely unexplored. We engineer a new suite of efficient data reduction rules that subsume most of the previously published rules, and demonstrate their significant impact on benchmark data sets, including synthetic instances, and data sets from the VLSI and image segmentation application domains. Our experiments reveal that current state-of-the-art solvers can be sped up by up to multiple orders of magnitude when…

Tables

Table 1. Reduction rules from previous work subsumed by our new rules. A ✓ in row $a$ and column $b$ means that the rule from row $a$ subsumes the rule from column $b$ . If there are multiple ✓s in a column (say, rows $a$ and $b$ in column $c$ ), then rules $a$ and $b$ combined subsume rule $c$ .

Source	[18]	[11]			[10]		[17]	[29]
Rule	A	5	6	7	8	9	9	6	7	8	9	10	11	12	13
1			✓		✓
1	✓	✓		✓		✓	✓	✓	✓	✓	✓			✓	✓
1								✓		✓	✓	✓	✓
1			✓

Table 2. Impact of kernelization on the computation of a maximum cut by LocalSolver (LS) and Biq Mac (BM). Times are given in seconds. Kernelization is accounted for within the timings for $G_{\textnormal{ker}}$ . Values in brackets provide the speedup and are derived from $\frac{T(G)}{T(G_{\textnormal{ker}})}$ . Times labeled with “-” exceeded the ten-hour time limit and an “f” indicates the solver crashed.

Name	$|V|$	$e(G)$	$T_{LS}(G)$	$T_{LS}(G_{ker})$	Speedup	$T_{BM}(G)$	$T_{BM}(G_{ker})$	Speedup
ca-CSphd	1 882	0.99	24.07	0.32	[75.40]	-	0.06	[ $\infty$ ]
ego-facebook	2 888	1.00	20.09	0.09	[228.91]	-	0.01	[ $\infty$ ]
ENZYMES_g295	123	0.86	1.22	0.33	[3.70]	0.82	0.13	[6.57]
road-euroroad	1 174	0.79	-	-	-	-	-	-
bio-yeast	1 458	0.81	-	-	-	-	32 726.75	[ $\infty$ ]
rt-twitter-copen	761	0.85	-	834.71	[ $\infty$ ]	-	1.77	[ $\infty$ ]
bio-diseasome	516	0.93	-	4.91	[ $\infty$ ]	-	0.07	[ $\infty$ ]
ca-netscience	379	0.77	-	956.03	[ $\infty$ ]	-	0.67	[ $\infty$ ]
soc-firm-hi-tech	33	0.36	4.67	1.61	[2.90]	0.09	0.06	[1.41]
g000302	317	0.21	0.58	0.49	[1.17]	1.88	0.74	[2.53]
g001918	777	0.12	1.47	1.41	[1.04]	31.11	17.45	[1.78]
g000981	110	0.28	10.73	4.73	[2.27]	531.47	21.53	[24.68]
g001207	84	0.19	1.10	0.16	[6.88]	53.20	0.06	[962.38]
g000292	212	0.03	0.45	0.45	[1.01]	0.43	0.37	[1.14]
imgseg_271031	900	0.99	10.66	0.19	[55.94]	-	0.17	[ $\infty$ ]
imgseg_105019	3 548	0.93	234.01	22.68	[10.32]	f	13 748.62	[ $\infty$ ]
imgseg_35058	1 274	0.37	34.93	24.71	[1.41]	-	-	-
imgseg_374020	5 735	0.82	1 739.11	72.23	[24.08]	f	-	-
imgseg_106025	1 565	0.68	159.31	34.05	[4.68]	-	-	-

Table 3. Impact of kernelization on the computation of a maximum cut by LocalSolver (LS) and Biq Mac (BM). Times are given in seconds. Kernelization time is included in the solving times for $G_{\textnormal{ker}}$ . Values in brackets provide the speedup and are derived from $\frac{T(G)}{T(G_{\textnormal{ker}})}$ . Times labeled with “-” exceeded the ten-hour time limit. Weighted path compression by Reduction Rule 1 is not used at the end; the kernel is unweighted.

Name	$|V|$	$e(G)$	$T_{LS}(G)$	$T_{LS}(G_{ker})$	Speedup	$T_{BM}(G)$	$T_{BM}(G_{ker})$	Speedup
ca-CSphd	1 882	0.98	24.79	1.12	[22.23]	-	0.32	[ $\infty$ ]
ego-facebook	2 888	0.93	20.39	1.72	[11.83]	967.99	1.42	[682.04]
ENZYMES_g295	123	0.82	1.83	0.36	[5.09]	0.96	0.37	[2.60]
road-euroroad	1 174	0.69	-	-	-	-	-	-
bio-yeast	1 458	0.72	-	-	-	-	-	-
rt-twitter-copen	761	0.80	-	409.47	[ $\infty$ ]	-	101.14	[ $\infty$ ]
bio-diseasome	516	0.93	-	6.66	[ $\infty$ ]	-	0.35	[ $\infty$ ]
ca-netscience	379	0.67	-	4 116.61	[ $\infty$ ]	-	2.10	[ $\infty$ ]
soc-firm-hi-tech	33	0.30	4.92	2.34	[2.10]	0.29	0.31	[0.94]
g000302	317	0.10	0.71	0.50	[1.41]	1.28	0.89	[1.44]
g001918	777	0.06	1.67	1.51	[1.10]	14.90	11.69	[1.27]
g000981	110	0.22	11.32	1.97	[5.74]	0.98	0.44	[2.23]
g001207	84	0.17	1.56	0.15	[10.11]	0.47	0.37	[1.28]
g000292	212	0.01	0.69	0.51	[1.35]	0.56	0.62	[0.91]

Table 4. Evaluation of large graph instances. A three-hour time limit was used and five iterations were performed. The columns $\Delta_{\textnormal{LS}}$ and $\Delta_{\textnormal{MQ}}$ indicate the percentage by which the size of the largest computed cut is larger on the kernelized graph compared to the non-kernelized one, for LocalSolver and MqLib, respectively.

Name	$|V|$	$\mathsf{deg}_{\textnormal{avg}}$	$e(G)$	$T_{\textnormal{ker}}(G)$	$\Delta_{\textnormal{LS}}$	$\Delta_{\textnormal{MQ}}$
inf-road_central	14 081 816	1.20	0.59	362.32	inf%	2.70%
inf-power	4 941	1.33	0.62	0.04	1.64%	0.45%
web-google	1 299	2.13	0.79	0.01	0.69%	0.19%
ca-MathSciNet	332 689	2.47	0.63	8.02	1.33%	0.55%
ca-IMDB	896 305	4.22	0.42	27.55	0.97%	0.32%
web-Stanford	281 903	7.07	0.18	105.17	0.34%	0.30%
web-it-2004	509 338	14.09	0.91	22.10	0.08%	0.02%
ca-coauthors-dblp	540 486	28.20	0.25	72.39	0.05%	0.04%

Equations

$$
\begin{aligned}
\beta(G)-\beta(G^{\prime}) &= \frac{|E(G)|}{2}+\frac{|V(G)|-1}{4}-\left(\frac{|E(G^{\prime})|}{2}+\frac{|V(G^{\prime})|-1}{4}\right)\\
&= \frac{|E(G)|}{2}+\frac{|V(G)|-1}{4}-\left(\frac{|E(G)|-|N_{G}(x_{1})|-(|N_{G}(x_{2})|-1)}{2}+\frac{(|V(G)|-2)-1}{4}\right)\\
&= \frac{(|V(G)|-1)-|V(G)|+2+1}{4}-\frac{-|N_{G}(x_{1})|-(|N_{G}(x_{2})|-1)}{2}\\
&= \frac{2}{4}-\frac{-|N_{G}(x_{1})|-|N_{G}(x_{1})|+1}{2}\\
&= \frac{2}{4}-\frac{-2|N_{G}(x_{1})|+1}{2}\\
&= \frac{2}{4}-\frac{1}{2}+|N_{G}(x_{1})|\\
&= |N_{G}(x_{1})|.
\end{aligned}
$$


Full text

Engineering Kernelization for Maximum Cut

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement no. 340506.

Damir Ferizovic (Karlsruhe Institute of Technology, Karlsruhe, Germany, [email protected])

Demian Hespe (Karlsruhe Institute of Technology, Karlsruhe, Germany, [email protected])

Sebastian Lamm (Karlsruhe Institute of Technology, Karlsruhe, Germany, [email protected])

Matthias Mnich (Universität Bonn, Bonn, Germany, [email protected]; supported by DFG grant MN 59/1-1)

Christian Schulz (University of Vienna, Faculty of Computer Science, Vienna, Austria, [email protected])

Darren Strash (Hamilton College, Clinton, New York, USA, [email protected])

Abstract

Kernelization is a general theoretical framework for preprocessing instances of $\mathsf{NP}$ -hard problems into (generally smaller) instances with bounded size, via the repeated application of data reduction rules. For the fundamental Max Cut problem, kernelization algorithms are theoretically highly efficient for various parameterizations. However, the efficacy of these reduction rules in practice—to aid solving highly challenging benchmark instances to optimality—remains entirely unexplored.

We engineer a new suite of efficient data reduction rules that subsume most of the previously published rules, and demonstrate their significant impact on benchmark data sets, including synthetic instances, and data sets from the VLSI and image segmentation application domains. Our experiments reveal that current state-of-the-art solvers can be sped up by up to multiple orders of magnitude when combined with our data reduction rules. On social and biological networks in particular, kernelization enables us to solve four instances that were previously unsolved in a ten-hour time limit with state-of-the-art solvers; three of these instances are now solved in less than two seconds.

1 Introduction

The (unweighted) Max Cut problem is to partition the vertex set of a given graph $G=(V,E)$ into two sets $S\subseteq V$ and $V\setminus S$ so as to maximize the total number of edges between those two sets. Such a partition is called a maximum cut. Computing a maximum cut of a graph is a well-known problem in the area of computer science; it is one of Karp’s 21 $\mathsf{NP}$ -complete problems [26]. While signed and weighted variants are often considered throughout the literature [4, 5, 6, 9, 13, 23, 24], the simpler (unweighted) case still presents a significant challenge for researchers, and solving it quickly is of paramount importance to all variants. Max Cut variants have many applications, including social network modeling [23], statistical physics [4], portfolio risk analysis [24], VLSI design [6, 9], network design [5], and image segmentation [13].

Theoretical approaches to solving Max Cut primarily focus on producing efficient parameterized algorithms through data reduction rules, which reduce the input size in polynomial time while maintaining the ability to compute an optimal solution to the original input. If the resulting (irreducible) graph has size bounded by a function of a given parameter, then it is called a kernel. Recent works focus on parameters measuring the distance $k$ between the maximum cut size of the input graph and a lower bound $\ell$ guaranteed for all graphs. The algorithm then must decide if the input graph admits a cut of size $\ell+k$ for a given integer $k\in\mathbb{N}$ . Two such lower bounds are the Edwards-Erdős bound [15, 16] and the spanning tree bound. Crowston et al. [11] were the first to show that unweighted Max Cut is fixed-parameter tractable when parameterized by distance $k$ above the Edwards-Erdős bound. Moreover, they show the problem admits a polynomial-size kernel with $O(k^{5})$ vertices. Their result was extended to the more general Signed Max Cut problem, and the kernel size was decreased to $O(k^{3})$ vertices [10]. Finally, Etscheid and Mnich [17] improved the kernel size to an optimal $O(k)$ vertices even for signed graphs, and showed how to compute it in linear time $O(k\cdot(|V|+|E|))$ .

Many practical approaches exist to compute a maximum cut or (alternatively) a large cut. Two state-of-the-art exact solvers are Biq Mac (a solver for binary quadratic and Max-Cut problems) by Rendl et al. [31], and LocalSolver [8, 22], a powerful generic local search solver that also verifies optimality of a cut. Many heuristic (inexact) solvers are also available, including those using unconstrained binary quadratic optimization [35], local search [7], tabu search [27], and simulated annealing [3].

Curiously, data reduction, which has shown promise at preprocessing large instances of other fundamental $\mathsf{NP}$ -hard problems [2, 25, 28], is currently not used in implementations of Max Cut solvers. To the best of our knowledge, no research has been done on the efficiency of data reduction for Max Cut, in particular with the goal of achieving small kernels in practice.

Our Results. We introduce new data reduction rules for the Max Cut problem, and show that nearly all previous reduction rules for the Max Cut problem can be encompassed by only four reduction rules. Furthermore, we engineer efficient implementations of these reduction rules and show through extensive experiments that kernelization achieves a significant reduction on sparse graphs. Our experiments reveal that current state-of-the-art solvers can be sped up by up to multiple orders of magnitude when combined with our data reduction rules. We achieve speedups on all instances tested. On social and biological networks in particular, kernelization enables us to solve four instances that were previously unsolved in a ten-hour time limit with state-of-the-art solvers; three of these instances are now solved in less than two seconds with our kernelization.

2 Preliminaries

Throughout this paper, we consider finite, simple and undirected graphs $G=(V,E)$ together with additive edge weight functions $\omega:E\to\mathbb{R}_{>0}$ . For each vertex $v\in V$ let $N(v):=\{u\in V\mid\{v,u\}\in E\}$ denote its neighbors; its degree in $G$ is $\mathsf{deg}(v):=|N(v)|$ . The neighborhood of a set $X\subseteq V$ is $N(X):=\bigcup_{v\in X}N(v)\setminus X$ . For a vertex set $S\subseteq V$ , let $G[S]$ denote the subgraph of $G$ induced by $S$ . To specify the vertex and edge sets of a specific graph $G$ , we use $V(G)$ and $E(G)$ , respectively. The set of edges between the vertices of different vertex sets $S_{1},S_{2}\subseteq V$ is written as $E(S_{1},S_{2}):=E\cap(S_{1}\times S_{2})$ .

For an integer $\ell$ , a path of length $\ell$ in $G$ is a sequence $P=\langle v_{1},\ldots,v_{\ell+1}\rangle$ of distinct vertices such that $\{v_{i},v_{i+1}\}\in E(G)$ for $i=1,\ldots,\ell$ . A path with $v_{1}=v_{\ell}$ is called a cycle of $G$ . Graph $G$ is connected if there is a path from $v$ to $w$ for any pair $\{v,w\}$ of distinct vertices in $G$ ; and disconnected otherwise. A connected component of $G$ is an inclusion-maximal connected subgraph of $G$ . For vertex sets $S\subseteq V(G)$ , the set of external vertices is $C_{\textnormal{ext}}(S)=\{v\in S\mid\exists w\in V(G)\setminus S,\{v,w\}\in E(G)\}$ , which is the set of vertices in $S$ that have some neighbor in $G$ outside $S$ . In similar fashion, $C_{\textnormal{int}}(S)=S\setminus C_{\textnormal{ext}}(S)$ defines the set of internal vertices.

A clique is a complete subgraph, and a near-clique is a clique minus a single edge. A clique tree is a connected graph whose biconnected components are cliques, and a clique forest is a graph whose connected components are clique trees. In such graphs, we use the term block to refer to a biconnected component, bridge, or isolated vertex. The class of clique-cycle forests is defined as follows. A clique is a clique-cycle forest, and so is a cycle. The disjoint union of two clique-cycle forests is a clique-cycle forest. In addition, a graph formed from a clique-cycle forest by identifying two vertices, each from a different (connected) component, is also a clique-cycle forest.

The Max Cut problem is to find a vertex set $S\subseteq V$ , such that $|E(S,V\setminus S)|$ is maximized. We denote the cardinality of a maximum cut by $\beta(G)$ . At times, we may need to reason about a maximum cut given a fixed partitioning of a subset of $G$ ’s vertices. A partition of vertices $V^{\prime}\subseteq V(G)$ is given as a $2$ -coloring $\delta:V^{\prime}\rightarrow\{0,1\}$ . We let $\beta_{\delta}(G)$ denote the size of a maximum cut of $G$ , given that $V^{\prime}\subseteq V(G)$ is partitioned according to $\delta$ . The Weighted Max Cut problem is to find a vertex set $S$ of a given graph $G$ with additive weight function $\omega$ such that $\omega(E(S,V(G)\setminus S))$ is maximum. The weight of a maximum cut is then given by $\beta(G,\omega):=\omega(E(S,V\setminus S))$ . We denote instances of the Max Cut decision problem as $(G,k)_{\textnormal{MC}}$ , where $G$ is a graph and $k\in\mathbb{N}_{0}$ . If the size of a maximum cut in $G$ is $k$ , then $(G,k)_{\textnormal{MC}}$ is a “yes”-instance; otherwise, it is a “no”-instance.
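To make the definition of $\beta(G)$ concrete, the following is a minimal brute-force sketch, not the solvers used in this paper. It enumerates every 2-coloring, so it is only usable on tiny graphs; the function names `cut_size` and `max_cut_bruteforce` are illustrative assumptions.

```python
from itertools import combinations


def cut_size(edges, side):
    """Count edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def max_cut_bruteforce(vertices, edges):
    """Return beta(G) by trying all 2^|V| colorings (exponential time)."""
    vertices = list(vertices)
    best = 0
    for mask in range(1 << len(vertices)):
        # Bit i of mask decides the side of the i-th vertex.
        side = {v: (mask >> i) & 1 for i, v in enumerate(vertices)}
        best = max(best, cut_size(edges, side))
    return best


# Two small examples: a triangle and the complete graph on four vertices.
triangle = [(0, 1), (1, 2), (0, 2)]
k4 = list(combinations(range(4), 2))
beta_triangle = max_cut_bruteforce(range(3), triangle)
beta_k4 = max_cut_bruteforce(range(4), k4)
```

A triangle can never have all three edges cut, while $K_4$ is cut best by a two-versus-two split.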

We address two more variations of Max Cut in this paper. The Vertex-Weighted Max Cut problem takes as input a graph $G$ and two vertex weight functions $\omega_{0},\omega_{1}:V(G)\rightarrow\mathbb{R}$ ; the objective is to compute a bipartition $V_{0}\cup V_{1}=V(G)$ that maximizes $|E(V_{0},V_{1})|+\sum_{v\in V_{0}}\omega_{0}(v)+\sum_{v\in V_{1}}\omega_{1}(v)$ . The Signed Max Cut problem takes as input a graph $G$ together with an edge labeling $l:E(G)\rightarrow\{+,-\}$ ; the goal is to find an $S\subseteq V(G)$ which maximizes the quantity $\beta(G,l):=|E_{l}^{-}(S,V(G)\setminus S)|+|E^{+}(G[S],l)\cup E^{+}(G[V(G)\setminus S],l)|$ , where $E_{l}^{c}(S,V(G)\setminus S):=\{e\in E(S,V(G)\setminus S)\mid l(e)=c\}$ and $E^{c}(l):=\{e\in E(G)\mid l(e)=c\}$ for $c\in\{-,+\}$ . Similarly, for the neighborhood of a vertex (set), we use the notations $N^{c}_{l}(v):=\{w\in V(G)\mid\{v,w\}\in E^{c}(l)\}$ and $N^{c}_{l}(X):=\bigcup_{v\in X}N^{c}_{l}(v)\setminus X$ . We call a triangle positive if its number of “$-$”-edges is even. Any Max Cut instance can be transformed into a Signed Max Cut instance by labeling all edges with “$-$”.

Let $\Sigma^{*}$ denote the set of input instances for a decision problem. A parameterized problem $\Pi\subseteq\Sigma^{*}\times\mathbb{N}$ is fixed-parameter tractable if there is an algorithm $\mathcal{A}$ (called a fixed-parameter algorithm) that decides membership in $\Pi$ for any input pair $(x,k)\in\Sigma^{*}\times\mathbb{N}$ in time $f(k)\cdot|(x,k)|^{O(1)}$ for some computable function $f:\mathbb{N}\rightarrow\mathbb{N}$ .

A data reduction rule (often shortened to reduction rule) for a parameterized problem $\Pi$ is a function $\phi:\Sigma^{*}\times\mathbb{N}\rightarrow\Sigma^{*}\times\mathbb{N}$ that maps an instance $(x,k)$ of $\Pi$ to an equivalent instance $(x^{\prime},k^{\prime})$ of $\Pi$ such that $\phi$ is computable in time polynomial in $|x|$ and $k$ . We call two instances of $\Pi$ equivalent if either both or none belong to $\Pi$ . Observe that for two equivalent “yes”-instances $(G,\beta(G))$ and $(G^{\prime},\beta(G^{\prime}))$ , the relationship $\beta(G)=\beta(G^{\prime})+k$ holds for some $k\in\mathbb{Z}$ .

2.1 Related Work

Several studies have been made in the direction of providing fixed-parameter algorithms for the Max Cut problem [10, 11, 17, 29]. Among these, a fair number of kernelization rules have been introduced with the goal of effectively reducing Max Cut instances [10, 11, 17, 29, 30, 18]. Those reductions typically have some constraints on the subgraphs, like being clique forests or clique-cycle forests. Later, we propose a new set of reductions that does not need this property and covers most of the known reductions [11, 17, 29, 18]. There are other reduction rules that are fairly simplistic and focus on very narrow cases [30]. We now explain the Edwards-Erdős bound and the spanning tree bound.

Edwards-Erdős Bound.

For a connected graph, the Edwards-Erdős bound [15, 16] is defined as $EE(G)=\frac{|E(G)|}{2}+\frac{|V(G)|-1}{4}$ . A linear-time algorithm that computes a cut satisfying the Edwards-Erdős bound for any given graph is provided by Van Ngoc and Tuza [34]. The Max Cut Above Edwards-Erdős (Max Cut AEE) problem asks, for a graph $G$ and integer $k\in\mathbb{N}_{0}$ , whether $G$ admits a cut of size $EE(G)+k$ . All kernelization rules for Max Cut AEE require a set $S\subseteq V$ such that $G-S$ is a clique forest. Etscheid and Mnich [17] propose an algorithm that computes such a set $S$ of at most $3k$ vertices in time $O(k\cdot(|V|+|E|))$ .

Spanning Tree Bound. Another approach is based on utilizing the spanning forest of a graph [29]. For a given $k\in\mathbb{N}_{0}$ , a Max Cut of size $|V|-1+k$ is searched for. This decision problem is denoted as Max Cut AST (Max Cut Above Spanning Tree). For sparse graphs, this bound is larger than the Edwards-Erdős bound. The reductions for the problem require a set $S\subset V(G)$ such that $G-S$ is a clique-cycle forest.
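The two lower bounds can be compared numerically. The sketch below assumes a connected graph given only by its vertex count `n` and edge count `m`; the function names are illustrative and not taken from the paper's implementation.

```python
def edwards_erdos_bound(n, m):
    """EE(G) = |E|/2 + (|V| - 1)/4 for a connected graph with n vertices, m edges."""
    return m / 2 + (n - 1) / 4


def spanning_tree_bound(n):
    """|V| - 1: every edge of a spanning tree can be cut simultaneously."""
    return n - 1


# A path on 5 vertices (sparse) versus the complete graph K_5 (dense).
path_ee = edwards_erdos_bound(5, 4)
path_st = spanning_tree_bound(5)
k5_ee = edwards_erdos_bound(5, 10)
k5_st = spanning_tree_bound(5)
```

On the sparse path the spanning tree bound is the larger of the two, while on the dense $K_5$ the Edwards-Erdős bound is larger, in line with the remark above about sparse graphs.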

3 New Data Reduction Rules

We now introduce our new data reduction rules and prove their correctness. The main feature of our new rules is that they do not depend on the computation of a clique-forest to determine if they can be applied. Furthermore, our new rules subsume almost all rules from previous works [10, 11, 17, 29, 18] with the exception of Reduction Rules 10 and 11 by Crowston et al. [10]. We provide details in [19]. For an overview of how rules are subsumed, consult Table 1. Hence, our algorithm will only apply the rules proposed in this section. We provide proofs for the rules that proved most useful in our experimental evaluation.

**Reduction Rule 1. ** Let $G=(V,E)$ be a graph and let $S\subseteq V$ induce a clique in $G$ . If $|C_{\textnormal{ext}(G)}(S)|\leq\left\lceil|S|/2\right\rceil$ , then $\beta(G)=\beta(G^{\prime})+\beta(K_{|S|})$ for $G^{\prime}=(V\setminus C_{\textnormal{int}(G)}(S),E\setminus E(G[S]))$ .

Proof.

Note that any partition of the clique $G[S]$ into two vertex sets of size $\lceil|S|/2\rceil$ and $\lfloor|S|/2\rfloor$ is a maximum cut of $G[S]$ . Suppose we fix the partitions of the at most $\lceil|S|/2\rceil$ external vertices of $S$ . Then the at least $\lfloor|S|/2\rfloor$ internal vertices can be assigned to the partitions so they each contain $\lceil|S|/2\rceil$ and $\lfloor|S|/2\rfloor$ vertices. Thus, regardless of how $C_{\textnormal{ext}(G)}(S)$ is partitioned, the size of a maximum cut of $G[S]$ remains the same. ∎

We can exhaustively apply Reduction Rule 1 in $O(|V|\cdot\Delta^{2})$ time by scanning over all vertices in the graph. When scanning vertex $v$ , we check whether $N(v)\cup\{v\}$ induces a clique. This finds all cliques with at least one internal vertex. Checking whether Reduction Rule 1 is applicable is then straightforward by counting the number of vertices with degree higher than the size of the clique.
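The scan described above can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is a dictionary mapping each vertex to a set of neighbors, the name `rule1_candidates` is hypothetical, and a vertex of the clique $S$ is treated as external when its degree exceeds $|S|-1$.

```python
def rule1_candidates(adj):
    """Return (S, internal vertices) pairs for cliques S where Rule 1 applies.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    result = []
    seen = set()
    for v in adj:
        S = frozenset(adj[v] | {v})
        if S in seen:
            continue
        # N(v) + v induces a clique iff every member is adjacent to all others.
        if any(not (S - {u}) <= adj[u] for u in S):
            continue
        seen.add(S)
        # Inside a clique of size |S|, a vertex has degree |S| - 1 unless it
        # also has a neighbor outside S, which makes it external.
        external = frozenset(u for u in S if len(adj[u]) > len(S) - 1)
        if len(external) <= (len(S) + 1) // 2:  # ceil(|S| / 2)
            result.append((S, S - external))
    return result


# Triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
candidates = rule1_candidates(adj)
```

In this example both the triangle and the pendant edge qualify, each with vertex 0 as its only external vertex.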

**Reduction Rule 2. ** Let $(a^{\prime},a,b,b^{\prime})$ be an induced $3$ -path in a graph $G$ with $N(a)=\{a^{\prime},b\}$ and $N(b)=\{a,b^{\prime}\}$ . Construct $G^{\prime}$ from $G$ by adding a new edge $\{a^{\prime},b^{\prime}\}$ and removing the vertices $a$ and $b$ . Then $\beta(G)=\beta(G^{\prime})+2$ .

Proof.

Let $S=\{a^{\prime},a,b,b^{\prime}\}$ and let $\delta:V\rightarrow\{0,1\}$ be an assignment of vertices to the partitions of a cut in $G$ . We distinguish two cases:

•

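Case $\delta(a^{\prime})=\delta(b^{\prime})$ : If $\delta(a)=\delta(b)=\delta(a^{\prime})$ , then no edges of $G[S]$ are cut. Notice that this cut is not maximum since moving $b$ between partitions increases the cut size by two. If $\delta(a)\neq\delta(b)$ , then exactly two edges in $G[S]$ are cut. In $G^{\prime}$ , the edge between $a^{\prime}$ and $b^{\prime}$ is not cut, so the path contributes exactly two more to the cut in $G$ than the new edge does in $G^{\prime}$ .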

•

Case $\delta(a^{\prime})\neq\delta(b^{\prime})$ : By choosing $\delta(a)=\delta(b^{\prime})$ and $\delta(b)=\delta(a^{\prime})$ , all three edges in $G[S]$ are cut. In $G^{\prime}$ , the edge between $a^{\prime}$ and $b^{\prime}$ is cut, so $\beta(G)=\beta(G^{\prime})+2$ . ∎
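Reduction Rule 2 can be checked on a small instance. The sketch below is illustrative only: `max_cut` is an exponential brute-force helper and `apply_rule2` is a hypothetical name; neither reflects the paper's implementation.

```python
def max_cut(vertices, edges):
    """Brute-force beta(G); exponential, intended for tiny graphs only."""
    vertices = list(vertices)
    best = 0
    for mask in range(1 << len(vertices)):
        side = {v: (mask >> i) & 1 for i, v in enumerate(vertices)}
        best = max(best, sum(side[u] != side[v] for u, v in edges))
    return best


def apply_rule2(vertices, edges, a_prime, a, b, b_prime):
    """Remove the degree-2 vertices a and b and connect a' with b'."""
    new_vertices = [v for v in vertices if v not in (a, b)]
    new_edges = [e for e in edges if a not in e and b not in e]
    new_edges.append((a_prime, b_prime))
    return new_vertices, new_edges


# In the 6-cycle, (0, 1, 2, 3) is an induced 3-path with N(1)={0,2}, N(2)={1,3}.
c6_vertices = list(range(6))
c6_edges = [(i, (i + 1) % 6) for i in range(6)]
red_vertices, red_edges = apply_rule2(c6_vertices, c6_edges, 0, 1, 2, 3)
beta_before = max_cut(c6_vertices, c6_edges)
beta_after = max_cut(red_vertices, red_edges)
```

The reduced graph is a 4-cycle, and the two maximum cut sizes differ by exactly two, as the rule states.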

**Reduction Rule 3. ** Let $G$ be a graph and let $S\subseteq V(G)$ induce a near-clique in $G$ . Let $G^{\prime}$ be the graph obtained from $G$ by adding the missing edge $e^{\prime}$ so that $S$ induces a clique in $G^{\prime}$ . If $|S|$ is odd or $|C_{\textnormal{int}(G)}(S)|>2$ , then $\beta(G)=\beta(G^{\prime})$ .

Proof.

Let $(u,v)$ be the edge added to the graph and $\delta$ any 2-coloring of $C_{\textnormal{ext}(G)}(S)$ . We show that a maximum cut of $G^{\prime}$ exists such that $u$ and $v$ are in the same partition. As $G$ has one less edge than $G^{\prime}$ , this means that $\beta_{\delta}(G[S])=\beta_{\delta}(G^{\prime}[S])$ , which implies that $\beta(G)=\beta(G^{\prime})$ .

Define $V_{c}=\{x\in C_{\textnormal{ext}(G^{\prime})}(S)\mid\delta(x)=c\}$ for $c\in\{0,1\}$ . Without loss of generality, assume $|V_{0}|\leq|V_{1}|$ . Note that, given the partition for $C_{\textnormal{ext}(G^{\prime})}(S)$ , maximizing the cut of $S$ means minimizing $||V_{0}|-|V_{1}||$ . We distinguish three cases:

•

$|V_{0}|-|V_{1}|\leq -2$ : By adding $u$ and $v$ to $V_{0}$ , $||V_{0}|-|V_{1}||$ decreases. The rest of the internal vertices have to be distributed among $V_{0}$ and $V_{1}$ such that $||V_{0}|-|V_{1}||$ is minimized.

•

$|V_{0}|-|V_{1}|=-1$ : By adding $u$ and $v$ to $V_{0}$ , $||V_{0}|-|V_{1}||$ stays $1$ . If $|S|$ is odd, then $1$ is the minimal value possible and $|C_{\textnormal{int}(G)}(S)|$ is even. So the remaining internal vertices can be distributed evenly between $V_{0}$ and $V_{1}$ . If $|S|$ is even, then an odd number of internal vertices are left (and at least one by the definition of the rule) which can be distributed to balance $V_{0}$ and $V_{1}$ .

•

$|V_{0}|=|V_{1}|$ : By adding $u$ and $v$ to $V_{0}$ , $||V_{0}|-|V_{1}||$ becomes 2. If $|S|$ is odd, then an odd number of internal vertices is left to assign such that $||V_{0}|-|V_{1}||$ becomes 1. If $|S|$ is even, then there is an even number of internal vertices left which can be distributed to balance $V_{0}$ and $V_{1}$ .∎
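The role of the side condition in Reduction Rule 3 can be seen on two tiny instances. This is an illustrative brute-force check, assuming a hypothetical helper `max_cut`; it is not the paper's code.

```python
from itertools import combinations


def max_cut(vertices, edges):
    """Brute-force beta(G); exponential, intended for tiny graphs only."""
    vertices = list(vertices)
    best = 0
    for mask in range(1 << len(vertices)):
        side = {v: (mask >> i) & 1 for i, v in enumerate(vertices)}
        best = max(best, sum(side[u] != side[v] for u, v in edges))
    return best


# |S| = 5 is odd: K_5 minus the edge (0, 1) is a near-clique covered by the rule.
k5 = list(combinations(range(5), 2))
near_k5 = [e for e in k5 if e != (0, 1)]
beta_near_k5 = max_cut(range(5), near_k5)
beta_k5 = max_cut(range(5), k5)

# |S| = 2 is even with only two internal vertices: the rule does not apply,
# and adding the missing edge does change the maximum cut.
beta_two_isolated = max_cut(range(2), [])
beta_single_edge = max_cut(range(2), [(0, 1)])
```

For the near-clique on five vertices, placing the two non-adjacent vertices on the same side already yields the maximum cut of $K_5$, so adding the edge changes nothing.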

Since some cliques are irreducible by currently known rules, it may be beneficial to also apply Reduction Rule 3 ‘in reverse’. Although this ‘reverse’ reduction neither reduces the vertex set nor (as our experiments suggest) leads to applications of other rules, it can undo unfruitful additions of edges made by Reduction Rule 3 and may remove other edges from the graph.

**Reduction Rule 4. ** Let $G$ be a graph and let $S\subseteq V(G)$ induce a clique in $G$ . If $|S|$ is odd or $|C_{\textnormal{int}(G)}(S)|>2$ , an edge between two vertices of $C_{\textnormal{int}(G)}(S)$ is removable. That is, $\beta(G)=\beta(G^{\prime})$ for $G^{\prime}=(V,E\setminus\{e\})$ , $e\in E(G[C_{\textnormal{int}(G)}(S)])$ .

Proof.

Follows from the correctness of Reduction Rule 3. ∎

The following reduction rule is closely related to the upcoming generalization of Reduction Rule 8 by Crowston et al. [10]. It is able to further reduce the case where $|X|=|N(X)|$ for a clique $X$ of $G$ . In comparison, the generalization of Reduction Rule 8 from [10] is able to handle the case $|X|>|N(X)|$ . Due to the degree by which these rules are similar, they are also merged together in our implementation, as the techniques to handle both are the same.

**Reduction Rule 5. ** Let $X\subseteq V$ induce a clique in a graph $G$ , where $|X|=|N(X)|\geq 1$ and $N(X)=N(x)\setminus X$ for all $x\in X$ . Create $G^{\prime}$ from $G$ by removing an arbitrary vertex of $X$ . Then $\beta(G)=\beta(G^{\prime})+|X|$ .

Proof.

Let $S:=X\cup N_{G}(X)$ and $\delta$ be any 2-coloring of $N_{G}(X)$ . Note that $C_{\textnormal{ext}(G)}(S)\subseteq N_{G}(X)$ – the removal of $N_{G}(X)$ disconnects $X$ from the remainder of the graph.

Define $V_{c}=\{x\in N_{G}(X)\mid\delta(x)=c\}$ and $z_{c}:=|V_{c}|$ for $c\in\{0,1\}$ . We distribute the vertices in $X$ among $V_{0}$ and $V_{1}$ such that $E(V_{0},V_{1})$ is maximized. Notice that every vertex in $X$ is connected to all other vertices in $S$ . The size of any cut is therefore $p(c_{0},c_{1})=c_{0}z_{1}+c_{1}z_{0}+c_{0}c_{1}+|E(V_{0},V_{1})|$ , where $c_{0}$ and $c_{1}$ denote the number of vertices from $X$ that we want to insert into $V_{0}$ and $V_{1}$ , respectively. This can be rewritten as $p(c_{0},c_{1})=(z_{0}+c_{0})\cdot(z_{1}+c_{1})-z_{0}z_{1}+|E(V_{0},V_{1})|$ . As all other parts are constant, this reduces to maximizing $(z_{0}+c_{0})\cdot(z_{1}+c_{1})$ . As $z_{0}+c_{0}+z_{1}+c_{1}$ is constant, $(z_{0}+c_{0})\cdot(z_{1}+c_{1})$ is maximized when $|(z_{0}+c_{0})-(z_{1}+c_{1})|$ is minimized.

Because $|X|=|N_{G}(X)|$ , it is always possible to distribute the vertices of $X$ such that $z_{0}+c_{0}=z_{1}+c_{1}=|X|$ , which then maximizes $p(c_{0},c_{1})$ . Removing any vertex $x\in X$ from $G$ will change the cut by $-|X|$ : without loss of generality, let $x\in V_{0}$ . Then the number of remaining vertices, $|X|-1+|N_{G}(X)|$ , is odd and $|(z_{0}+(c_{0}-1))-(z_{1}+c_{1})|=1$ , which maximizes the cut. Then, $p(c_{0}-1,c_{1})=p(c_{0},c_{1})-|X|$ . ∎
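As a worked check of the final step, assume the balanced distribution $z_{0}+c_{0}=z_{1}+c_{1}=|X|$ from the proof and substitute it into the rewritten form of $p$ :

$$
\begin{aligned}
p(c_{0},c_{1}) &= |X|\cdot|X|-z_{0}z_{1}+|E(V_{0},V_{1})|,\\
p(c_{0}-1,c_{1}) &= (|X|-1)\cdot|X|-z_{0}z_{1}+|E(V_{0},V_{1})|,\\
p(c_{0},c_{1})-p(c_{0}-1,c_{1}) &= |X|^{2}-(|X|-1)\,|X| = |X|.
\end{aligned}
$$

For instance, with $|X|=|N_{G}(X)|=2$ the balanced sides have sizes $2$ and $2$ before the removal and $1$ and $2$ afterwards, so the product term drops from $4$ to $2$ , a difference of $|X|=2$ .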

The following algorithm identifies all candidates of Reduction Rule 1 in linear time. First, we order the adjacencies of all vertices. That is, for every vertex $v\in V$ , the vertices in $N(v)$ are sorted according to a numeric identifier assigned to every vertex. For this, we create an auxiliary array of empty lists of size $|V(G)|$ . We then traverse the vertices $w\in N(v)$ for every vertex $v\in V(G)$ and insert each pair $(v,w)$ in a list identified by indexing the auxiliary array with $w$ . We then iterate once over the array from the lowest identifier to the highest and recreate the graph with sorted adjacencies. In total, this process takes $O(|V|+|E|)$ time.
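The linear-time ordering of adjacencies described above amounts to a bucket sort over vertex identifiers. A minimal sketch, assuming vertices are numbered `0..n-1` and the graph is a list of neighbor lists (the name `sort_adjacencies` is illustrative):

```python
def sort_adjacencies(adj):
    """Return adjacency lists sorted by vertex id in O(|V| + |E|) time.

    adj: list where adj[v] is the (unsorted) list of neighbors of vertex v.
    """
    n = len(adj)
    # Auxiliary array of empty lists, indexed by the neighbor identifier w.
    buckets = [[] for _ in range(n)]
    for v in range(n):
        for w in adj[v]:
            buckets[w].append(v)  # record the pair (v, w) under index w
    # Sweep from the lowest identifier to the highest and rebuild the graph;
    # each vertex v therefore receives its neighbors w in increasing order.
    sorted_adj = [[] for _ in range(n)]
    for w in range(n):
        for v in buckets[w]:
            sorted_adj[v].append(w)
    return sorted_adj


unsorted_graph = [[2, 1], [0, 2], [1, 0, 3], [2]]
sorted_graph = sort_adjacencies(unsorted_graph)
```

Every pair $(v,w)$ is touched a constant number of times, which gives the stated $O(|V|+|E|)$ bound without any comparison-based sorting.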

For any clique $X$ of $G$ , we have to check whether $N(x_{1})\cup\{x_{1}\}=N(x_{2})\cup\{x_{2}\}$ holds for all pairs $(x_{1},x_{2})$ of vertices from $X$ (neighborhood condition). Our algorithm uses tries [20, 12] to find all candidates. A trie supports two operations, Insert(key,val) and Retrieve(key). The key parameter is an array of integers and val is a single integer. Function Retrieve returns all values inserted by Insert under the same key. Internally, a trie stores the inserted elements as a tree, where every node corresponds to one integer of the key and every prefix is stored only once. That means that two keys sharing a prefix share the same path through the trie until the position where they differ.
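A trie with the two operations described above can be sketched in a few lines. This is a minimal illustration using nested dictionaries; the class and method names are assumptions and do not come from the paper's code.

```python
class Trie:
    """Trie over integer sequences; each node stores the values inserted there."""

    def __init__(self):
        self.children = {}  # next key integer -> child node
        self.values = []    # values whose key ends exactly at this node

    def insert(self, key, val):
        """Store val under key, reusing nodes of any shared prefix."""
        node = self
        for k in key:
            node = node.children.setdefault(k, Trie())
        node.values.append(val)

    def retrieve(self, key):
        """Return all values inserted under exactly this key."""
        node = self
        for k in key:
            node = node.children.get(k)
            if node is None:
                return []
        return list(node.values)


trie = Trie()
trie.insert([0, 1, 2], 0)
trie.insert([0, 1, 2], 1)
trie.insert([0, 1, 3], 5)
same_key = trie.retrieve([0, 1, 2])
prefix_only = trie.retrieve([0, 1])
```

The keys `[0, 1, 2]` and `[0, 1, 3]` share the nodes for the prefix `[0, 1]` and only branch at the last position.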

For each vertex $v\in V$ , we use the ordered set $N(v)\cup\{v\}$ as key and $v$ as the val parameter. Notice that $N(v)$ is already sorted. The key $N(v)\cup\{v\}$ can then be computed through an insertion of $v$ into the sequence $N(v)$ in time $O(|N(v)|)$ . After Insert( $N(v)\cup\{v\}$ , $v$ ) is done for every vertex $v\in V$ , each trie leaf contains all vertices that satisfy the condition of Reduction Rule 1. That is, for every vertex pair $(x_{1},x_{2})$ of a trie leaf, the neighborhood condition is met. We then verify whether the vertex set $X$ of a leaf is a clique, in $O(|E(X)|)$ time. As each such set $X$ is considered exactly once and the graph is fully partitioned, this requires $O(|V|+|E|)$ time in total. As a last step, we check whether $|X|>\max\{|N(X)|,1\}$ by using the observation that $\forall x\in X:|N(X)|=\mathsf{deg}(x)-|X|+1$ . In Sect. 4, we describe a timestamping system that assists the above procedure in not having to repeatedly check the same structures after any number of vertices and edges are added to or removed from $G$ . However, in those later applicability checks, we do not sort the adjacencies of all vertices in linear time again. Rather, we simply use a comparison-based sort on the adjacencies.
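The candidate search can be sketched as below. As a simplification, a dictionary keyed by the sorted closed neighborhood stands in for the trie, and Python's comparison-based `sorted` replaces the linear-time presorting; the function name `clique_candidates` is hypothetical.

```python
from collections import defaultdict


def clique_candidates(adj):
    """Group vertices by closed neighborhood and keep the large cliques.

    adj: dict mapping each vertex to the set of its neighbors.
    Returns a list of (X, N(X)) pairs with |X| > max(|N(X)|, 1).
    """
    groups = defaultdict(list)
    for v in sorted(adj):
        key = tuple(sorted(adj[v] | {v}))  # ordered N(v) + v
        groups[key].append(v)
    candidates = []
    for key, X in groups.items():
        members = set(X)
        # Verify that X induces a clique, as in the procedure above.
        if any(u not in adj[w] for u in X for w in X if u != w):
            continue
        # The shared closed neighborhood is X together with N(X).
        outside = [u for u in key if u not in members]
        if len(X) > max(len(outside), 1):
            candidates.append((X, outside))
    return candidates


# K_4 on {0, 1, 2, 3}, where vertex 3 has one extra neighbor 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
found = clique_candidates(adj)
```

Here vertices 0, 1 and 2 share the closed neighborhood $\{0,1,2,3\}$ , so $X=\{0,1,2\}$ with $N(X)=\{3\}$ is the only candidate.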

The next reduction rule is our only rule whose application turns unweighted instances into instances of Weighted Max Cut. Our experiments show that this can reduce the kernel size significantly. This is noteworthy, given that existing solvers for Max Cut usually support weighted instances.

**Reduction Rule 6. ** Let $G$ be a graph, $w:E\rightarrow\mathbb{Z}$ a weight function, and $(a,b,a^{\prime})$ be an induced 2-path with $N(b)=\{a,a^{\prime}\}$ . Let $e_{1}$ be the edge between vertex $a$ and $b$ ; let $e_{2}$ be the one between $b$ and $a^{\prime}$ . Construct $G^{\prime}$ from $G$ by deleting vertex $b$ and adding a new edge $\{a,a^{\prime}\}$ with $w^{\prime}(\{a,a^{\prime}\})=\max\{w(e_{1}),w(e_{2})\}-\max\{0,w(e_{1})+w(e_{2})\}$ . Then $\beta(G,w)=\beta(G^{\prime},w^{\prime})+\max\{0,w(e_{1})+w(e_{2})\}$ .

Proof.

Let $\delta$ be a maximum cut of $G$ and consider the following two cases:

• $\delta(a)=\delta(a^{\prime})$: If $w(e_{1})+w(e_{2})>0$, then $\delta(b)\neq\delta(a)$. Otherwise, $\delta(b)=\delta(a)$. In total, the path contributes $\max\{0,w(e_{1})+w(e_{2})\}$ to the cut. In $G^{\prime}$, the edge between $a$ and $a^{\prime}$ is not cut, so $\beta(G,w)=\beta(G^{\prime},w^{\prime})+\max\{0,w(e_{1})+w(e_{2})\}$.

• $\delta(a)\neq\delta(a^{\prime})$: If $w(e_{1})>w(e_{2})$, then $\delta(b)=\delta(a^{\prime})$. Otherwise, $\delta(b)=\delta(a)$. In total, the path contributes $\max\{w(e_{1}),w(e_{2})\}$ to the cut. In $G^{\prime}$, the edge between $a$ and $a^{\prime}$ is cut and contributes $w^{\prime}(\{a,a^{\prime}\})=\max\{w(e_{1}),w(e_{2})\}-\max\{0,w(e_{1})+w(e_{2})\}$ to the cut, so again $\beta(G,w)=\beta(G^{\prime},w^{\prime})+\max\{0,w(e_{1})+w(e_{2})\}$. ∎
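The weighted path compression of Reduction Rule 6 can be checked on a small instance. The following is a sketch under stated assumptions, not the paper's code: edges are stored once as `(u, v)` tuples in a dictionary, the caller guarantees that $(a,b,a^{\prime})$ is an induced 2-path with $N(b)=\{a,a^{\prime}\}$, and the helper names `max_cut` and `compress_path` are hypothetical. The brute-force `max_cut` is only meant for tiny graphs.

```python
from itertools import product


def max_cut(vertices, edges):
    """Brute-force maximum cut value; edges maps (u, v) -> integer weight."""
    vertices = list(vertices)
    best = 0  # the empty cut always has value 0
    for bits in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, bits))
        value = sum(w for (u, v), w in edges.items() if side[u] != side[v])
        best = max(best, value)
    return best


def compress_path(edges, a, b, a2):
    """Apply the rule to the induced 2-path (a, b, a2).

    Assumes the keys (a, b) and (b, a2) exist, b has no other neighbours,
    and a and a2 are not adjacent. Returns (reduced edges, offset).
    """
    w1 = edges[(a, b)]
    w2 = edges[(b, a2)]
    offset = max(0, w1 + w2)
    # Drop both edges at b, then add the new edge {a, a2} with weight w'.
    reduced = {e: w for e, w in edges.items() if b not in e}
    reduced[(a, a2)] = max(w1, w2) - offset
    return reduced, offset


# 4-cycle 0-1-2-3-0; vertex 1 is the middle of the induced path (0, 1, 2).
edges = {(0, 1): 3, (1, 2): -1, (2, 3): 2, (3, 0): 1}
reduced, offset = compress_path(edges, 0, 1, 2)
print(reduced[(0, 2)], offset)                 # 1 2
print(max_cut(range(4), edges))                # 5
print(max_cut([0, 2, 3], reduced) + offset)    # 5
```

Here $w(e_{1})=3$ and $w(e_{2})=-1$, so the new edge gets weight $3-2=1$ and the offset is $2$, matching the identity stated in the rule.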

Our next two rules (Reduction Rules 1 and 1) generalize Reduction Rule 8 by Crowston et al. [10], which we restate for completeness.

Reduction Rule 8. ([10], Reduction Rule 8)

Let $(G,l)$ be a signed graph, $S\subseteq V$ a set of vertices such that $G[V\setminus S]$ is a clique forest, and $C$ a block in $G[V\setminus S]$. Suppose there is an $X\subseteq C_{\textnormal{int}(G[V\setminus S])}(C)$ such that $|X|>\frac{|C|+|N(X)\cap S|}{2}\geq 1$, $N_{l}^{+}(x)\cap S=N_{l}^{+}(X)\cap S$ and $N_{l}^{-}(x)\cap S=N_{l}^{-}(X)\cap S$ for all $x\in X$. Construct the graph $G^{\prime}$ from $G$ by removing any two vertices $x_{1},x_{2}\in X$. Then $\beta(G^{\prime})-EE(G^{\prime})=\beta(G)-EE(G)$.

Note that, for unsigned graphs, $N_{l}^{+}(x)=\emptyset$ and $N_{l}^{-}(x)=N(x)$ for every vertex $x$ .

Here, different choices of $S$ lead to different applications of this rule. Our generalizations do not require such a set anymore and can find all possible applications for any choice of $S$ .

Reduction Rule 1w=1.

Let $X$ be the vertex set of a clique in $G$ with $|X|>\max\{|N(X)|,1\}$ and $N(X)=N(x)\setminus X$ for all $x\in X$ . Construct the graph $G^{\prime}$ by deleting two arbitrary vertices $x_{1},x_{2}\in X$ from $G$ . Then $\beta(G)=\beta(G^{\prime})+|N(x_{1})|$ .

We show the correctness of Reduction Rule 1 by reducing it to Reduction Rule 8 by Crowston et al. [10].

Proof.

Let $S=V\setminus X$ and $C=X$. Since $X$ is a clique, $G[V\setminus S]$ is a clique forest. From $|X|>\max\{|N(X)|,1\}$ it follows that $|X|>\frac{|X|+|N(X)|}{2}=\frac{|C|+|N(X)\cap S|}{2}\geq 1$. Also, $N(x)\setminus X=N(x)\cap S$ and $N(X)\cap S=N(X)$, so all conditions for Reduction Rule 8 are satisfied.

It remains to show that $\beta(G)=\beta(G^{\prime})+|N(x_{1})|$. Note that $|E(G^{\prime})|=|E(G)|-|N_{G}(x_{1})|-(|N_{G}(x_{2})|-1)$ and $|V(G^{\prime})|=|V(G)|-2$. By Reduction Rule 8, we know that $\beta(G^{\prime})-EE(G^{\prime})=\beta(G)-EE(G)$, therefore we have that

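$$\begin{aligned}\beta(G)&=\beta(G^{\prime})+EE(G)-EE(G^{\prime})\\&=\beta(G^{\prime})+\frac{|E(G)|-|E(G^{\prime})|}{2}+\frac{|V(G)|-|V(G^{\prime})|}{4}\\&=\beta(G^{\prime})+\frac{|N_{G}(x_{1})|+|N_{G}(x_{2})|-1}{2}+\frac{1}{2}\\&\overset{(1)}{=}\beta(G^{\prime})+|N_{G}(x_{1})|\end{aligned}$$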

where (1) follows from $|N_{G}(x_{1})|=|N_{G}(x_{2})|$. ∎
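As a sanity check of the identity $\beta(G)=\beta(G^{\prime})+|N(x_{1})|$, the rule can be applied to a tiny unweighted graph and compared against a brute-force maximum cut. This is an illustrative sketch only: the graph, the helper names `max_cut` and `remove_two`, and the edge-list representation are assumptions, and the caller is expected to have verified the rule's preconditions.

```python
from itertools import product


def max_cut(vertices, edges):
    """Brute-force maximum cut size of an unweighted graph (edge list)."""
    vertices = list(vertices)
    best = 0
    for bits in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, bits))
        best = max(best, sum(1 for u, v in edges if side[u] != side[v]))
    return best


def remove_two(vertices, edges, x1, x2):
    """Delete x1 and x2 with all incident edges; also return deg(x1)."""
    degree_x1 = sum(1 for e in edges if x1 in e)
    kept_vertices = [v for v in vertices if v not in (x1, x2)]
    kept_edges = [e for e in edges if x1 not in e and x2 not in e]
    return kept_vertices, kept_edges, degree_x1


# X = {0, 1, 2} is a clique, N(X) = {3}, and vertex 4 hangs off vertex 3.
V = [0, 1, 2, 3, 4]
E = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3), (3, 4)]
V2, E2, offset = remove_two(V, E, 0, 1)
print(max_cut(V, E))             # 5
print(max_cut(V2, E2) + offset)  # 5
```

The reduced graph is the path 2-3-4 with maximum cut $2$, and $|N(x_{1})|=3$, which together give the maximum cut $5$ of the original graph.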

**Reduction Rule 7.** Let $X\subseteq V$ induce a clique in a signed graph $(G,l)$ such that $\forall e\in E(X):l(e)=\text{``$-$''}$ and $|X|>\max\{|N(X)|,1\}$, $N_{l}^{+}(X)=N_{l}^{+}(x)\setminus X$, and $N_{l}^{-}(X)=N_{l}^{-}(x)\setminus X$ for all $x\in X$. Construct $G^{\prime}$ by deleting two arbitrary vertices $x_{1},x_{2}\in X$ from $G$. Then $\beta(G)=\beta(G^{\prime})+|N(x_{1})|$.

Proof (Sketch).

The proof for this rule is almost identical to the proof of Reduction Rule 1. ∎

Using almost the same approach as for Reduction Rule 1, we can find all candidates of this reduction rule in linear time.

In order to also reduce weighted instances to some degree, we use a simple weighted scaling of two reduction rules. That is, we extend their applicability from an unweighted subgraph to a subgraph where all edges have the same weight $c\in\mathbb{R}$ . We do this for Reduction Rules 1 and 1.

Reduction Rule 1w=c.

Let $(G,\omega)$ be a weighted graph and let $S\subseteq V(G)$ induce a clique with $\omega(e)=c$ for every edge $e\in E(G[S])$ for some constant $c\in\mathbb{R}$ . Let $G^{\prime}=(V(G)\setminus C_{\textnormal{int}(G)}(S),E(G)\setminus E(G[S]))$ with $\omega^{\prime}(e)=\omega(e)$ for every $e\in E(G^{\prime})$ . If $|C_{\textnormal{ext}(G)}(S)|\leq\left\lceil\frac{|S|}{2}\right\rceil$ , then $\beta(G,\omega)=\beta(G^{\prime},\omega^{\prime})+c\cdot\beta(K_{|S|})$ .
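The following sketch applies the rule above to a small weighted graph and compares both sides of the identity by brute force. It assumes that $C_{\textnormal{ext}(G)}(S)$ denotes the vertices of $S$ with at least one neighbor outside $S$ and that $C_{\textnormal{int}(G)}(S)$ denotes the remaining vertices of $S$; it also uses the standard value $\beta(K_{n})=\lfloor n/2\rfloor\cdot\lceil n/2\rceil$. Function names, the edge representation, and the example are hypothetical, and only a positive $c$ is exercised here.

```python
from itertools import product
from math import ceil


def max_cut(vertices, edges):
    """Brute-force maximum cut value; edges maps (u, v) -> weight."""
    vertices = list(vertices)
    best = 0  # the empty cut always has value 0
    for bits in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, bits))
        value = sum(w for (u, v), w in edges.items() if side[u] != side[v])
        best = max(best, value)
    return best


def apply_weighted_clique_rule(vertices, edges, S):
    """Return (vertices', edges', offset), or None if the rule does not apply."""
    S = set(S)
    n = len(S)
    inside = {e: w for e, w in edges.items() if e[0] in S and e[1] in S}
    # S must induce a clique whose edges all carry the same weight c.
    if len(inside) != n * (n - 1) // 2 or len(set(inside.values())) != 1:
        return None
    c = next(iter(inside.values()))
    # External vertices of S: those with at least one neighbour outside S.
    external = set()
    for u, v in edges:
        if (u in S) != (v in S):
            external.add(u if u in S else v)
    if len(external) > ceil(n / 2):
        return None
    internal = S - external
    new_vertices = [v for v in vertices if v not in internal]
    new_edges = {e: w for e, w in edges.items() if e not in inside}
    offset = c * (n // 2) * ((n + 1) // 2)  # c * beta(K_n)
    return new_vertices, new_edges, offset


# Triangle {0, 1, 2} with uniform weight 2; vertex 2 has one outside neighbour.
V = [0, 1, 2, 3]
E = {(0, 1): 2, (0, 2): 2, (1, 2): 2, (2, 3): 1}
V2, E2, offset = apply_weighted_clique_rule(V, E, {0, 1, 2})
print(max_cut(V, E))             # 5
print(max_cut(V2, E2) + offset)  # 5
```

In the example, $|C_{\textnormal{ext}}(S)|=1\leq\lceil 3/2\rceil$, the reduced graph is the single edge $\{2,3\}$ of weight $1$, and $c\cdot\beta(K_{3})=2\cdot 2=4$.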

Reduction Rule 1w=c.

Let $(G,\omega)$ be a weighted graph and let $S\subseteq V(G)$ induce a near-clique in $G$ . Furthermore, let $\omega(e)=c$ for every edge $e\in E(G[S])$ for some constant $c\in\mathbb{R}$ . Let $G^{\prime}$ be the graph obtained from $G$ by adding the edge $e^{\prime}$ so that $S$ induces a clique in $G^{\prime}$ . Set $\omega^{\prime}(e^{\prime})=c$ , and $\omega^{\prime}(e)=\omega(e)$ for $e\in E(G)$ . If $|S|$ is odd or $|C_{\textnormal{int}(G)}(S)|>2$ , then $\beta(G,\omega)=\beta(G^{\prime},\omega^{\prime})$ .

4 Implementation

4.1 Kernelization Framework

We now discuss our overall kernelization framework in detail. Our algorithm begins by generating an unweighted instance by replacing every weighted edge by an unweighted subgraph with a specific structure. Afterwards, we apply our full set of unweighted reduction rules: 1, 1 (together with 1), 1, and 1. As mentioned earlier, Reduction Rule 1 is the unweighted version of 1. We then create a signed instance of the graph by exhaustively executing weighted path compression using Reduction Rule 1 with the restriction that the resulting weights are $-1$ or $+1$. We then exhaustively apply Reduction Rule 1. Once the signed reductions are done, we apply Reduction Rule 1 to fully compress all paths into weighted edges. This is followed by Reduction Rules 1 and 1. We then transform the instance into an unweighted one and apply Reduction Rule 1 in order to avoid cyclic interactions between itself and Reduction Rule 1. Finally, if a weighted solver is to be used on the kernel, we exhaustively perform Reduction Rule 1 to produce a weighted kernel. Note that different orders in which the reduction rules are applied can lead to different results.

4.2 Timestamping

Next we describe how to avoid unnecessary checks for the applicability of reduction rules. For this purpose, let $T:V(G)\rightarrow\mathbb{N}_{0}$ denote the time of the most recent change in the neighborhood of a vertex, and let the variable $t\in\mathbb{N}$ describe the current time. Initially, $T(v)=0$ for all $v\in V$ and $t=1$. Every time a reduction rule performs a change on $N(v)$, set $T(v)=t$ and increment $t$. For each individual Reduction Rule $r$, we also maintain a timestamp $t_{r}\in\mathbb{N}_{0}$ (initialized with [math]), indicating the upper bound up to which all vertices have already been processed. Hence, all vertices $v\in V$ with $T(v)\leq t_{r}$ do not need to be checked again by Reduction Rule $r$. Note that timestamping only works for “local” reduction rules, that is, rules whose applicability can be determined by investigating the neighborhood of a vertex. Therefore, we only use this technique for Reduction Rules 1 and 1.
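A minimal sketch of such a timestamping scheme is given below. It is an illustration rather than the authors' implementation: the class and method names are hypothetical, and the per-rule timestamp starts at $-1$ purely as an assumption of this sketch, so that the first pass of every rule visits all vertices. The bound stored after a pass is taken before the pass starts, so that vertices changed by the rule itself are picked up again later.

```python
class Timestamps:
    """Tracks which vertices a local reduction rule still has to inspect."""

    def __init__(self, vertices, rules):
        self.T = {v: 0 for v in vertices}     # T(v): time of last change of N(v)
        self.t = 1                            # current time
        # Assumption of this sketch: -1 means "no vertex processed yet".
        self.t_rule = {r: -1 for r in rules}

    def touch(self, v):
        """Record that the neighbourhood of v has just changed."""
        self.T[v] = self.t
        self.t += 1

    def pending(self, rule):
        """Vertices with T(v) > t_r, plus the bound to store after the pass."""
        bound = self.t - 1  # largest timestamp handed out so far
        todo = [v for v, stamp in self.T.items() if stamp > self.t_rule[rule]]
        return todo, bound

    def done(self, rule, bound):
        """All vertices with T(v) <= bound have now been processed by rule."""
        self.t_rule[rule] = bound


ts = Timestamps([0, 1, 2], ["A"])
todo, bound = ts.pending("A")
print(todo)                 # [0, 1, 2]
ts.done("A", bound)
ts.touch(1)                 # some rule changed N(1)
print(ts.pending("A")[0])   # [1]
```

Scanning all vertices in `pending` keeps the sketch short; a practical implementation would more likely keep a queue of recently touched vertices.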

5 Experimental Evaluation

5.1 Methodology and Setup

All of our experiments were run on a machine with four octa-core Intel Xeon E5-4640 processors running at 2.40 GHz and $512$ GB of main memory. The machine runs Ubuntu 18.04. All algorithms were implemented in C++ and compiled using gcc version 7.3.0 with optimization flag -O3. We use the following state-of-the-art Weighted Max Cut solvers for comparisons: the exact solvers LocalSolver [8] (which heuristically finds a large cut and can then verify whether it is maximum) and Biq Mac [31], as well as the heuristic solver MqLib [14]. MqLib is unable to determine on its own when it reaches a maximum cut and always exhausts the given time limit. We also evaluated an implementation of the reduction rules used by Etscheid and Mnich [17]; however, preliminary experiments indicated that it performs worse than current state-of-the-art solvers. In the following, for a graph $G=(V,E)$, $G_{\textnormal{ker}}$ denotes the graph after all reductions have been applied exhaustively. To measure the effect of kernelization, we use the kernelization efficiency $e(G)=1-|V(G_{\textnormal{ker}})|/|V(G)|$. Note that $e(G)$ is $1$ when all vertices are removed after applying all reduction rules, and $0$ if no vertices are removed.
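For concreteness, the efficiency metric can be computed as in the short sketch below. The function name and the vertex counts are made up for illustration and are not measurements from the experiments.

```python
def kernelization_efficiency(n_original, n_kernel):
    """e(G) = 1 - |V(G_ker)| / |V(G)| for a graph with at least one vertex."""
    if n_original <= 0:
        raise ValueError("the original graph must have at least one vertex")
    return 1 - n_kernel / n_original


print(kernelization_efficiency(100, 25))   # 0.75
print(kernelization_efficiency(100, 100))  # 0.0
print(kernelization_efficiency(100, 0))    # 1.0
```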

For our experiments we use four different datasets. First, we use random instances from four different graph models that were generated using the KaGen graph generator [21, 33]. In particular, we used Erdős-Rényi graphs (GNM), random geometric graphs (RGG2D), random hyperbolic graphs (RHG) and Barabási-Albert graphs (BA). The main purpose of these instances is to study the effectiveness of individual reduction rules for a variety of graph densities and degree distributions. To analyze the practical impact of our algorithm on current state-of-the-art solvers, we use a selection of sparse real-world instances by Rossi and Ahmed [32], as well as instances from VLSI design (g00*) and image segmentation (imgseg-*) by Dunning et al. [14]. Note that the original instances by Dunning et al. [14] use floating-point weights that we scaled to integer weights. Finally, we evaluate denser instances taken from the rudy category of the Biq Mac Library [1]. We further subdivide these instances into medium- and large-sized instances.

5.2 Performance of Individual Rules

To analyze the impact of each individual reduction rule, we measure the size of the kernel our algorithm produces before and after its removal. Fig. 1 shows our results on RGG2D and GNM graphs with $2048$ vertices and varying density. We have settled on those two types of graphs as they represent different ends of the spectrum of kernelization efficiency. In particular, kernelization performs well on instances that are sparse and have a non-uniform degree distribution. Such properties are given by the random geometric graph model used for generating the RGG2D instances. In contrast, kernelization performs poorly on the uniform random graphs that make up the GNM instances. We excluded Reduction Rule 1 from these experiments as it only removes edges and thus leads to no difference in the kernelization efficiency.

Looking at Fig. 1, we can see that Reduction Rule 1 gives the most significant reduction in size. Its absence always diminishes the result more than any other rule. In particular, we see a difference in efficiency of up to $0.47$ (RGG2D) and $0.41$ (GNM) when removing Reduction Rule 1. The second most impactful rule for the RGG2D instances is Reduction Rule 1 with a difference of only up to $0.04$. For the GNM instances Reduction Rule 1 is second with a difference of up to $0.17$. However, note that Reduction Rules 1 and 1 lead to no difference in efficiency on these instances. Thus, we can conclude that depending on the graph type, different reduction rules have varying importance. Furthermore, our simple Reduction Rule 1 seems to have the most significant impact on the overall kernelization efficiency. Note that this is in line with the theoretical results from Table 1, which shows that Reduction Rule 1 covers most of the previously published reduction rules and Reduction Rule 1 still covers many, but fewer, rules from previous work.

5.3 Exactly Computing a Maximum Cut

To examine the improvements kernelization brings for medium-sized instances, we compare the time required to obtain a maximum cut for both the kernelized and the original instance. We performed these experiments using both LocalSolver and Biq Mac. Note that we did not use MqLib as it is not able to verify the optimality of the cut it computes. The results of our experiments for our set of real-world instances are given in Table 2 (with weighted path compression) and Table 3 (without weighted path compression). Since the image segmentation instances are already weighted, they are omitted from Table 3. Note that we do not include results for the rudy instances from the Biq Mac library. These instances feature a uniform edge distribution and an overall average degree of at least $3.5$. Our preliminary experiments indicated that kernelization provides little to no reduction in size for these instances. Therefore, we omit them from further evaluation and focus on sparser graphs.

First, we notice that kernelization is able to provide moderate to significant speedups for all instances that we have tested. In particular, we achieve a speedup between $1.04$ and $228.91$ for instances that were previously solvable by LocalSolver. Likewise, for the instances that Biq Mac is able to process, we achieve a speedup of up to three orders of magnitude. Furthermore, kernelization enables these solvers to compute a maximum cut in less than $17$ minutes for a majority of the instances that were previously infeasible.

To examine the impact of allowing a weighted kernel, we now compare the performance of our algorithm using weighted path compression (Table 2) with the unweighted version (Table 3). We can see that by including weighted path compression we can achieve significantly better speedups, especially for the sparse real-world instances by Rossi and Ahmed [32]. For example, on ego-facebook we achieve a speedup of $228.91$ with compression and $11.83$ without.

Finally, it is also noteworthy that we get significant improvements for the weighted instances from VLSI design and image segmentation. By examining the performance of each individual reduction rule, we can see that this is solely due to Reduction Rule 1. These findings could improve the work by de Sousa et al. [13], which also affects the work by Dunning et al. [14]. In conclusion, our novel reduction rules give us a simple but powerful tool for speeding up existing state-of-the-art solvers for computing maximum cuts. Moreover, as mentioned previously, even our simple weighted path compression by itself is able to have a significant impact.

5.4 Analysis on Large Instances

We now examine the performance of our kernelization framework and its impact on existing solvers for large graph instances with up to millions of vertices. For this purpose, we compared the cut size over time achieved by LocalSolver and MqLib with and without our kernelization. Note that we did not use Biq Mac as it was not able to handle instances with more than 3 000 vertices. Our results using a three-hour time limit for each solver are given in Table 4. Furthermore, we present convergence plots in Fig. 2.

First, we note that the time to compute the actual kernel is relatively small. In particular, we are able to compute a kernel for a graph with $14$ million vertices and edges in just over six minutes. Furthermore, we achieve an efficiency between $0.18$ and $0.91$ across all tested instances. When looking at the convergence plots (Fig. 2) we can observe that the additional preprocessing time of kernelization is quickly compensated by a significantly steeper increase in cut size compared to the unkernelized version. Furthermore, for instances where a kernel can be computed very quickly, such as web-google, we find a better solution almost instantaneously. In general, the results achieved by kernelization followed by the local search heuristic are always better than just using the local search heuristic alone. However, the final improvement on the size of the largest cut found by LocalSolver and MqLib is generally small for the given time limit of three hours.

6 Conclusions

We engineered new efficient data reduction rules for Max Cut and showed that these rules subsume most existing rules. Our extensive experiments show that kernelization has a significant impact in practice. In particular, our experiments reveal that current state-of-the-art solvers can be sped up by up to multiple orders of magnitude when combined with our data reduction rules.

Developing new reduction rules is an important direction for future research. Of particular interest are reduction rules for Weighted Max Cut, where reduction rules yield a weighted kernel.

Bibliography

[1] Biq Mac Library. http://biqmac.aau.at/biqmaclib.html, 2018. [Online; accessed 2-September-2018].
[2] Faisal N. Abu-Khzam, Michael R. Fellows, Michael A. Langston, and W. Henry Suters. Crown structures for vertex cover kernelization. Theory Comput. Syst., 41(3):411–430, 2007. doi:10.1007/s00224-007-1328-0.
[3] Emely Arráiz and Oswaldo Olivo. Competitive simulated annealing and tabu search algorithms for the Max-Cut problem. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO ’09, pages 1797–1798, New York, NY, USA, 2009. ACM. doi:10.1145/1569901.1570167.
[4] Francisco Barahona. On the computational complexity of Ising spin glass models. J. Phys. A: Mathematical and General, 15(10):3241, 1982. doi:10.1088/0305-4470/15/10/028.
[5] Francisco Barahona. Network design using cut inequalities. SIAM J. Optim., 6(3):823–837, 1996. doi:10.1137/S1052623494279134.
[6] Francisco Barahona, Martin Grötschel, Michael Jünger, and Gerhard Reinelt. An application of combinatorial optimization to statistical physics and circuit layout design. Oper. Res., 36(3):493–513, 1988. doi:10.1287/opre.36.3.493.
[7] Una Benlic and Jin-Kao Hao. Breakout local search for the Max-Cut problem. Engineering Applications of Artificial Intelligence, 26(3):1162–1173, 2013. doi:10.1016/j.engappai.2012.09.001.
[8] Thierry Benoist, Bertrand Estellon, Frédéric Gardi, Romain Megel, and Karim Nouioua. Localsolver 1.x: a black-box local-search solver for 0-1 programming. 4OR, 9(3):299, 2011. [used in this work: Localsolver 8.0]. URL: https://www.localsolver.com/, doi:10.1007/s10288-011-0165-9.