Distance-preserving graph contractions
Aaron Bernstein, Karl Däubel, Yann Disser, Max Klimm, Torsten Mütze, Frieder Smolny

TL;DR
This paper introduces a new framework for graph contraction that preserves pairwise distances within a specified tolerance, providing algorithms for trees and complexity results for other graph classes.
Contribution
It formalizes the graph contraction problem with distance preservation, analyzes its complexity, and offers algorithms for specific graph classes and approximate solutions.
Findings
Polynomial-time algorithms for trees
Hardness results for certain graph classes
Efficient algorithms for approximate contractions
Overview of results by problem variant and graph class (unit lg. = unit edge lengths, inapx. = inapproximability):

| Problem | Path | Tree | Cycle | General |
|---|---|---|---|---|
| Contraction | | | | |
| additive, unit lg. | | linear time [Th. 4] | | inapx. (a) [Th. 10] |
| affine, unit lg. | linear time [Th. 2] | | linear time [Th. 3] | |
| additive | | | NP-hard [Th. 7] | inapx. [Th. 9] |
| affine | | dynamic program [Th. 5] | | |
| Weak Contraction | | | | |
| additive | | | NP-hard (b) [Th. 7] | |
| affine | | dynamic program [Th. 6] | | inapx. (c) [Th. 12] |

(a) Even for bipartite graphs.
(b) Also NP-hard for planar graphs with arbitrarily large girth, purely multiplicative error, and unit lengths [Th. 11].
(c) Even for purely multiplicative error.
An extended abstract of this work has appeared in the Proceedings of the 9th Innovations in Theoretical Computer Science Conference (ITCS) 2018 [BDD+18].
Aaron Bernstein¹, Karl Däubel¹, Yann Disser², Max Klimm³, Torsten Mütze¹ and Frieder Smolny¹
¹Institut für Mathematik, TU Berlin
²Department of Mathematics, Graduate School CE, TU Darmstadt
³Wirtschaftswissenschaftliche Fakultät, HU Berlin
E-mail (Däubel, Mütze, Smolny): {daeubel,muetze,smolny}@math.tu-berlin.de
Yann Disser is supported by the "Excellence Initiative" of the German Federal and State Governments and the Graduate School CE at TU Darmstadt.
Abstract.
Compression and sparsification algorithms are frequently applied in a preprocessing step before analyzing or optimizing large networks/graphs. In this paper we propose and study a new framework for contracting edges of a graph (merging vertices into super-vertices) with the goal of preserving pairwise distances as accurately as possible. Formally, given an edge-weighted graph, the contraction should guarantee that for any two vertices at distance $d$, the corresponding super-vertices remain at distance at least $\varphi(d)$ in the contracted graph, where $\varphi$ is a tolerance function bounding the permitted distance distortion. We present a comprehensive picture of the algorithmic complexity of the contraction problem for affine tolerance functions $\varphi(x)=x/\alpha-\beta$, where $\alpha\ge 1$ and $\beta\ge 0$ are arbitrary real-valued parameters. Specifically, we present polynomial-time algorithms for trees as well as hardness and inapproximability results for different graph classes, precisely separating easy and hard cases. Further, we analyze the asymptotic behavior of contractions, and find efficient algorithms to compute (non-optimal) contractions despite our hardness results.
1. Introduction
When dealing with large networks, it is often beneficial to compress or sparsify the data to manageable size before analyzing or optimizing the network directly. To be useful, a meaningful compression should represent salient features of the original network with good approximation, while being much smaller in size. In this paper, we focus on a compression of undirected edge-weighted graphs that approximately maintains all distances between vertices in the graph.
In this context, an extensively studied concept is that of spanners (e.g. [PS89, ADD+93, BKMP05, AB16]). Given an undirected graph $G=(V,E)$ and real numbers $\alpha\ge 1$ and $\beta\ge 0$, a subgraph $H=(V,E')$, $E'\subseteq E$, is an $(\alpha,\beta)$-spanner of $G$ if $d_H(u,v)\le\alpha\cdot d_G(u,v)+\beta$ holds for all $u,v\in V$. While the number of edges in a spanner may be much smaller than that of the original graph, the number of vertices is the same for both, leaving further potential for compression untapped. For illustration, consider the road network of Europe with about 50 million vertices [BMSW13], any spanner of which must again have about 50 million vertices and edges. However, to approximately represent distances in Europe's road network one may also merge nearby vertices into super-vertices, thus achieving a much better compression of the network. This is akin to the visual process of zooming out of a graphical representation of the map, where neighboring vertices fade into each other and edges between merged vertices vanish. At a large enough zoom level, the entire network merges into a single vertex.
In this paper we propose and study a new framework for contracting networks that formalizes this intuitive idea and makes it applicable to general graphs. Specifically, we study a contraction problem on graphs $G=(V,E)$ where a subset of edges $C\subseteq E$ is contracted. We denote by $G/C$ the resulting simple graph obtained from $G$ by contracting the edges in $C$ and by deleting resulting loops and multiple edges, keeping only the minimum length edge between any two vertices. For any two vertices in $G$, we compare their distance in $G$ with the distance of the corresponding super-vertices in $G/C$.
It is interesting to contrast this concept with graph spanners. When constructing a spanner, the length of the removed edges is implicitly set to $\infty$, resulting in an overall increase of distances. On the other hand, a contraction implicitly sets the length of the contracted edges to zero, leading to an overall decrease of distances. For both problems, the ultimate goal is to reduce the complexity of the network while maintaining an approximation guarantee on the distances.
The following example shows that contractions may be better suited than spanners to achieve this goal. In a subgraph with small radius, a spanner can at best result in a spanning tree of the same order, while a contraction can reduce the whole subgraph to a single vertex, while entailing a multiplicative distance distortion of similar magnitude. In addition, the contraction may also merge many edges entering the contracted subgraph. Clearly, the objective here is to maximize the total number of contracted and deleted edges, as this minimizes the memory required to represent the resulting network in a computer (using e.g. adjacency lists).
Given the results presented in this paper and the known results for spanners (discussed in detail below), we further believe that the combination of spanners and contractions is very powerful, promising and flexible. As the former only increases and the latter only decreases the distances, the respective distortion guarantees provably also hold for the overall distortion. In fact, both effects may even compensate each other. This is true regardless of the order in which both compression operations are applied, even when they are applied repeatedly.
In order to measure the distance distortion of the contraction, we assume a non-decreasing tolerance function $\varphi$, similar to the corresponding function for spanners, see e.g. [BKMP05]. We are interested in computing contractions that preserve distances in the following sense: For any two vertices $u$ and $v$ at distance $d$ in $G$, the distance of the corresponding vertices in the contracted graph $G/C$ must be at least $\varphi(d)$. If this condition is satisfied, we call $C$ a $\varphi$-distance preserving contraction, or $\varphi$-contraction for short. Formally, the algorithmic problem Contraction considered in this paper is to compute, for a given graph $G$ with edge lengths $\ell$ and a given tolerance function $\varphi$, a $\varphi$-contraction $C$ such that the number of contracted and deleted edges is maximized. We are specifically interested in the case where the tolerance function is an affine function $\varphi(x)=x/\alpha-\beta$ for real-valued parameters $\alpha\ge 1$ and $\beta\ge 0$. We then simply write $(\alpha,\beta)$-contraction instead of $\varphi$-contraction. See Figure 1 for some example instances of the problem Contraction.
When considering the case of a purely multiplicative error ($\beta=0$), a slight subtlety has to be taken into account. Specifically, for a graph with positive edge lengths it is not feasible to contract even a single edge, because the end vertices of a contracted edge end up at distance zero while the required distance $d/\alpha$ is positive. Therefore, we propose a slight modification of our original model: We say that a set $C$ of edges of $G$ is a weak $\varphi$-distance preserving contraction, or weak $\varphi$-contraction for short, if it does not contract the entire graph and, for any two vertices $u$ and $v$ at distance $d$ in $G$, the distance of the corresponding vertices in $G/C$ is either zero or at least $\varphi(d)$. We will refer to the corresponding algorithmic problem as Weak Contraction. Put differently, in a weak contraction, the distances between different super-vertices satisfy the given distortion guarantee, but for vertices belonging to the same super-vertex, no guarantee is given.
1.1. Our results
In this paper, we present a comprehensive picture of the algorithmic complexity of the described contraction problems. Recall that we are given an input graph $G$ with edge lengths $\ell$ and tolerance function $\varphi$, and our goal is to compute a (weak) contraction that maximizes the total number of contracted and deleted edges. Our main results concern affine tolerance functions $\varphi(x)=x/\alpha-\beta$ with parameters $\alpha\ge 1$ and $\beta\ge 0$. For the reader's convenience, our results are summarized in Tables 1 and 2. Within the tables and throughout this paper, $n$ and $m$ denote the number of vertices and edges, respectively, of the input graph under consideration.
Algorithmic results
We develop linear time greedy algorithms for Contraction with unit lengths on paths and cycles for general $\alpha$ and $\beta$, as well as on trees with $\alpha=1$ (Theorems 2, 3 and 4). The first two algorithms are inspired by LP rounding techniques; the latter algorithm relies on a structural characterization of optimal solutions.
We present dynamic programming algorithms solving Contraction and Weak Contraction on trees in time or , respectively (Theorems 5 and 6). These dynamic programs compute optimal solutions on subtrees, in the latter case combining several Pareto optimal solutions in a two-dimensional parameter space (hence the larger running time).
Note that instead of maximizing the number of contracted and deleted edges, we could optimize for $\alpha$ or $\beta$ while fixing the other parameters. The resulting problems are polynomially equivalent to our setting, via binary search over one of the parameters.
Hardness results
We complement these algorithms by several hardness results. First we consider the purely additive case where . We show that here both Contraction and Weak Contraction are NP-hard on cycles for any fixed , by a reduction of a variant of Partition (Theorem 7). As mentioned before, both problems can be solved efficiently on graphs without cycles, and there is a linear time algorithm for Contraction on cycles with unit lengths. By reductions from Clique we show that both the general as well as the unit lengths case of Contraction with are hard to approximate within factors of or , respectively (Theorem 9 and Theorem 10).
Further we consider the purely multiplicative case where (here Contraction is trivial). We show that in this case Weak Contraction is NP-hard on planar graphs with arbitrarily large girth and unit length edges by a reduction from a special case of Planar 3SAT (Theorem 11). Since these graphs are locally tree-like, this result constitutes another rather sharp separation from the polynomially solvable tree case. Furthermore, we show that the problem is hard to approximate within a factor of by a reduction from Independent Set (Theorem 12).
Asymptotic bounds
We now discuss our asymptotic bounds for contractions. In this setting, we are interested in (non-optimal) contractions for graphs with unit lengths that can be computed efficiently despite the above-mentioned hardness results. We prove that for any  any graph has a -contraction  such that  has at most  edges, and such a contraction can be computed in time  (Theorem 13) by successively growing clusters around center vertices. Assuming Erdős' girth conjecture, we show a corresponding (not tight) lower bound (Theorem 15).
For a purely additive error, we observe two simple -contractions that can be computed in time (Theorem 16). We show that for any even integer , the edges incident to the vertices of highest degrees form a -contraction with objective value at least , which is asymptotically best possible for paths. Another -contraction is implicitly used by Bernstein and Chechik in their faster deterministic algorithm for dynamic shortest paths in dense graphs [BC16]. For any number , it consists of the edges incident to two vertices of degree at least , and has edges. Both of these contractions can be computed in time. Further we note that the main result in [AB16] implies that for all , any contraction such that has edges does not admit a constant additive error.
One possible advantage of contraction compared to spanners is the potentially significant reduction of vertices as well as edges, e.g. reducing the complexity of performing algorithmic tasks in the smaller graph. To ground this intuition, we exhibit a contraction that significantly reduces the number of vertices in any graph with minimum degree to (Theorem 17). We also present a lower bound (Theorem 18) showing that we cannot guarantee vertices, even if we allow larger approximation error.
1.2. Comparison with previous results
There are several models aiming to compress graphs while preserving distances. They differ by their choice of compression operation, such as replacing the graph by a subgraph or minor, and by whether the aim is to preserve all or only certain distances.
As discussed before, graph spanners are a concept closely related to contractions, where the length of removed edges is set to $\infty$ rather than to $0$. Our results highlight further intrinsic similarities of the two models. Like contractions, spanners are NP-hard to compute optimally (see [PS89, LS93]). While the spanner literature considers the problem of minimizing the number of remaining edges, we analyze the objective of maximizing the number of contracted edges, which prohibits a direct comparison of the respective inapproximability results. We note however that approximation algorithms for spanner problems have been studied extensively, even though strong lower bounds are known. For instance, computing -spanners in unweighted graphs is -hard to approximate ([KP94, Kor01]); for further references see e.g. [CDKL17].
Despite these negative results, it is still possible to obtain powerful asymptotic guarantees in both models. In particular, our -contraction with  edges for unweighted graphs has a clear analogy to the classic -spanner with the same number of edges [ADD+93] (note that the additive error of 1 in our result is strictly necessary, as discussed above). There is, however, a major difference between the two results: whereas the -spanner can trivially be shown to be optimal assuming Erdős' girth conjecture, applying this conjecture to the contraction model only yields a lower bound of  edges for a -contraction. Closing this gap thus remains an interesting open problem in the contraction model, whose solution would likely yield further insight into the relationship to spanners.
Halperin and Zwick showed how an optimal -spanner can be constructed in linear time (see [BS03]). We achieve the same running time for our -contraction. It is interesting to note that the clustering yielding our -contraction was previously used in [PS89] to obtain a -spanner of the same asymptotic density.
There are also spanner results that significantly sparsify unweighted graphs at the cost of a purely additive error, as a (1,2)-spanner with edges [ACIM99], or a (1,6)-spanner with edges [BKMP05]. We do not know if analogous results are possible in the contraction model. The incompressibility result in [AB16] mentioned above implies the same lower bound for spanners as for contractions and every other distance oracle with additive error: For every any spanner of size does not admit a constant additive error. Finally, for spanners there are results that combine multiplicative and additive error, such as the -spanner of [BKMP05].
Gupta [Gup01] considered the problem of approximating a tree metric on a subset of the vertices by another tree, and gave a linear time algorithm computing an -approximation. As Chan et al. [CXKR06] observed later, on complete binary trees a solution of minimum distortion is always achieved by a minor (with possibly different edge lengths) of the input tree, so this seems to be the first investigation of contractions that approximate graph distances. Krauthgamer et al. [KNZ14] considered an extension to general graphs, studying the size of minors preserving all distances between a given terminal set of fixed size. Cheung et al. [CGH16] introduced a multiplicative distortion to this model. As here no two terminals may be merged, these approaches cannot compress a graph at all if every vertex is a terminal.
The pairwise preservers due to Coppersmith et al. [CE06] combine spanners with the aim of preserving only terminal distances. Given a graph and a set of terminal pairs, a pairwise preserver is a spanning subgraph inducing exactly the same terminal distances as . Coppersmith et al. [CE06] proved that for every undirected weighted graph there exists a pairwise preserver of size . Furthermore, they showed that every directed weighted graph has a pairwise preserver of size . For the special case of undirected unweighted graphs, Bodwin et al. [BVW16] showed the existence of a pairwise preserver with edges. Recently, Bodwin [Bod17] proved that any directed weighted graph has a pairwise preserver of size .
1.3. Further related work
The preservation of graph properties other than distances has been studied as well. Biedl et al. [BBV00] considered contractions in capacitated networks with the goal of maintaining the maximum flow in the network. Here an edge $e$ is called useless if for every capacity function there is a maximum flow not using $e$. Biedl et al. showed that finding all useless edges is NP-complete, but solvable in time  on certain planar graphs. For undirected networks, Misiołek et al. [MC05] gave an algorithm finding all useless edges in time . Toivonen et al. [ZMT10] considered a more general model aiming to maintain the quality of paths with respect to any given function, e.g., distance or capacity. They investigated strategies of removing edges without decreasing the quality of the best path between any pair of vertices.
Graph simplification problems have also been studied in several other contexts, and we conclude this section by mentioning two such examples: Hübler et al. [HKBG08] studied a problem related to graph mining, examining how to choose an induced subgraph with a given number of vertices and with similar topological properties as the input graph. Numerous papers investigate, directly or as a tool, sparsifiers that preserve the effective resistance between certain or all pairs of vertices, see e.g. [DB13, DKW15, KS16, DKP+17, CGP+18].
1.4. Outline of this paper
In Section 2 we introduce important definitions and notations that will be used throughout this paper. In Section 3 we present our three greedy algorithms for solving Contraction with unit lengths on paths, cycles and trees (the latter result requires $\alpha=1$). In Section 4 we discuss efficient dynamic programming algorithms for Contraction and Weak Contraction on trees. Sections 5 and 6 are devoted to our hardness results, focusing on the cases of purely additive and multiplicative error, respectively. In Section 7 we present our asymptotic results on contractions.
2. Preliminaries
Throughout this paper we consider simple undirected graphs (without parallel edges or loops). We let and denote the vertex and edge set of , respectively, and we define and . If the context is clear, we simply write , , and . We also use the notation . We assume that is connected, otherwise the contraction problem can be solved independently for each connected component. Edge lengths are given by a function . The distance between two vertices and is the length of a shortest path between and in with respect to .
Given a subset of edges $C\subseteq E$, we denote by $G/C$ the simple graph obtained from $G$ by contracting the edges in $C$, deleting resulting loops and keeping only the minimum length edge between any two vertices. We denote the number of deleted loops and multi-edges by $\Delta(C)$ (thus $|E(G/C)|=|E|-|C|-\Delta(C)$). Instead of contracting a set of edges $C$ in $G$, setting their edge lengths to zero has the same effect on the distances in the resulting graph. This is somewhat cleaner conceptually, so we will often adopt this viewpoint. Specifically, we let $\ell_C$ be the new length function that assigns 0 to every edge in $C$, and that is equal to the original edge lengths $\ell$ on the edges $E\setminus C$.
A tolerance function is a non-decreasing function $\varphi\colon\mathbb{R}_{\ge 0}\to\mathbb{R}$. Roughly speaking, this function describes by how much the distance between two vertices may drop when contracting edges (i.e., setting edge lengths to zero). Formally, given a graph $G$ with edge lengths $\ell$ and a tolerance function $\varphi$, we say that a subset of edges $C\subseteq E$ is a $\varphi$-distance preserving contraction, or $\varphi$-contraction for short, if
$$d_{G,\ell_C}(u,v)\ \ge\ \varphi\bigl(d_{G,\ell}(u,v)\bigr) \tag{1}$$
holds for any two vertices $u$ and $v$ in $G$, where $d_{G,\ell}(u,v)$ denotes the distance between $u$ and $v$ in $G$ with respect to the lengths $\ell$. Similarly, we say that $C$ is a weak $\varphi$-distance preserving contraction, or weak $\varphi$-contraction for short, if any two vertices $u$ and $v$ satisfy relation (1) or the relation $d_{G,\ell_C}(u,v)=0$, and if the graph $(V,C)$ is disconnected (equivalently, if $G/C$ is not a single vertex). The last condition prevents solutions for which the graph is contracted to a single vertex. If $\varphi(x)=x/\alpha-\beta$, then we simply write (weak) $(\alpha,\beta)$-contraction instead of (weak) $\varphi$-contraction.
An instance of the problem Contraction or Weak Contraction is a triple $(G,\ell,\varphi)$, where $G$ is the underlying graph, $\ell$ the length function and $\varphi$ the tolerance function, and the objective is to find a (weak) $\varphi$-distance preserving contraction $C\subseteq E$ such that
$$|C|+\Delta(C) \tag{2}$$
is maximized. This quantity equals the number of edges we save when going from $G$ to $G/C$. Note that on trees we have $\Delta(C)=0$ for any (weak) contraction $C$, whereas on general graphs we have $\Delta(C)\ge 0$.
[TABLE]
In this context we sometimes refer to a set of edges that forms a (weak) contraction as a feasible solution, and to a (weak) contraction of maximum value as an optimal solution.
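The definitions above can be checked mechanically on small instances. The following is a minimal sketch, not code from the paper: it assumes vertices are numbered `0..n-1`, edges are given as a dictionary from vertex pairs to lengths, and the tolerance function is passed as a Python callable `phi`. The helper names `all_pairs` and `is_contraction` are hypothetical. It uses the zero-length viewpoint, comparing all-pairs distances before and after setting the lengths of the contracted edges to zero.

```python
import math


def all_pairs(n, lengths):
    """Floyd-Warshall distances for an undirected graph on vertices 0..n-1.

    lengths maps an edge (u, v) to its length.
    """
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), w in lengths.items():
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_contraction(n, lengths, contracted, phi, weak=False):
    """Check whether `contracted` is a (weak) phi-contraction.

    Contracting is modelled by zeroing the lengths of the contracted edges.
    """
    zeroed = {e: (0.0 if e in contracted else w) for e, w in lengths.items()}
    before = all_pairs(n, lengths)
    after = all_pairs(n, zeroed)
    eps = 1e-9  # tolerance for floating-point comparisons
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    ok = all(
        after[u][v] >= phi(before[u][v]) - eps or (weak and after[u][v] == 0)
        for u, v in pairs
    )
    if weak:
        # A weak contraction must not collapse the graph to a single vertex.
        ok = ok and any(after[u][v] > 0 for u, v in pairs)
    return ok
```

For instance, on the unit-length path `{(0, 1): 1, (1, 2): 1, (2, 3): 1}` with `phi = lambda d: d / 2`, the single edge `(1, 2)` is rejected as a contraction but accepted with `weak=True`, which matches the motivation for the weak model.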
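We begin by proving that our contraction model behaves nicely when contracting edges in phases, i.e., the total error is simply the error accumulated over the contraction phases (but not more). To state this result we denote the composition of tolerance functions $\varphi$ and $\varphi'$ as $\varphi'\circ\varphi$, i.e., $(\varphi'\circ\varphi)(x)=\varphi'(\varphi(x))$.
Theorem 1.
Let $C$ be a (weak) $\varphi$-contraction for $G$, and let $C'$ be a (weak) $\varphi'$-contraction for $G/C$. Then $C\cup C'$ is a (weak) $(\varphi'\circ\varphi)$-contraction for $G$.
Proof.
We only prove the statement for contractions $C$ and $C'$. The proof for weak contractions works analogously. Let $\ell'$ denote the edge lengths of $G/C$ and consider a pair of vertices $u,v\in V$, identifying each vertex with its super-vertex. Writing $d_{G,\ell_{C\cup C'}}$ for the distances after setting the lengths of all edges in $C\cup C'$ to zero, we have $d_{G/C,\ell'}(u,v)\ge\varphi(d_{G,\ell}(u,v))$ by the definition of $C$ and $d_{G,\ell_{C\cup C'}}(u,v)\ge\varphi'(d_{G/C,\ell'}(u,v))$ by the definition of $C'$. Combining these inequalities and using that $\varphi'$ is non-decreasing we obtain $d_{G,\ell_{C\cup C'}}(u,v)\ge\varphi'(\varphi(d_{G,\ell}(u,v)))$, as desired. ∎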
Note that Theorem 1 concerns only the feasibility of repeated contractions, not their optimality when searching for contractions of maximum cardinality. With respect to solution quality, contracting in phases may be arbitrarily bad: Consider a star $G$ with unit length edges and additive tolerance functions $\varphi(x)=\varphi'(x)=x-1$. An optimum $(\varphi'\circ\varphi)$-contraction contains all edges, whereas finding an optimal $\varphi$-contraction $C$ and then an optimal $\varphi'$-contraction of $G/C$ allows contracting only one edge in each phase, leading to a $(\varphi'\circ\varphi)$-contraction of value 2.
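The star example can be reproduced by brute force. The sketch below is illustrative only and rests on an assumption: each phase uses the additive tolerance $x-1$, so that the composition is $x-2$. The helper names `distances` and `best_contraction` are hypothetical, and the second phase works on the star with the first-phase edge set to length zero, which has the same distances as the contracted graph.

```python
import math
from itertools import combinations


def distances(n, lengths):
    """Floyd-Warshall on vertices 0..n-1; lengths maps (u, v) to a length."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), w in lengths.items():
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d


def best_contraction(n, lengths, beta):
    """Largest edge set whose zeroing lowers no distance by more than beta.

    Exhaustive search, only meant for tiny instances.
    """
    base = distances(n, lengths)
    candidates = [e for e, w in lengths.items() if w > 0]
    for size in range(len(candidates), -1, -1):
        for subset in combinations(candidates, size):
            zeroed = {e: (0 if e in subset else w) for e, w in lengths.items()}
            after = distances(n, zeroed)
            if all(after[u][v] >= base[u][v] - beta
                   for u in range(n) for v in range(n)):
                return set(subset)
    return set()


# Star with centre 0 and four leaves, unit lengths.
star = {(0, leaf): 1 for leaf in range(1, 5)}

one_shot = best_contraction(5, star, beta=2)        # tolerance x - 2 at once
first_phase = best_contraction(5, star, beta=1)     # tolerance x - 1
after_first = {e: (0 if e in first_phase else w) for e, w in star.items()}
second_phase = best_contraction(5, after_first, beta=1)  # x - 1 again
```

Under these assumptions the one-shot solution contains all four edges, while each phase contributes a single edge, so the gap grows with the number of leaves.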
3. Greedy algorithms
In this section we consider three special cases of the problem Contraction with affine tolerance function $\varphi(x)=x/\alpha-\beta$. We obtain simple greedy algorithms computing maximum size $(\alpha,\beta)$-contractions in time $O(n)$ on paths and cycles with unit lengths, and on trees with unit lengths and $\alpha=1$.
3.1. Paths with unit length edges
In this section we consider the special case of contracting a path $P$ with unit length edges and the tolerance function $\varphi(x)=x/\alpha-\beta$. In this case optimal solutions have a very special structure, which leads to a straightforward greedy algorithm running in linear time. Recall that as a path is a tree, our objective function equals $|C|$ for any contraction $C$.
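Observe that a solution $C\subseteq E(P)$ for the instance $(P,\ell,\varphi)$ of the problem Contraction is feasible if and only if every subpath $P'\subseteq P$ satisfies the condition
$$|C\cap E(P')|\ \le\ \Bigl(1-\frac{1}{\alpha}\Bigr)|E(P')|+\beta. \tag{3}$$
Indeed, the end vertices of $P'$ are at distance $|E(P')|$ before and at distance $|E(P')|-|C\cap E(P')|$ after the contraction.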
This observation leads to the following natural greedy algorithm : The algorithm considers the edges of as they are encountered when starting from one of the two end vertices of . It iteratively constructs a solution for the subpath on the first edges for , by initializing , and by adding the edge to if and only if the condition is satisfied (so after adding to , (3) is still satisfied).
Theorem 2.
Let be a path with unit length edges and consider the tolerance function , . The set of edges computed by the algorithm is an optimal solution for the instance of the problem Contraction, and it is computed in time .
Proof.
Let be the set of edges computed by the algorithm . Clearly, we have , and this is optimal according to (3). However, it remains to show that is feasible. For we let denote the subpath of formed by the edges . By the definition of our algorithm we know that , from which we obtain that
[TABLE]
where we used the assumption in the last step. Using (3) it thus follows that is feasible. â
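As an illustration of the left-to-right greedy idea, the following sketch scans the edges of a unit-length path and contracts an edge whenever every subpath ending at that edge still respects the subpath condition. Two assumptions are made here: the condition is taken in the form $|C\cap E(P')|\le(1-1/\alpha)|E(P')|+\beta$, which follows from the tolerance function $x/\alpha-\beta$ on unit lengths, and the acceptance test re-checks all subpaths ending at the current edge instead of using the closed-form rule of the linear-time algorithm, so this version runs in quadratic time. The names `max_allowed` and `greedy_path_contraction` are hypothetical.

```python
import math


def max_allowed(length, alpha, beta):
    """Most contracted edges a subpath with `length` unit edges may contain."""
    bound = math.floor((1 - 1 / alpha) * length + beta + 1e-9)
    return min(length, bound)


def greedy_path_contraction(n_edges, alpha, beta):
    """Scan edges e_1..e_n from one end and contract e_i whenever possible.

    Returns the (1-based) indices of the contracted edges.
    """
    prefix = [0]  # prefix[i] = number of contracted edges among e_1..e_i
    chosen = []
    for i in range(1, n_edges + 1):
        # Only subpaths e_a..e_i ending at the current edge can become
        # violated by adding e_i, so it suffices to re-check those.
        ok = all(
            prefix[i - 1] - prefix[a - 1] + 1 <= max_allowed(i - a + 1, alpha, beta)
            for a in range(1, i + 1)
        )
        prefix.append(prefix[-1] + (1 if ok else 0))
        if ok:
            chosen.append(i)
    return chosen
```

A short exchange argument shows that taking each edge as early as possible cannot hurt: the $k$-th greedy edge never lies to the right of the $k$-th edge of any other feasible solution, so the greedy solution is at least as large.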
3.2. Cycles with unit length edges
In this section we consider the special case of contracting a cycle with $n$ vertices and unit length edges and the tolerance function $\varphi(x)=x/\alpha-\beta$, $\alpha\ge 1$, $\beta\ge 0$. For this case we present a greedy algorithm running in linear time. The main purpose of this result is to clearly separate the polynomially solvable cases of Contraction from the NP-hard cases, and the case of a cycle with unit length edges precisely forms this boundary on the polynomially solvable side. Recall in this context that we can solve Contraction in polynomial time on any tree (this will be proved in Section 4.1 below), and that Contraction is NP-hard already on a cycle for $\alpha=1$ (with arbitrary edge lengths; we will show this in Section 5.1 below).
We first argue that on a cycle it is equivalent to maximize the number of contracted edges or to maximize our objective function defined in (2). This is because the set of pairs for all feasible contractions in a cycle is given by , so it forms a monotone function, implying that maximizing either one of the two quantities is equivalent. Based on this argument, for the rest of this section we consider maximizing the number of contracted edges.
Observe that a solution ( is the cycle we want to contract, and is the set of edges to be contracted) for the instance of the problem Contraction is feasible, if and only if every subpath of length satisfies the condition
[TABLE]
Rounding down on the right-hand side of (4) is justified because is always an integer.
Defining
[TABLE]
we obtain from (4) that is the maximal amount by which we can contract each edge in a uniform fractional solution. Inspired by the rounding technique from [BOR80], we turn this fractional solution into an integer optimal solution, yielding the following greedy algorithm : The algorithm considers the edges of as they are encountered when walking around the cycle. It iteratively constructs a solution by initializing and by adding the edge to if and only if for all (since , this difference is always either 0 or 1). Note that we contract all edges of if and only if .
Theorem 3.
Let be a cycle with unit length edges and consider the tolerance function , , . The set of edges computed by the algorithm is an optimal solution for the instance of the problem Contraction, and it is computed in time .
The next lemma shows that the contraction computed by our algorithm has the maximum size.
Lemma 3.1.
For any feasible solution we have with defined in (5).
Proof.
If this inequality is trivial. So let us assume that and that the minimum in (5a) is attained for some . Starting at some vertex of the cycle, we walk along the cycle and cover it with consecutive paths of length each ( starts where ends). The sum of the lengths of the paths is , so this process ends at the starting vertex , and each edge of the cycle and each edge of is covered exactly times. We therefore obtain
[TABLE]
As must be integral this inequality yields the desired bound . â
With Lemma 3.1 in hand, we are now ready to prove Theorem 3.
Proof of Theorem 3.
In this proof we will use that for any two real numbers and we have
[TABLE]
Let be the set of edges computed by the algorithm . Clearly, we have , which is optimal by Lemma 3.1. However, it remains to show that is feasible. We consider a path of length on the edges (indices are considered cyclically modulo , so ). We distinguish two cases: If , we have
[TABLE]
If , we obtain
[TABLE]
Applying (5) and using that shows that the right-hand sides of (7) and (8) can both be bounded from above by , proving that is indeed feasible by (4). â
3.3. Trees with unit length edges and additive error
In this section we consider the special case of contracting a tree $T$ with unit length edges and the tolerance function $\varphi(x)=x-\beta$ (purely additive error; we can assume w.l.o.g. that $\beta$ is an integer). Note that in this setting the objective function defined in (2) equals $|C|$ for any contraction $C$. It turns out that in this case, optimal solutions have a very special structure that can be exploited to compute them in linear time. Specifically, an optimal solution is obtained by taking all edges of $T$ which have the property that only short paths start from one of their end vertices. Formally, for the tree $T$ and any integer $k\ge 0$, we let $E_k$ denote the set of all edges $e$ of $T$ which have one end vertex $v$ such that all paths that start at $v$ and do not contain $e$ have length at most $k-1$ (together with $e$ these paths have length at most $k$). E.g., we have $E_0=\emptyset$, and the set $E_1$ consists of all the edges incident to a leaf (see Figure 2).
Clearly, the set $E_k$ can be computed in linear time by repeatedly removing all leaves of $T$ in $k$ rounds. This is a variant of the well-known linear time algorithm to compute the so-called center of a tree (see [Ski08, Section 15.11]).
Theorem 4.
Let $T$ be a tree with unit length edges and consider the tolerance function $\varphi(x)=x-\beta$ with integer $\beta\ge 0$. If $\beta$ is even, the set of edges $E_k$ with $k=\beta/2$ is an optimal solution for the instance $(T,\ell,\varphi)$ of the problem Contraction. If $\beta$ is odd, $E_{(\beta-1)/2}\cup\{e\}$, $e\in E_{(\beta+1)/2}\setminus E_{(\beta-1)/2}$, is an optimal solution. These solutions can be computed in time $O(n)$.
Proof.
We define if is even and , for some , if is odd. We first argue that is a feasible solution. To see this note that for the given tolerance function we only need to verify that the path between any two leaves of contains at most edges. Consider all the edges of for which both end vertices have distance at least from both and . None of those edges is in by its definition. It follows that and therefore .
To prove that is a solution of maximum size we argue by induction over . The claim is trivially true for and (in these cases and , respectively). So let be an arbitrary feasible solution of the instance of the problem Contraction for some . We need to show that . To this end we let denote the set of leaves of and we define . Moreover, we define and . By induction, is an optimal solution for the instance .
We first consider the case that or (this is equivalent to or ). In this case we define , and observe that is a feasible solution for the instance . It follows that , implying that , as claimed.
It remains to consider the case that both sets and are nonempty, so there is an edge and an edge . We denote the leaf incident to by . We will now remove an edge from and add instead to obtain another feasible solution satisfying . Repeating this exchange argument and applying the reasoning from the first case then proves the theorem. The edge to be removed from is obtained by considering the path that connects and in and that contains , and by choosing the first edge from (or equivalently, from ) that is encountered when following this path from to . It may happen that is the first such edge we encounter. To complete the proof of the theorem it remains to show that is feasible. To prove this we only need to check paths which start in and contain but not . Let be such a path, let be any path that also starts in but does contain , and consider the path (see Figure 3). Here and in the following we slightly abuse notation and interpret these set unions/differences/intersections in terms of the edge sets of the graphs. As is feasible and as contains , the number of edges in or on is at most . By the choice of , the number of edges of on is 1 (the only edge of on this path is ). As , we obtain that the number of edges from on is at most , as desired. This completes the proof. ∎
4. Dynamic programs for general trees
In this section we describe dynamic programming algorithms for the problems Contraction and Weak Contraction on trees with general edge lengths and affine tolerance functions. Recall that on trees our objective function satisfies for any contraction .
4.1. Contraction on trees
In this section we describe a dynamic programming algorithm for the problem of computing an optimal contraction of a tree with arbitrary edge lengths and an affine tolerance function , , , generalizing the solution for the special case presented at the beginning of the previous section. The goal is to prove the following result.
Theorem 5.
Let be a tree with edge lengths and consider the tolerance function , , . An optimal solution for the instance of the problem Contraction can be computed by dynamic programming in time .
Observe that a solution is feasible if and only if for any two vertices and of we have , where the load between and is defined as
[TABLE]
Note that , as we have . The next lemma states a criterion when feasible solutions of subtrees can be combined to a feasible solution of the entire tree. The definitions (9a), (9b) and the lemma are illustrated in Figure 4.
Lemma 4.1.
Consider a partition of into two subtrees and that only have a vertex in common. Then is a feasible solution for the instance of the problem Contraction if and only if the following two conditions hold: and are feasible solutions for the instances and respectively; and we have .
Proof.
Observe that the path between two vertices and contains the vertex , so we obtain from (9a). Using (9b) it follows that the condition holding for all such pairs of vertices is equivalent to . ∎
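For small trees, feasibility of a candidate edge set can be verified directly from the distance condition, without going through the load. The sketch below is illustrative only. It assumes that the affine tolerance function has the form φ(x) = x/α − β (the formula itself is not reproduced in this text), it models contraction by giving contracted edges length zero, and the names `tree_distances` and `is_contraction` are hypothetical.

```python
from fractions import Fraction


def tree_distances(n, edges, contracted=frozenset()):
    """All-pairs distances in a tree on vertices 0..n-1. `edges` is a list of
    (u, v, length); edges whose index lies in `contracted` get length 0."""
    adj = [[] for _ in range(n)]
    for idx, (u, v, length) in enumerate(edges):
        w = Fraction(0) if idx in contracted else Fraction(length)
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = Fraction(0)
        stack = [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if dist[s][y] is None:
                    dist[s][y] = dist[s][x] + w
                    stack.append(y)
    return dist


def is_contraction(n, edges, contracted, alpha, beta):
    """Check d_contracted(u, v) >= d(u, v) / alpha - beta for all pairs
    (assumed form of the affine tolerance function)."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    before = tree_distances(n, edges)
    after = tree_distances(n, edges, frozenset(contracted))
    return all(after[u][v] >= before[u][v] / alpha - beta
               for u in range(n) for v in range(u + 1, n))
```

On the unit-length path with three vertices and α = 1, β = 1, contracting one edge is feasible, while contracting both edges is not, since the end vertices at distance 2 would end up at distance 0.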
We will use this lemma to formulate our dynamic programming algorithm. The idea is to compute optimal solutions for subtrees and to combine them into an optimal solution for the entire tree.
To describe the algorithm we introduce a few definitions. An ordered rooted tree is a rooted tree with a specified left-to-right ordering for the children of each vertex. Given the tree , we can pick an arbitrary vertex as the root, and for each descendant of the root an arbitrary left-to-right ordering of its children, yielding an ordered rooted tree (different roots and orderings yield different ordered rooted trees, but any one of them is good for our purposes). We slightly abuse notation in the following and use to denote this ordered rooted tree. All trees considered in the rest of this section are ordered and rooted. For any vertex of , we let denote the subtree of rooted at , and we use to denote the number of children of . If are the children of (in the specified ordering), we write , , for the subtree of that contains , and all the descendants of . We also define . Furthermore, we define , so we have . These definitions are illustrated in Figure 4.
Using these definitions it follows straightforwardly from (9a) and (9b) that for any set of edges we have
[TABLE]
Note that the load increases if the edge is added to (see (10a)), and it decreases otherwise (see (10b)). Moreover, for any set of edges and any we obtain from those definitions that
[TABLE]
These rules allow us to compute the load of all subtrees of in a bottom-up fashion. Our dynamic program maintains the minimum load of all subtrees of in three-dimensional matrices and . We begin defining these matrices in an abstract way, and then establish several recursive relations which directly translate into a dynamic program. Specifically, for , and (recall that ) we define
[TABLE]
If there is no feasible solution of the required size, we have . The entries of are defined analogously to (12) by considering the load of instead of . In words, the entries and describe feasible solutions of size of the instances or , respectively, of the problem Contraction for which the load at the vertex is as small as possible (the matrices contain the minimum achievable load, not the corresponding set of edges).
Lemma 4.2.
Let be a vertex of and let be the children of . Then the matrices and defined in and directly after (12) satisfy the relations
[TABLE]
Moreover, we have
[TABLE]
for all and .
The most interesting of these recursive relations are of course (13c) and (13d). The relation (13c) captures the two possibilities of either adding the edge or not adding it to a partial solution in the tree to obtain a solution for the tree (recall (10)). The relation (13d), on the other hand, describes how to distribute contraction edges in among the two subtrees and ( is the number of edges contracted in the first tree, and the number of edges in the second tree, respectively).
Proof.
The relations (13a) and (13b) follow immediately from the definitions of the trees and and from (12). The relation (13c) follows from (10) and (12). The relation (13d) follows from (11) and (12) with the help of Lemma 4.1. ∎
We are now ready to prove Theorem 5.
Proof of Theorem 5.
Given the instance , we fix an arbitrary root of and an arbitrary ordering of the children of each vertex, making an ordered rooted tree. We then compute the entries of the matrices and using Lemma 4.2. We first initialize various entries using (13a) and (13b), and compute the remaining entries in a bottom-up fashion moving upwards from the leaves to the root. Specifically, at a vertex with children for which all the entries of and have already been computed, we first compute for all and using (13c), and then for all and using (13d).
Let be the largest such that . From (12) we obtain that is the size of an optimal solution of the instance . The corresponding set of edges can be obtained by keeping track of the arguments for which the minima and maxima in (13c) and (13d) are attained in each step.
Clearly, and both have entries, and computing each entry takes time , so the running time of our dynamic program is . ∎
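The bottom-up computation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm verbatim: the equations (9) to (13) are not reproduced in this text, so the sketch assumes the tolerance function φ(x) = x/α − β with α ≥ 1 and β ≥ 0, takes the load of a path to be its contracted length minus (1 − 1/α) times its total length, and maximizes the number of contracted edges. The function name `max_contraction_size` is hypothetical, and the sketch makes no attempt to match the running time stated in Theorem 5.

```python
from fractions import Fraction

INF = float("inf")


def max_contraction_size(n, edges, alpha, beta, root=0):
    """edges: list of (u, v, length) forming a tree on vertices 0..n-1.
    Returns the largest number of edges that can be contracted."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, Fraction(w)))
        adj[v].append((u, Fraction(w)))
    # Preorder traversal; reversing it processes children before parents.
    parent, order, seen, stack = [-1] * n, [], [False] * n, [root]
    seen[root] = True
    while stack:
        v = stack.pop()
        order.append(v)
        for c, _ in adj[v]:
            if not seen[c]:
                seen[c], parent[c] = True, v
                stack.append(c)
    best = [None] * n  # best[v][k]: minimum load at v with k contracted edges
    for v in reversed(order):
        f = [Fraction(0)]
        for c, w in adj[v]:
            if parent[c] != v:
                continue
            g = best[c]
            # Extend the subtree of c by the edge (v, c).
            h = [INF] * (len(g) + 1)
            for k, load in enumerate(g):
                if load == INF:
                    continue
                keep = max(Fraction(0), load - (1 - 1 / alpha) * w)
                if keep <= beta:
                    h[k] = min(h[k], keep)  # edge not contracted
                contract = max(Fraction(0), load + w / alpha)
                if contract <= beta:
                    h[k + 1] = min(h[k + 1], contract)  # edge contracted
            # Merge with the part of the subtree processed so far: paths
            # through v add up the two loads, as in Lemma 4.1.
            merged = [INF] * (len(f) + len(h) - 1)
            for k1, a in enumerate(f):
                if a == INF:
                    continue
                for k2, b in enumerate(h):
                    if b != INF and a + b <= beta:
                        merged[k1 + k2] = min(merged[k1 + k2], max(a, b))
            f = merged
        best[v] = f
    return max(k for k, load in enumerate(best[root]) if load != INF)
```

Exact rational arithmetic via `Fraction` avoids rounding issues when comparing loads against β. Recovering the edge set itself would additionally require storing the choices at which the minima are attained, as described in the proof above.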
4.2. Weak Contraction on trees
In this section we consider the problem of computing weak contractions for a tree with affine tolerance function . Here, our main result is a dynamic programming algorithm that builds on the algorithmic ideas presented in Section 4.1.
Theorem 6.
Let be a tree with edge lengths and consider the tolerance function , , . An optimal solution for the instance of the problem Weak Contraction can be computed by dynamic programming in time .
In this setting we need to specifically keep track of pairs of vertices whose distance remains positive when contracting a set of edges (i.e., not all edges in between these vertices are contracted). To this end we extend the definitions (9) as follows: For any vertex of we define the weak load of at as
[TABLE]
Note that in the maximization we have to consider all vertices such that at least one edge on the path from to is not in . This definition together with (9b) yields . In contrast to the load, the weak load may be negative. In particular, if and only if .
The following lemma is the counterpart to Lemma 4.1 for weak contractions. It describes how to combine feasible solutions on subtrees to a feasible solution of the entire tree. There is one important subtlety here: While the notion of a weak contraction forbids contracting all edges of , we clearly have to allow this for partial solutions on subtrees of (as long as some other edge not in the subtree is not contracted, this might still yield a feasible solution).
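The difference between the two notions can be made concrete with a brute-force check on small trees. The sketch below is illustrative and rests on assumptions: the tolerance function is taken to be φ(x) = x/α − β, a weak contraction is modelled as an edge set that does not contain all edges and satisfies the distance condition only for pairs whose distance remains positive, and the name `is_weak_contraction` is hypothetical.

```python
from fractions import Fraction


def _distances(n, edges, contracted):
    """All-pairs tree distances; contracted edges (by index) have length 0."""
    adj = [[] for _ in range(n)]
    for idx, (u, v, length) in enumerate(edges):
        w = Fraction(0) if idx in contracted else Fraction(length)
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = Fraction(0)
        stack = [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if dist[s][y] is None:
                    dist[s][y] = dist[s][x] + w
                    stack.append(y)
    return dist


def is_weak_contraction(n, edges, contracted, alpha, beta):
    contracted = set(contracted)
    if edges and len(contracted) == len(edges):
        return False  # contracting the entire tree is forbidden
    alpha, beta = Fraction(alpha), Fraction(beta)
    before = _distances(n, edges, set())
    after = _distances(n, edges, contracted)
    # Pairs merged into one super-vertex (distance 0) are exempt.
    return all(after[u][v] == 0 or after[u][v] >= before[u][v] / alpha - beta
               for u in range(n) for v in range(u + 1, n))
```

For example, on the path with edge lengths 1, 1, 5 and parameters α = 2, β = 1/2, contracting the first two edges passes this check, although the merged end vertices at original distance 2 would violate the condition 0 ≥ 2/2 − 1/2 required of a (non-weak) contraction.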
Lemma 4.3.
Consider a partition of into two subtrees and that only have a vertex in common. Then is a feasible solution for the instance of the problem Weak Contraction if and only if the following two conditions hold: For , either contains every edge of or is a feasible solution for the instance of Weak Contraction; and we have
[TABLE]
Proof.
Let . For the rest of the proof we omit the subscripts and and simply write and .
We first assume that is a feasible solution for the instance of the problem Weak Contraction. I.e., any two vertices of with satisfy the condition . This is true in particular for all pairs of vertices , , implying that either or is a feasible solution for the instance . If , the claimed inequality is trivially satisfied. So suppose that is a finite number, and let and be such that , and as well as . Then we also have , so we know that by the assumption that is feasible for . Combining this last inequality with the relation proves that the right hand side of the equation is at most , as claimed. The proof of the second inequality works symmetrically. This proves one direction of the equivalence.
To prove the reverse direction, we now assume that either or is a feasible solution for the instance for , and that and . To show that is a feasible solution for the instance , let and be such that . It follows that or . We first consider the case that . By the definitions (9) and (14) we have , and also , yielding (the last inequality holds by assumption). This proves that , as desired. The proof of the other case works symmetrically. This completes the proof of the lemma. ∎
As in Section 4.1, we view as an ordered rooted tree, and consider its subtrees , and for all and (recall the definitions given after Lemma 4.1). Let us briefly highlight the differences between Lemmas 4.1 and 4.3. The dynamic programming algorithm presented in Section 4.1 exploits the fact that the optimal way to contract exactly edges in a subtree of rooted at a particular vertex is to contract a set of edges that minimizes . This is possible as the optimality condition in Lemma 4.1 only depends on this parameter. Here the situation is more complicated, as Lemma 4.3 also considers . Figure 5 illustrates that it is not sufficient to minimize only one of these parameters.
Consequently, we keep track of an entire Pareto front of non-dominated partial solutions (see Figure 6). Formally, we define the set of feasible partial solutions of size as the family of all sets with such that either or is a feasible solution for the instance of Weak Contraction. For two sets we say that dominates at if and , and we define the Pareto front as a minimal family of sets such that no set dominates at . Note that the domination relation is reflexive, so there may be several different such minimal families, all with the same pairs of load and weak load values, and any choice among them is equally good for us. This definition is illustrated in Figure 6.
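Extracting the non-dominated points from a collection of value pairs is a standard computation, sketched below. This is a generic illustration rather than the paper's procedure: the function name `pareto_front` is hypothetical, and each partial solution is represented only by a pair of numbers (for instance its weak load and its load), with smaller values being better in both coordinates.

```python
def pareto_front(points):
    """Return the non-dominated pairs, sorted by increasing first coordinate.
    A pair a dominates b if a[0] <= b[0] and a[1] <= b[1]; among pairs with
    identical values only one representative is kept."""
    front = []
    best_second = float("inf")
    for p in sorted(set(points)):
        # p has the largest first coordinate seen so far, so it survives
        # only if it strictly improves the second coordinate.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front
```

Along the returned list the first coordinates are strictly increasing and the second coordinates are strictly decreasing, which is the staircase shape that a sorted Pareto front exhibits.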
The following crucial lemma asserts that the number of points on the Pareto front, i.e., the size of the family is at most . This property is essential for our dynamic programming approach, and it does not follow immediately from the definition of , as the set of feasible solutions is typically of exponential size.
Lemma 4.4.
For any , we have or . Consequently, the Pareto front has size at most .
Proof.
By the definitions (9) and (14) we have for all . Now let be such that . Again by the previously mentioned definitions this implies that for some , which is indeed an element of the set . Consequently, the Pareto front consists of at most one set with and at most one set with for each number in . Using that it follows that . ∎
By Lemma 4.4 the load values of all points on the Pareto front with are in the set . There might also be one point with on the Pareto front (as in the example shown in Figure 6), and this load value might not be an element of . We extend the set accordingly by defining for (recall that )
[TABLE]
If the set is empty, we have .
We now describe recursive relations for the weak load that are analogous to (10) and (11) for the load. It follows straightforwardly from (9) and (14) that for any vertex of and its children , , and for any set of edges we have
[TABLE]
Note that the weak load increases if the edge is added (see (17a)). On the other hand, if the edge is not added, it may decrease or increase (the right hand side of (17b) refers to the load, not to the weak load). Moreover, for any set of edges and any the definition (14) readily implies
[TABLE]
These rules together with the corresponding relations (10) and (11) allow us to compute the weak load and the load of all Pareto optimal partial solutions in a bottom-up fashion, similar to the approach taken in Section 4.1. Before it was sufficient to compute one optimal partial solution for every subtree and , , and every possible size of the contracted set of edges, but now our dynamic program keeps track of the entire Pareto fronts and . We store the corresponding pairs of load and weak load values on the Pareto front in separate four-dimensional matrices , , and (the entries of and are certain weak load values, and the entries of and are the corresponding load values). We begin defining these matrices in an abstract way, and then establish several recursive relations which directly translate into a dynamic programming algorithm. Specifically, for , , and with as in Lemma 4.4 we define
[TABLE]
If there is no set satisfying these requirements, we have . The entries of and are defined analogously to (19) by considering the tree instead of (in particular, in this case we have ).
The definitions of and given in (19) extend straightforwardly to the value defined in (16a). Similarly, the definitions of and from before extend to the value . It is easy to see that we have in fact
[TABLE]
(an analogous relation holds for the entries of ).
The recursive relations satisfied by the matrices , , and defined before are captured by the following two lemmas. The initialization steps and the recursive computation of and are treated in Lemma 4.5. The recursive computation of and is somewhat more technical, and is treated separately in Lemma 4.6.
Lemma 4.5.
Let be a vertex of and let be the children of . Then the matrices , , and defined in and directly after (19) satisfy the relations
[TABLE]
Finally, we have
[TABLE]
where is minimal such that , if such a value exists, and otherwise, for all , and .
Note that the relations (21a)–(21f) are the initialization steps, and the relations (21g)–(21j) capture the two possibilities of either adding or not adding the edge to a partial solution in the tree to obtain a solution for the tree (recall (10) and (17)).
We only refer to well-defined entries of and in (21h) and in the definition of , as holds for every . Note that we either have or , while may also take a value in the open interval .
Proof.
The relations (21a)–(21f) follow immediately from the definitions of the trees and and the definitions of the respective matrices given in (19) and afterwards. The relations (21g) and (21i) follow from (17) and the definitions of and , respectively: Consider a partial solution . If , then does not contain the edge , so we have . The other cases of (21g) as well as (21i) are implied by the following observation: If and , then by (17) we have and .
The relation (21h) is closely related to (21g). If , then (21h) follows immediately from (21g) and the definitions of and . If , then both a partial solution containing the edge as well as one missing this edge minimize the weak load. As the weak load is bounded from above by the load, we get in this case. This implies (21h). An analogous argument yields (21j). ∎
The following lemma describes the recursive relations satisfied by the entries of and . Specifically, the lemma describes how to distribute contraction edges in among the two subtrees and ( is the number of edges contracted in the first tree, and the number of edges in the second tree, respectively). To compute a single point on the Pareto front , we need to consider all points on the Pareto fronts and .
Lemma 4.6.
Let be a vertex of , and let and be fixed throughout this lemma. For we let denote the set of all pairs with and such that and . For and we let denote the set of all pairs satisfying .
For all , defining
[TABLE]
Proof.
The relation (22c) follows by combining the definitions (19a) and (22a) with the relations (11), (18) and the condition (15) from Lemma 4.3. The argument for (22d) is analogous, using the definitions (19b) and (22b) instead of (19a) and (22a).
The relation (22g) follows by combining the definitions (16a) and (22e) (recall also (20)) with the relations (11), (18) and the condition (15) from Lemma 4.3. The argument for (22h) is analogous, using the definitions (19a) and (22f) instead of (16a) and (22e). ∎
We can trivially compute the quantities , , and as defined in Lemma 4.6 in time (using that and by Lemma 4.4). The following lemma shows how to do the same computation in time , so that the entries and can be computed via (22c), (22d), (22g) and (22h) in time (instead of the trivial bound ).
Lemma 4.7.
If the numbers in the sets and are sorted increasingly, the quantities , , and defined in Lemma 4.6 can be computed in time . Consequently, and can be computed for all and all in time .
Proof.
We define the sequence of all pairs of finite numbers for all in increasing order of -values. Similarly, we define the sequence of all pairs of finite numbers for all in increasing order of -values. By Lemma 4.4 each of these lists has size . Note that these sequences correspond to the Pareto fronts and , respectively. Some pairs of points may appear multiple times consecutively in and , and in a preprocessing step we eliminate these duplicates in time . We know that after this preprocessing step, the first entries in the simplified lists and are strictly increasing, and the second entries are strictly decreasing (recall Figure 6).
We first argue how to compute and . We begin discarding all pairs from each list whose first entry ( or , respectively) is strictly greater than in time . We then process the remaining lists and beginning at the last entries and (with smallest or -values, respectively) in two phases.
In the first phase we compute as follows: If , we discard the last element of by decreasing by 1 (by our sorting of the lists we know that for all ). If , we discard the last element of by decreasing by 1 (by our sorting of the lists we know that for all ). Once and for the first time, we have found . If this never happens we know that . This computation is correct by the definition of in Lemma 4.6 and by (22a), and it takes time .
In the second phase we compute as follows: If , we know that , too. Otherwise we distinguish two cases: If , we decrease further as long as both inequalities and are still satisfied (so that they still hold for the final ). If , we decrease further as long as both inequalities and are still satisfied (so that they still hold for the final ). In the end we set . Note that in the first case, the third constraint remains valid by the monotonicity for all , and in the second case, the third constraint remains valid by the monotonicity for all . Therefore, the correctness of the computation of follows from (22b).
The procedure to compute and processes and (as obtained from the preprocessing step explained in the beginning) starting at the first entries , , and , , in two phases very similarly to before. We omit the details here. ∎
We are now ready to prove Theorem 6.
Proof of Theorem 6.
Given the instance , we fix an arbitrary root of and an arbitrary ordering of the children of each vertex, making an ordered rooted tree.
We begin precomputing and sorting all of the sets and , , , and we maintain them as sorted lists throughout the algorithm. This takes time in total (recall Lemma 4.4).
We then compute the entries of the matrices , , and using Lemmas 4.5 and 4.7. We first initialize various entries using (21a)–(21f), and compute the remaining entries in a bottom-up fashion moving upwards from the leaves to the root. Specifically, at a vertex with children for which all the entries of , , and have already been computed, we first compute and then for all , and using (21g) and (21h), then we compute and for all and using (21i) and (21j). We obtain sorted lists containing the numbers in by inserting at the correct position into the precomputed list . Next, we compute and then for all , and using (22c) and (22d), and then we compute and for all and using (22g) and (22h). We obtain sorted lists containing the numbers in by inserting at the correct position into the precomputed list .
Let be the largest such that is finite. From (19) we obtain that is the size of an optimal solution of the instance . The corresponding set of edges can be obtained by keeping track of the arguments for which the minima and maxima in (21g)–(21j) and (22a)–(22h) are attained in each step.
Each of the matrices , , and has entries (recall Lemma 4.4). Computing an entry of or takes time by Lemma 4.5, while computing an entry of or can be achieved in time by Lemma 4.7, so the running time of our dynamic program is . ∎
5. Hardness for additive tolerance functions
In this section we prove that the problems Contraction and Weak Contraction for the tolerance function (purely additive error) are hard already on cycles (Section 5.1 below). We then prove that Contraction with the same tolerance function is hard to approximate for general graphs and for bipartite graphs (Section 5.2).
5.1. Hardness of Contraction and Weak Contraction
Recall that we can compute optimal (weak) -contractions in polynomial time on trees (this was shown in Sections 4.1 and 4.2), and have a linear-time algorithm for Contraction on cycles with unit length edges (this was shown in Section 3.2). We now show that the problem with is NP-hard on cycles with arbitrary edge lengths.
Theorem 7.
For any fixed , the problems Contraction and Weak Contraction with tolerance function , , are NP-hard on cycles.
Theorem 7 (where is not part of the input) follows immediately from Theorem 8 below (where is part of the input). The reason is that an instance with does not change when multiplying all edge lengths and by some constant.
Theorem 8.
The problems Contraction and Weak Contraction with tolerance function , , are NP-hard on cycles.
The rest of this section is devoted to proving Theorem 8.
For our proof we will use the following variant of the well-known problem Partition, referred to as Close-to-1 Partition. To state the problem we say that a set of positive rational numbers is close to 1, if and .
[TABLE]
Note that for a “Yes”-instance of this problem, the solution must have size , so . In particular, this implies that is even.
In the classical problem Partition, the input set is not constrained to be close to 1. Partition was shown to be NP-complete already in Karp’s seminal paper [Kar72]. The fact that Close-to-1 Partition is also NP-complete follows from a straightforward rescaling argument.
Lemma 5.1.
Close-to-1 Partition is NP-complete.
Proof.
Given an instance of Partition, we first add additional zeroes to the instance (by this we ensure that a partition with equal sums is transformed into one where both partition classes have the same number of summands). We then linearly transform all the according to , where and are sufficiently large constants so that the transformed values are close to 1. The transformed set of numbers has even cardinality , is close to 1, and it admits a partition into two sets of size with equal sum if and only if the original instance allows a partition into two sets with equal sum. ∎
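The rescaling argument can be checked on small instances with the following sketch. It is illustrative only: the exact linear transformation and the precise definition of "close to 1" are not reproduced in this text, so the sketch assumes a map of the form x ↦ (x + shift) / scale with hypothetical parameters `shift` and `scale`, and it pads the instance with as many zeroes as it has numbers. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def pad_and_rescale(values, shift, scale):
    """Append len(values) zeroes, then apply x -> (x + shift) / scale."""
    padded = list(values) + [0] * len(values)
    return [(Fraction(x) + shift) / scale for x in padded]


def has_partition(values):
    """Brute force: can the values be split into two parts of equal sum?"""
    total = sum(values)
    return any(2 * sum(values[i] for i in idx) == total
               for k in range(len(values) + 1)
               for idx in combinations(range(len(values)), k))


def has_balanced_partition(numbers):
    """Brute force: is there a split into two parts of equal size and sum?"""
    n, total = len(numbers), sum(numbers)
    return any(2 * sum(numbers[i] for i in idx) == total
               for idx in combinations(range(n), n // 2))
```

Because both classes of a balanced partition contain the same number of elements, the shift contributes equally to both sums and the scaling multiplies both by the same factor, so equality of the sums is preserved in both directions. For the values 3, 1, 1, 2, 2, 1 with `shift = scale = 1000`, all transformed numbers lie between 1 and 1.003.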
Proof of Theorem 8.
We first focus on the problem Contraction. We reduce Close-to-1 Partition, which is NP-complete by Lemma 5.1, to the problem Contraction on a cycle with tolerance function , .
Let be an instance of Close-to-1 Partition such that . This ensures that all that are bigger than 1 appear before all that are smaller than 1, which is the only property of the ordering that we exploit in the proof later on. The instance of Contraction we construct is on the cycle with edges. We label the vertices of the cycle by walking around the cycle as follows: The first vertices are labelled , then there are two special vertices , , and the remaining vertices are labelled , see Figure 7. We denote the subpath as , and the subpath by .
We now define , and , and the length function on the cycle edges by setting and for all , and by , , and (see Figure 7).
Now consider the instance with of the problem Contraction. Observe that no -contraction may contain an edge of length greater than (in particular, no feasible solution may contain one of the edges of length or ). Furthermore any (weak) -contraction on this graph satisfies .
We will show that has an optimal solution of cardinality (and thus of value) if and only if is a “Yes”-instance. In particular, we will see that any feasible solution of of size contains the two edges of length and exactly edges with length , , from and the corresponding edges with length , , from . Such solutions correspond to subsets of in the following natural way: For any subset of size we let be the subset of edges of the cycle consisting of the two edges of length and of all edges and (of length or , respectively) for all . Thus we will show that is an optimal solution of the instance of Contraction if and only if , i.e., is a “Yes”-instance of Close-to-1 Partition.
Both directions of this equivalence are captured and proved as Claims 2 and 4 below. Claims 1 and 3 are auxiliary statements used in the proofs of these two main claims.
For any path on the cycle we let denote the sum of over all edges of . For all we denote by and the path on the cycle between the vertices and that contains and that does not contain the edge , respectively (in Figure 7, these are the right and left segment of the cycle).
Claim 1: For all , the number lies in the interval and the number lies in the interval . In particular, we have and the difference lies in the interval .
Proof of Claim 1: Note that the condition implies that
[TABLE]
By our assumption , the numbers form a unimodal sequence for that is maximized for and , proving that (note that ). By (23) the minimum of this unimodal sequence is at most smaller than the maximum. This proves the first part of the claim. As , we obtain the second part of the claim. The last part of the claim is an immediate consequence of the first two.
Claim 2: If is a solution of the instance of Close-to-1 Partition such that , then is a -contraction.
Proof of Claim 2: It suffices to prove that there is no pair of vertices whose distance decreases by more than when contracting the edges in .
We start by verifying this for the pairs for . We first consider the path between and . Observe that lies in the interval . Similarly to before, this follows from the observation that by the assumption those sums form a unimodal sequence for that is maximized for and , and by using (23) (recall also that ). Consequently, we have
[TABLE]
Since , we obtain that lies in the interval , yielding
[TABLE]
Combining (24) and (25) proves that
[TABLE]
Now consider two vertices and , (the case can be treated analogously). Let and be the path on the cycle between the vertices and that contains and that does not contain the edge , respectively. Using that we obtain
[TABLE]
from (24).
We know that and consequently
[TABLE]
by the assumption that the input of the instance is close to 1 (there is plenty of leeway in all those inequalities). Furthermore, we have
[TABLE]
where the second-to-last inequality follows from Claim 1.
Combining those observations yields
[TABLE]
Combining (27) and (30) proves that
[TABLE]
From (27) and (30) we can derive analogous relations for the remaining cases where we need to consider the distance between a vertex , , and a vertex , between a vertex , , and a vertex , and between the vertices and . This completes the proof of Claim 2.
Claim 3: Every -contraction contains at most edges in for all and at most edges in for all .
Proof of Claim 3: Note that for any and we have by the definition of . Consequently, assuming for the sake of contradiction that contains strictly more than edges in , we have . Similarly, assuming that contains strictly more than edges in yields . By Claim 1 the difference lies in the interval , so in both cases we obtain
[TABLE]
where we used that in the second-to-last step. This contradicts the fact that is a -contraction, proving Claim 3.
Claim 4: Let be a feasible solution of the instance of Contraction. Then we have , and if , we have for some set with .
Proof of Claim 4: As does not contain any of the edges of length or , we have by Claim 3 (the +2 comes from the two edges of length that may be contained in ). Suppose now that . Applying Claim 3 again shows that must contain both edges of length , and that it contains the edge if and only if it contains the edge , for all . Defining we have and .
By Claim 1 we have and . As is a -contraction containing the two edges of length we thus obtain . Similarly, we have . As , these two inequalities must be tight, yielding .
Combining Claims 2 and 4 proves the statement of the theorem for the problem Contraction.
We now focus on the problem Weak Contraction. The hardness result follows immediately from the following claim.
Claim 5: For , any feasible weak -contraction on the instance is also a feasible -contraction.
Proof of Claim 5: Suppose for the sake of contradiction that is not a feasible -contraction. This means there are vertices such that and , i.e., and lie on a (maximal) subpath formed by edges from on the cycle. Let be one end vertex of , and let be the neighbour of not on . Let be the last vertex on when traversed starting at , such that the length of the --path containing is at most , and let be the next vertex on when traversed starting at . Such a vertex exists as , and the --path containing has length strictly greater than .
We have , as does not contract the entire cycle. By (1), we have . As , we get . As we saw before, the --path has length strictly greater than , thus the --path not containing must have length at most . As the entire cycle has length and can be partitioned into and the edge , we get
[TABLE]
where the second inequality holds as the two longest edges of the cycle have length and , respectively. From this chain of inequalities we obtain , contradicting the assumption . ∎
The reader might be tempted to “simplify” the previous reduction proof by omitting the four special edges of length , and and by setting instead. However, this would invalidate Claim 2 (specifically, the estimate (25) would not always hold).
5.2. Inapproximability of Contraction
We can extend the aforementioned hardness result for Contraction as follows:
Theorem 9.
For any fixed and , it is NP-hard to approximate the problem Contraction with tolerance function , , to within a factor of .
For the following theorem the additive error is fixed to .
Theorem 10.
For any , it is NP-hard to approximate the problem Contraction with tolerance function on bipartite graphs with unit length edges to within a factor of .
Our reductions are based on the inapproximability of the well-known Clique problem. Recall that a clique in a graph is a complete subgraph of .
[TABLE]
It was shown in [Zuc07] that for any , it is NP-hard to approximate Clique to within a factor of .
The following lemma will be used in our proofs. It shows that for -contractions the feasibility condition (1) need not be checked for all pairs of vertices and , but only for those satisfying certain extra conditions.
Lemma 5.2.
A set of edges is a -contraction if and only if all pairs of vertices with the property that every shortest path with respect to between and starts and ends with an edge from satisfy condition (1).
Proof.
Suppose for the sake of contradiction that all pairs of vertices as in the lemma satisfy condition (1) and that is not a -contraction. Then there is a pair of vertices violating (1) and a shortest path with respect to between and that does not start or end with an edge from . We choose and such that is minimal, and we may assume that the first edge of is not contained in , so . By our choice of and , the vertices and satisfy (1), i.e., the right-hand side of this equation is bounded by , a contradiction. ∎
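Lemma 5.2 restricts the pairs of vertices that have to be examined. As a baseline for small instances, condition (1) can also be checked by brute force over all pairs. The following is a minimal sketch, assuming that condition (1) is the inequality described in the abstract (the distance after contraction is at least the tolerance function applied to the original distance) and that contracted edges can be modelled as edges of length zero, as in the proof of Theorem 13. The names `all_pairs_distances`, `is_contraction` and `phi` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def all_pairs_distances(vertices, lengths):
    """Floyd-Warshall on an undirected graph given as {(u, v): length}."""
    inf = float("inf")
    dist = {u: {v: (0 if u == v else inf) for v in vertices} for u in vertices}
    for (u, v), w in lengths.items():
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def is_contraction(vertices, lengths, contracted, phi):
    """Check d_after(u, v) >= phi(d_before(u, v)) for all pairs of vertices.

    Assumption: contracting an edge is modelled by setting its length to 0.
    """
    contracted = {frozenset(e) for e in contracted}
    zeroed = {e: (0 if frozenset(e) in contracted else w)
              for e, w in lengths.items()}
    before = all_pairs_distances(vertices, lengths)
    after = all_pairs_distances(vertices, zeroed)
    return all(after[u][v] >= phi(before[u][v])
               for u, v in combinations(vertices, 2))


# Example: a path on four vertices with unit lengths and additive error 1.
V = [0, 1, 2, 3]
L = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
print(is_contraction(V, L, [(0, 1)], lambda x: x - 1))
print(is_contraction(V, L, [(0, 1), (1, 2)], lambda x: x - 1))
```

This check takes cubic time in the number of vertices, so it is only meant for validating small examples, not as a replacement for the structural argument of the lemma.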
Proof of Theorem 9.
Let be fixed and let be an instance of Clique.
We define a graph as follows, see Figure 8: The vertex set of is given by , i.e., we create two copies of each original vertex and add a special vertex . The edge set of is given by plus the edges and for all . The first set of edges are simply the original edges of on the first copies of the vertices, the second set is a perfect matching between the two copies of the vertex set, and the third set of edges connects the special vertex to all vertices of the second copy of the vertex set. The length function on the edges of is set to , or for those three sets of edges, respectively.
Now consider the instance of the problem Contraction with the tolerance function . Clearly, any -contraction in can contain only edges of the form for some . As does not contain two edges between two different connected components of , our objective function defined in (2) satisfies for any feasible solution of . We will show that it allows a feasible solution with edges (and thus of value ) if and only if has a clique with vertices. Formally, for we define (see Figure 8). We proceed to show that induces a clique in if and only if is a -contraction in .
Note that for any two vertices we have
[TABLE]
These relations together with Lemma 5.2 show that is a -contraction in if and only if is a clique in .
As differs from only by a constant factor, an -approximation algorithm for Contraction would yield an -approximation algorithm for Clique via this reduction. Together with the aforementioned inapproximability of Clique [Zuc07] this proves the theorem. ∎
The rest of this section is devoted to proving Theorem 10, so we now focus on -contractions in bipartite graphs with unit length edges . The next lemma characterizes the structure of contractions in this setting.
Lemma 5.3.
Let be a bipartite graph with unit edge lengths and let be a set of edges.
- (i) If is a -contraction, then is a matching.
- (ii) If with edges , then is a -contraction if and only if and .
- (iii) is a -contraction if and only if all two-element subsets of are.
Proof.
- (i) Suppose for the sake of contradiction that contains a path on two edges. As is bipartite, it has no triangles, so and , a contradiction to the assumption that is a -contraction.
- (ii) For the edges and we define for .
Let be a -contraction. Both and must have the same parity (as is bipartite), so if , the difference between them is exactly 2. However, this would mean that , a contradiction to the assumption that is a -contraction. Repeating the same argument with and interchanged shows that . An analogous argument shows that .
Now suppose that and . From these conditions it follows that for all every path between and that contains both edges and has length at least with respect to . Consequently, we have for . By Lemma 5.2, is a -contraction.
- (iii) One direction of the equivalence is obvious, so we only need to prove the other direction. So we assume that all two-element subsets of are -contractions, and we need to prove that is a -contraction. The argument is a straightforward generalization of the argument for (ii) from before. Let be a path that contains exactly edges from , and that starts and ends with an edge from . Let be those edges and their end vertices as they are encountered when traversing (so and are the end vertices of ). For all the pair of edges and and their end vertices satisfy the distance conditions from (ii). From these conditions it follows that the subpath of between and has length at least . So overall the length of is at least . Consequently, we have . By Lemma 5.2, is a -contraction.
∎
With Lemma 5.3 in hand, we are now ready to prove Theorem 10.
Proof of Theorem 10.
Let be fixed and let be an instance of Clique. We construct a bipartite graph as follows, see Figure 9: For every vertex , the graph contains two vertices and and the edge . For every edge , we add a vertex and the edges and to . Furthermore, we add a new special vertex to and all the edges , , and , . It is easy to check that the graph defined in this way is bipartite.
All edges of receive unit lengths () and we consider the instance of the problem Contraction with the tolerance function .
For any set of vertices we define (see Figure 9).
Claim 1: If is a clique in , then is a -contraction in and .
Proof of Claim 1: Let be a set of vertices in that form a clique, and let be two vertices from this clique. Then we have and , so Lemma 5.3 (ii) implies that is a -contraction in . Repeating this argument for every pair of vertices from and applying Lemma 5.3 (iii) yields that is a -contraction in . As there are never two edges in between any two connected components of the graph , we have .
For any set of edges , we let be the set of vertices for which is incident to an edge in .
Claim 2: If is a -contraction, then is a matching in and is a clique in of size at least .
Proof of Claim 2: is a matching by Lemma 5.3 (i).
Let . We will show that by applying Lemma 5.3 (ii) to the two edges in incident to and . To prove that it suffices to show that .
Let us first consider the case that . As (the shortest path between those vertices goes via ), Lemma 5.3 (ii) implies that . We now consider the case that there is an edge with . We then have (via ), so Lemma 5.3 (ii) yields . Finally, we consider the case that there are two edges with . We then have (via ), again implying that . This proves that indeed , so forms a clique in .
Every edge in is either incident to or to a vertex of the form , . Since at most one of the edges incident to can be in , the definition of shows that the size of is either or . Therefore, to finish the proof of Claim 2, it suffices to show that . If contains no two edges that are connected by more than one edge in , then we have . Otherwise we consider two such edges and from . It is easy to check that either or must be incident to , so suppose that the edge contains . We first consider the case that for some edge . In this case it follows that or , so we have . Now consider the case that for some vertex . In this case it follows that for exactly one edge incident to in , showing that . In all three cases we have , as claimed.
Combining Claims 1 and 2 will allow us to prove the following claim:
Claim 3: If there is an -approximation algorithm for Contraction, then there is an -approximation algorithm for Clique.
Proof of Claim 3: Suppose that such an approximation algorithm for Contraction exists. We use it to compute a clique in a given instance of Clique as follows: We construct and compute a solution of Contraction for this instance, and we define the clique as before (recall Claim 2). If , we return , otherwise we return any vertex from . We denote the clique computed in this fashion by .
We may assume that , in particular . It follows that
[TABLE]
By assumption we know that
[TABLE]
where is an optimal solution of . In particular, is positive.
Combining these observations we get
[TABLE]
where the second inequality holds because of Claim 2, and the last inequality involving the clique number holds because of Claim 1.
As , Claim 3 implies the theorem (using the inapproximability of Clique proved in [Zuc07]). ∎
6. Hardness for multiplicative tolerance function
By Theorem 7, the problem Weak Contraction with purely additive tolerance function is NP-hard on cycles. In this section we prove the hardness and inapproximability of this problem also in the case of a purely multiplicative tolerance function , . Recall that the problem Contraction is trivial for this tolerance function (we may not contract any edges).
6.1. Hardness of planar Weak Contraction
To state the main result of this section recall that the girth of a graph is defined as the minimum length of a cycle in .
Theorem 11.
For any , the problem Weak Contraction with tolerance function , is NP-hard for planar graphs with girth at least and unit length edges .
Theorem 11 implies that Weak Contraction is hard for a general multiplicative tolerance function , , but it leaves open the question of whether this also holds for fixed values of other than 2 (when is not part of the input). The arguments given in this section for carry over straightforwardly to any fixed value , but not to 3 or larger values (for and unit length edges the problem is trivial).
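The girth condition in Theorem 11 is easy to test on a concrete graph. The following sketch computes the girth of a simple undirected graph by running a breadth-first search from every vertex; this is a standard textbook method and not taken from the paper. The function name `girth` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def girth(adj):
    """Length of a shortest cycle in a simple undirected graph.

    adj maps every vertex to the set of its neighbours.
    Returns float('inf') if the graph has no cycle.
    """
    best = float("inf")
    for source in adj:
        dist = {source: 0}
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif parent[u] != v:
                    # Non-tree edge: it closes a walk through the source
                    # that contains a cycle of at most this length.
                    best = min(best, dist[u] + dist[v] + 1)
    return best


# Example: a cycle on six vertices.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(girth(cycle6))
```

Each search takes linear time, so the total running time is proportional to the number of vertices times the number of edges, which is sufficient for checking small gadget graphs.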
We first characterize the set of feasible solutions in this special case.
Lemma 6.1.
Let be a graph with girth at least 6 and unit length edges , and consider the tolerance function . Furthermore, let be a set of edges such that is disconnected. Then is a weak -contraction if and only if for any two edges either and are incident and both contain a degree-1 vertex, or any path containing and also contains at least two edges not in .
Recall that the assumption that is disconnected prevents solutions for which the contracted graph is a single vertex. Note that Lemma 6.1 does not require to be planar.
Proof.
To prove the equivalence, we need the following auxiliary claim:
Claim: If is a weak -contraction, then every component of that is not a single edge is a star with the property that each of its vertices except the center of the star has degree 1 in .
Proof of Claim: Let be a component of with more than one edge. Clearly, there must be an edge with vertices and . If contains a path on two edges starting at and ending at some vertex , then and , a contradiction to the assumption that is a weak -contraction (note that is the shortest path between and , as the girth of is at least 6). Thus the edges of must form a star centered at . By the same argument, no vertex outside can be connected to any vertex of other than . This proves the claim.
We first assume that is a weak -contraction, and we need to show that any two edges satisfy the conditions of the lemma. If and are incident, the statement follows from the auxiliary claim from before. If and are not incident, we consider an inclusion-minimal path containing both and . We let and be the end vertices of , the other end vertex of , and the other end vertex of ( and are the vertices at distance 1 from the ends of the path). If the distance between and was only 1, we have and (here we need again the assumption that the girth is at least 6), a contradiction to the assumption that is a weak -contraction. Therefore at least two edges lie between and . The auxiliary claim from before implies that no two incident edges on between and are contained in , therefore must contain at least two edges not in . This proves one direction of the equivalence.
To prove the other direction, we now assume that any two edges satisfy the conditions of the lemma, and we need to show that is a weak -contraction. Consider any two vertices and with , and any path between and . As no inner vertex of is a leaf, we know that between any two consecutive edges from on there are at least 2 edges not in . This proves that , as desired.
This completes the proof of the lemma. ∎
For a given propositional formula in conjunctive normal form (CNF) the bipartite variable-clause graph is defined as follows: The two partition classes of are given by the sets of variables and clauses of , and there is an edge between a variable and a clause if appears in . If contains as a positive or negative literal, we call the corresponding edge of a positive or negative edge, respectively. A planar drawing of , where positive and negative edges appear in cyclically contiguous intervals around every variable vertex, is called contiguous.
We call a -CNF formula regular, if every clause contains exactly literals, no clause contains a literal twice, every variable appears at least once as a positive literal and at least once as a negative literal in the formula.
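The regularity condition can be checked mechanically. The sketch below assumes a DIMACS-style encoding that is not part of the paper: a formula is a list of clauses, each clause is a list of nonzero integers, and the integer i or -i stands for the positive or negative literal of variable i. The function name `is_regular` is hypothetical.

```python
def is_regular(clauses, k=3):
    """Check the three regularity conditions for a k-CNF formula.

    clauses: list of clauses, each a list of nonzero ints
             (+i is the positive literal of variable i, -i the negative one).
    """
    positive, negative = set(), set()
    for clause in clauses:
        # Every clause has exactly k literals and no repeated literal.
        if len(clause) != k or len(set(clause)) != len(clause):
            return False
        for literal in clause:
            (positive if literal > 0 else negative).add(abs(literal))
    # Every variable occurs at least once positively and once negatively.
    return positive == negative


print(is_regular([[1, 2, 3], [-1, -2, -3]]))
print(is_regular([[1, 2, 3], [-1, -2, 3]]))
```

In the second formula the third variable occurs only as a positive literal, so the formula is not regular.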
Consider now the following variant of 3SAT.
[TABLE]
Lemma 6.2.
Contiguous Planar 3SAT is NP-complete.
Proof.
The more general variant of Contiguous Planar 3SAT not requiring to be regular was shown to be NP-complete in [dBK12]. We now show how to reduce this generalization to Contiguous Planar 3SAT, which will prove the lemma. Given a (not necessarily regular) 3-CNF formula we first eliminate all variables appearing only as negative or only as positive literals and all clauses containing exactly one literal, as well as multiple appearances of literals in the same clause. This yields a formula in which all clauses have two or three literals, no clause contains a literal twice, and every variable appears at least once as a positive literal and at least once as a negative literal in . Moreover, since is a subgraph of , we also obtain a contiguous planar drawing of . As a last step we eliminate clauses with two literals by introducing a new variable for each of them and replacing by the equivalent formula . It is easy to check that the resulting formula is regular and equisatisfiable to , and that a contiguous planar drawing of can be obtained, see Figure 10.
∎
Proof of Theorem 11.
We first present the proof for the case , and then sketch how to generalize it for larger values of .
We reduce Contiguous Planar 3SAT to Weak Contraction. Consider an instance of Contiguous Planar 3SAT with variables and clauses .
Given the formula , we construct from it a graph as follows, see Figures 11 and 12. For every variable , , we add a variable gadget as shown on the left hand side of Figure 11 to the graph . The vertices and will be used later to connect this gadget to other parts of the graph. The idea of the variable gadget is that an optimal solution of our instance of Weak Contraction should contain either the four edges or the four edges , corresponding to setting to true or false, respectively.
For every clause , , we add a clause gadget (a star with three edges) as shown on the right hand side of Figure 11 to the graph . The vertices , and will be used later to connect this gadget to other parts of the graph. The idea of the clause gadget is that a feasible solution contains at most one of these three edges, and if it does contain one of them, this restricts the choice we have inside the respective neighbouring variable gadget.
We connect the variable and clause gadgets in as follows (see Figure 12): For every and , if the -th literal in the clause is , we add an edge connecting to , and if the -th literal in the clause is , we add an edge connecting to . We refer to the edges added to in this step as connection edges.
This completes the definition of the graph . It is easy to see that this graph is planar. Specifically, a planar embedding can be obtained from the given planar embedding of by replacing variable vertices in by the variable gadgets in , and by replacing clause vertices by the clause gadgets . Using that for each variable vertex in the positive and negative edges appear in cyclically contiguous intervals around , the connection edges in (that connect the variable and clause gadgets) can also be drawn in a planar fashion.
Moreover, it is easy to check that has girth 6 and no degree-1 vertices.
Now consider the instance of the problem Weak Contraction with (unit length edges) and the tolerance function .
Lemma 6.1 implies that any feasible solution of is a matching, as has no vertices of degree 1. As contains no cycles of length 3 or 4, it cannot contain two edges between vertex sets of two different components of for any such feasible solution . This implies that our objective function satisfies .
We proceed to show that is satisfiable if and only if has an optimal solution of cardinality (and thus of value) . Specifically, a satisfying assignment of corresponds to a solution that contains exactly all edges of either or in for every variable (corresponding to the value true or false assigned to this variable, respectively) and exactly one edge in for each clause (corresponding to a literal that satisfies this clause).
Formally, for any variable assignment , we define the set of edges as follows: contains all edges of for any variable , , that sets to true, and it contains all edges of for any variable that sets to false. Moreover, for every clause , , that is satisfied by , we choose an index of a literal in that is satisfied by and add the edge to .
The following claim is an immediate consequence of Lemma 6.1.
Claim 1: Any subset is a feasible solution if and only if every path containing two edges from also contains at least two edges not in .
By Claim 1, for every variable assignment of , the set is a feasible solution of . In particular, if satisfies , then is a feasible solution of size . The remainder of the proof is devoted to showing the converse, i.e., if is a feasible solution of size , then is satisfiable.
For all we let denote the subgraph of induced by all edges of and all connection edges incident to either or .
Claim 2: For any , contains at most one edge from . For any , contains at most four edges from . Moreover, if contains one of the connection edges incident to or for some , it does not contain any edges from the gadget that is connected to via this edge.
Proof of Claim 2: The first and last statement are immediate consequences of Claim 1. The argument for the second statement is as follows: For all we let denote the set of edges plus the connection edges incident to , and we let denote the set of edges plus the connection edges incident to . By Claim 1, contains at most two edges from , and if the intersection size is two, then must contain the edge . Similarly, contains at most two edges from , and if the intersection size is two, then must contain the edge . As cannot contain and simultaneously, contains at most three edges from , and if the intersection size is three, then must contain either or . Again by Claim 1, contains at most two edges from the 6-cycle . However, if contains one of the edges or , it contains at most one edge from this 6-cycle. This proves that indeed contains at most four edges from .
Note that every edge of belongs to exactly one subgraph or . So if , we know by Claim 2 that contains exactly four edges from for all and exactly one edge from for all , and none of the connection edges in .
Claim 3: For any , if contains four edges from and if is not among them, then those edges must be . On the other hand, if is not among them, those edges must be . In particular, these two cases cannot occur simultaneously.
Proof of Claim 3: If contains four edges from and is not among them, Claim 1 enforces taking first the edge , then and , and eventually . This proves the first part of the statement. The argument for the second part is symmetric. The third part of the statement is a consequence of the first two.
So given a solution of of size , we can derive from it a satisfying assignment of as follows: For every clause , , we consider the unique edge from that belongs to . We follow the connection edge incident to , leading to the corresponding variable gadget , and connecting to either or . If the connection edge connects to , then by Claim 1, , so by Claim 3, the four edges of contained in must be , so we define . If the connection edge connects to , then by Claim 1, , so by Claim 3, the four edges of contained in must be , so we define . This process does not lead to any contradicting variable assignments by the last statement of Claim 3. However, this process may leave some variables undefined, and we can set them arbitrarily, e.g., . By construction, each clause receives a satisfying literal, so the assignment is indeed a satisfying assignment of .
This proves that is satisfiable if and only if has a feasible solution of size (which must be optimal by Claim 2), completing the proof of the theorem in the case .
For values , the construction of the gadgets and can be generalized as follows: We subdivide each of the edges and , and each of the edges , and into edges. Then the resulting graph clearly has girth , and the above arguments can be easily modified to show that any solution of contains at most edges from for all , and at most edges from for all , and that is satisfiable if and only if has an optimal solution of size . This completes the proof. ∎
6.2. Inapproximability of Weak Contraction
We are able to further extend our hardness results for Weak Contraction as follows:
Theorem 12.
For any , it is NP-hard to approximate the problem Weak Contraction with tolerance function to within a factor of .
Theorem 12 implies that Weak Contraction is hard to approximate for general multiplicative tolerance functions , , but it leaves open the question of whether this also holds for fixed values of other than (when is not part of the input). The arguments given in this section for carry over straightforwardly to any fixed value , but not to 2 or larger values (for the problem is trivial).
This time we reduce from the well-known Independent Set problem (which is equivalent to Clique by considering the complement graph). Recall that an independent set in a graph is a subset of vertices of such that no two vertices in the subset are adjacent.
[TABLE]
We use again the fact that for any , Independent Set is NP-hard to approximate to within a factor of [Zuc07].
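The equivalence between Independent Set and Clique via the complement graph can be made concrete as follows. This is a small illustrative sketch; the function names and the edge-list representation are assumptions and do not appear in the paper.

```python
from itertools import combinations


def complement(vertices, edges):
    """Edges of the complement graph on the same vertex set."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in present]


def is_independent_set(subset, edges):
    """True if no edge has both end vertices in subset."""
    subset = set(subset)
    return not any(u in subset and v in subset for u, v in edges)


def is_clique(subset, edges):
    """True if every pair of vertices in subset is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(subset, 2))


# Example: on the path 0-1-2-3 the set {0, 2} is independent,
# and it is a clique in the complement graph.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]
print(is_independent_set({0, 2}, E), is_clique({0, 2}, complement(V, E)))
```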
Proof of Theorem 12.
Let be an instance of Independent Set. We construct a graph and a length function on the edges of as follows, see Figure 13: We start with a copy of , and all edges of this copy receive length 2. The vertices of this copy are also denoted by . We then add additional vertices and edges to as follows: To every vertex we attach two pendant edges and of length 1 and 2, respectively. We may assume , and thus also , to be connected.
Now consider the instance of the problem Weak Contraction with the tolerance function .
We proceed to show that has a feasible solution of value if and only if has an independent set of size . This is an immediate consequence of Claim 3 below. To prove Claim 3 we need the following two auxiliary claims.
Claim 1: For any induced subgraph of that is a path on two edges, a feasible solution of does not contain only the longer of the two edges (either it contains none of the two, the shorter of the two if there is one, or both).
Proof of Claim 1: Consider a path on two edges , of length 2 in such that , and suppose for the sake of contradiction that , but . Then we have and , violating the condition (1) for the given tolerance function. A similar contradiction arises if one of the edges has length 1 and the other length 2, and only the edge of length 2 is contracted. This proves the claim.
Claim 2: No feasible solution of contains an edge of length 2.
Proof of Claim 2: Assume for the sake of contradiction that a feasible solution contains an edge of length 2. Note that any edge of may be reached from via a walk where and , and for all we have and the edges and induce a path in . Now successively applying Claim 1 to the subgraphs induced by and for shows that contracts . Thus violates the condition that a weak contraction must not contract every edge.
Claim 2 implies that our objective function satisfies for every feasible solution of , because never contains two edges between two different connected components of .
For any set of vertices we define .
Claim 3: A set of edges is a feasible solution of if and only if for an independent set in .
Proof of Claim 3: Let be a feasible solution of . By Claim 2, contains only edges of length 1, so we have for some set of vertices in . If two such vertices were connected by an edge, then we would have and , violating the condition (1) for the given tolerance function. It follows that is an independent set.
To prove the other direction of the equivalence, let be an independent set in and consider the set of edges in . To verify that is a weak -contraction, it suffices to check condition (1) between the end vertices of paths on two edges, one of length 1 from and the other of length 2, and for paths on edges that start and end with an edge of length 1 from . In the first case the contraction changes the distance from 3 to 2, which is compatible with (1). In the second case the contraction changes the distance from to , which is also compatible with (1), where we use that because of the assumption that is an independent set.
Claim 3 implies that has a feasible solution with edges if and only if has an independent set of size . As , the theorem follows from the [Zuc07] result. ∎
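The reduction can be replayed on small graphs. The sketch below builds the instance described in the proof (a copy of the input graph with edges of length 2, plus pendant edges of length 1 and 2 at every vertex) and checks feasibility by brute force. Several points are assumptions made for illustration and are not confirmed by the text above: the multiplicative parameter is set to 3/2, a value consistent with the distance changes from 3 to 2 (allowed) and from 4 to 2 (forbidden) used in the proof; condition (1) is checked only for pairs of vertices that end up in different super-vertices; and the requirement that not every edge is contracted is not checked. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def build_instance(vertices, edges):
    """Copy of the graph with edge length 2, plus two pendant edges per vertex."""
    lengths = {frozenset((u, v)): 2 for u, v in edges}
    for v in vertices:
        lengths[frozenset((v, (v, "short")))] = 1  # pendant edge of length 1
        lengths[frozenset((v, (v, "long")))] = 2   # pendant edge of length 2
    return lengths


def pendant_edges(subset):
    """The short pendant edges attached to the vertices in subset."""
    return {frozenset((v, (v, "short"))) for v in subset}


def distances(vertices, lengths):
    """Floyd-Warshall on an undirected graph given as {frozenset({u, v}): length}."""
    inf = float("inf")
    d = {u: {v: (0 if u == v else inf) for v in vertices} for u in vertices}
    for e, w in lengths.items():
        u, v = tuple(e)
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_weak_contraction(lengths, contracted, alpha=Fraction(3, 2)):
    """Assumed weak condition: alpha * d_after >= d_before for separated pairs."""
    contracted = set(contracted)
    vertices = list({v for e in lengths for v in e})
    before = distances(vertices, lengths)
    after = distances(vertices, {e: (0 if e in contracted else w)
                                 for e, w in lengths.items()})
    return all(alpha * after[u][v] >= before[u][v]
               for u, v in combinations(vertices, 2) if after[u][v] > 0)


# Example: the path 0-1-2; {0, 2} is independent, {0, 1} is not.
H = build_instance([0, 1, 2], [(0, 1), (1, 2)])
print(is_weak_contraction(H, pendant_edges({0, 2})))
print(is_weak_contraction(H, pendant_edges({0, 1})))
```

Under these assumptions, the sets of short pendant edges that pass the check are exactly those attached to independent sets, in line with Claim 3.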
7. Asymptotic bounds
In this section we show how to compute contractions that are not optimal, but that can be computed efficiently despite our hardness results from the previous sections. In this vein, the main results of this section are Theorem 13 and the corresponding (not tight) lower bound (Theorem 15) for the case of tolerance functions of the form . Furthermore, we consider purely additive tolerance functions (Section 7.2) and the factor by which a contraction can reduce the number of vertices (Section 7.3). Throughout this section, we assume all graphs to have unit length edges .
7.1. Almost multiplicative contractions
As mentioned in the introduction, a purely multiplicative tolerance function () forbids decreasing any distances. In this section we thus consider an “almost” purely multiplicative tolerance function of the form .
Theorem 13.
Let be a real number. Any graph has a -contraction such that the contracted graph has at most edges, and such a contraction can be computed in time .
Recall that here and throughout, and denote the number of vertices and edges of the input graph , not of the contracted graph . Setting in Theorem 13 yields the following corollary.
Corollary 14.
Any graph has a -contraction such that the contracted graph has at most edges, and such a contraction can be computed in time .
To prove Theorem 13, we use a clustering approach as presented in [Awe85], yielding the next lemma. Specifically, the following crucial lemma appears in a slightly weaker form in that paper. For any real number , we define an -partition of a graph as a set of clusters , , with corresponding cluster centers , where the sets are required to form a partition of the vertex set and where for all and . We denote the resulting -partition by . We write for the number of pairs for which and are connected by at least one edge, and we refer to this quantity as the density of .
Lemma 7.1.
Let be a real number. Any graph with unit length edges has an -partition with density , and such a partition can be computed in time .
Proof.
The idea of the algorithm is to build an -partition of iteratively in rounds. In each round, we build a new cluster and remove all vertices from that cluster from the graph, processing the subgraph on the remaining vertices in the next round. The algorithm proceeds until all vertices are assigned to a cluster. In round , we choose an arbitrary vertex as a cluster center, and define layers around the vertex , where the layer consists of all vertices at distance exactly from (this distance is measured in the subgraph of under consideration in this round). We continue computing these layers as long as the number of vertices in the new layer is at least the number of vertices in all previous layers times the factor . The cluster is defined as the union of all layers around satisfying this expansion condition. We refer to the first layer violating this condition (which is not added to anymore) as the rejected layer. We let denote the partition of the vertices of computed in this fashion.
To verify that is indeed an -partition, we proceed to show that each vertex within a cluster has distance at most from the center vertex of that cluster, and that the density of the partition is at most . Intuitively, the expansion condition in the definition of the layers ensures that a cluster has few layers and that the number of edges that go to unclustered vertices is small.
Consider a cluster with center vertex and the layers . Suppose for the sake of contradiction that . By the definition of the layers in the algorithm we know that holds for all , implying that . Consequently, the size of the cluster satisfies , a contradiction.
We now show that . The key idea is that the number of vertices in the rejected layer of a cluster is at most . Thus the number of edges from to clusters that are created later is at most . For every edge between two clusters we let the cluster that is created first account for that edge. Summing over all these edges between clusters yields the desired upper bound of .
Using breadth-first search, the partitioning algorithm described above runs in time (recall that is assumed to be connected). This completes the proof of the lemma. ∎
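The layer-growing procedure from the proof can be sketched as follows. The parameter `factor` is a hypothetical stand-in for the expansion factor of the lemma, whose exact value is not reproduced here; the function name `grow_partition` and the adjacency-dictionary input are likewise assumptions made for illustration.

```python
def grow_partition(adj, factor):
    """Partition the vertices into clusters by growing layers around centers.

    adj maps every vertex to the set of its neighbours.
    factor is the (hypothetical) expansion threshold: a new layer is added
    only if it has at least factor times as many vertices as all previous
    layers of the cluster together.
    Returns a list of (center, members, radius) triples.
    """
    unassigned = set(adj)
    clusters = []
    while unassigned:
        center = next(iter(unassigned))  # arbitrary remaining vertex
        cluster = {center}
        layer = {center}
        radius = 0
        while True:
            # Next layer in the subgraph induced by the unassigned vertices.
            next_layer = {w for u in layer for w in adj[u]
                          if w in unassigned and w not in cluster}
            if not next_layer or len(next_layer) < factor * len(cluster):
                break  # next_layer is the rejected layer
            cluster |= next_layer
            layer = next_layer
            radius += 1
        clusters.append((center, cluster, radius))
        unassigned -= cluster
    return clusters


# Example: a path on six vertices with expansion threshold 1.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
for center, members, radius in grow_partition(path, 1):
    print(center, sorted(members), radius)
```

Every accepted layer multiplies the cluster size by at least 1 + factor, so a cluster with r layers beyond its center has at least (1 + factor)^r vertices. This is the mechanism behind the radius bound argued in the proof.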
With Lemma 7.1 in hand, we are now ready to prove Theorem 13.
Proof of Theorem 13.
Given , we first compute a -partition into clusters as described by Lemma 7.1. We define the set of contracted edges as the union of all edges within the clusters, . We thus contract each cluster into a single vertex and remove from every set of resulting parallel edges all but a single edge.
We proceed to show that is a -contraction, i.e., we show that for all . Consider two vertices and , where and might be equal. Let be the shortest path from to in with edge lengths (all edges from receive length zero). The length of is the number of edges on that path that connect different clusters. Note that enters and leaves each of the visited clusters at most once, using at most edges in every cluster, so in (where all edges have unit lengths) we get .
Combining these observations we obtain
[TABLE]
proving the claim. It remains to show that the contracted graph has at most edges, which is an immediate consequence of the upper bound given by Lemma 7.1. This completes the proof of the theorem. ∎
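The contraction step of the proof (merge every cluster into one vertex and keep a single edge between any two adjacent clusters) can be illustrated as follows. The sketch assumes that every cluster induces a connected subgraph in which each vertex is within distance r of the cluster center. The inequality checked at the end, namely that the original distance is at most (2r + 1) times the contracted distance plus 2r, is our own instantiation of the path-counting argument above for such clusters and is not quoted from the theorem. All names are hypothetical.

```python
from collections import deque


def contract_clusters(adj, cluster_of):
    """Adjacency of the contracted graph: one vertex per cluster,
    parallel edges between two clusters merged into a single edge."""
    quotient = {c: set() for c in set(cluster_of.values())}
    for u, neighbours in adj.items():
        for v in neighbours:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu != cv:
                quotient[cu].add(cv)
                quotient[cv].add(cu)
    return quotient


def bfs_distances(adj, source):
    """Unit-length distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Example: a path on nine vertices, split into three clusters of radius r = 1.
n, r = 9, 1
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
cluster_of = {i: i // 3 for i in range(n)}
quotient = contract_clusters(path, cluster_of)

for u in range(n):
    d_before = bfs_distances(path, u)
    d_after = bfs_distances(quotient, cluster_of[u])
    for v in range(n):
        assert d_before[v] <= (2 * r + 1) * d_after[cluster_of[v]] + 2 * r
print(sum(len(nb) for nb in quotient.values()) // 2, "edges after contraction")
```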
Erdős' girth conjecture [Erd64] asserts that there exist graphs with edges and girth . It has been verified for [Wen91] and the strongest spanner lower bounds depend on it. We derive from the conjecture the following (not tight) lower bound.
Theorem 15.
Assuming Erdős' girth conjecture, there exists for any integer a graph such that any -contraction results in a graph with edges.
Proof.
For a given integer let be a graph that is guaranteed by Erdős' girth conjecture, i.e., has girth and edges. Consider any -contraction on , and consider a connected component of the graph . Applying (1) shows that holds for any two vertices and in that component. Using that the girth of is , it follows that for any cycle in , the connected component of does not contain a contiguous segment of cycle edges of length at least half of the cycle. This implies that all connected components of the graph are trees with diameter at most . Therefore, the total number of edges within all connected components of is at most . We will further argue that there is at most one edge between any two connected components. Suppose for the sake of contradiction that there are two components of with two different edges connecting them, say and , where and lie in the same connected component and and in the other. As the diameter of each component is at most , it follows that in there is a path from to of length at most , and a path from to of length at most . Together with the two edges connecting the components we obtain a cycle of length at most , contradicting the assumption that has girth .
Therefore, the resulting graph after the contraction has edges. ∎
7.2. Additive contractions
Turning to the case of a purely additive error, we obtain the following two results.
Theorem 16**.**
Let be a graph with unit length edges.
- (i) For any even integer , the set of edges incident to the vertices of highest degrees is a -contraction in with .
- (ii) For any real number , the set of edges incident to two vertices of degree at least is a -contraction in such that has edges.
These contractions can be computed in time .
As mentioned in the introduction, Bernstein and Chechik analyzed the contraction of Theorem 16 (ii) in [BC16] and used it in their dynamic shortest paths algorithm, so this part is already proved.
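The edge set of Theorem 16 (ii) can be illustrated by a minimal sketch, under the assumption that "incident to two vertices of degree at least" means that both endpoints of the edge meet the degree bound. The function name `heavy_heavy_edges` and the parameter `degree_threshold` are hypothetical and stand in for the bound stated in the theorem.

```python
from collections import defaultdict


def heavy_heavy_edges(edges, degree_threshold):
    """Return the edges whose two endpoints both have degree >= degree_threshold.

    `edges` is a list of pairs (u, v) of an undirected simple graph.
    `degree_threshold` is a hypothetical stand-in for the degree bound
    of Theorem 16 (ii).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Keep an edge only if both of its endpoints are high-degree vertices.
    return [
        (u, v)
        for u, v in edges
        if degree[u] >= degree_threshold and degree[v] >= degree_threshold
    ]
```

Both passes over the edge list take linear time, which matches the kind of running time claimed for these contractions.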
Proof of Theorem 16 (i).
Let be the set of vertices in of highest degree. Then we have
[TABLE]
Let be the set of edges incident to any vertex in . As each edge is incident to at most two vertices in , we get from the previous inequality. As no shortest path visits a vertex in twice, is indeed a -contraction. The set can be computed as follows: We first compute the degrees of all vertices in time , then find the -th largest element in this list in time , and by another linear-time sweep over this list we select vertices of highest degree. Overall, the required time is . ∎
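The selection step in this proof can be sketched as follows. This is an illustration only: the parameter `num_hubs` is a hypothetical name for the number of highest-degree vertices chosen in the theorem, and the sketch uses `heapq.nlargest` for brevity instead of the linear-time selection used in the proof, so it does not attain the stated running time.

```python
import heapq
from collections import defaultdict


def edges_at_top_degree_vertices(edges, num_hubs):
    """Return (hubs, contracted_edges) for an undirected simple graph.

    `hubs` is a set of `num_hubs` vertices of highest degree and
    `contracted_edges` lists every edge with at least one endpoint in `hubs`.
    Vertices that appear in no edge are ignored.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Assumption: a heap-based selection replaces linear-time selection here.
    hubs = set(heapq.nlargest(num_hubs, degree, key=degree.__getitem__))
    contracted_edges = [(u, v) for u, v in edges if u in hubs or v in hubs]
    return hubs, contracted_edges
```

Ties between vertices of equal degree are broken arbitrarily, which is consistent with the proof, as it only uses the degrees of the selected vertices.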
This result implies that the number of edges in is at most . If is a path, no -contraction has an objective value greater than , and , showing that the objective value in Theorem 16 (i) can be improved by at most a factor of two.
The information-theoretic lower bound in [AB16] implies that for all , any contraction such that has edges does not admit a constant additive error.
7.3. Vertex reduction
All of the results above show that contractions can be effectively used to reduce the number of edges in a dense graph. But one possible advantage of using a contraction instead of a spanner is that it also has the potential to reduce the number of vertices in the graph. Unfortunately, for constant approximation errors, it is not possible to guarantee more than a constant-factor reduction in general graphs: it is not hard to see that given a path on vertices, any -contraction will still result in at least vertices. The same problem applies to general dense graphs, since they could still contain a long path within them. That being said, it seems likely that in practice contraction can lead to significant vertex reduction in many dense graphs. We ground this practical intuition with the following theoretical result for the special case of graphs with large minimum degree.
Theorem 17**.**
Let be an integer. Any graph with minimum degree at least has a -contraction such that the contracted graph has at most vertices, and such a contraction can be computed in time .
Proof.
Recall the definition of an -partition. For a cluster with center vertex we refer to as the radius of that cluster. This is the maximum distance of all cluster vertices from .
We will show how to construct a -partition in which the number of clusters is at most . Using the exact same argument as in the proof of Theorem 13, such a -partition yields the desired -contraction. Our construction first builds clusters of radius 1, and then extends them to clusters of radius 2. The clustering with radius 1 proceeds much as in the proof of Lemma 7.1, with . The crucial difference is that we choose as center vertices only vertices with degree at least . If no such vertices are left, the clustering process terminates, and the remaining unclustered vertices have degree strictly less than . Since those vertices have degree at least in the original graph, each of them must have lost a neighbor to the clustering and is therefore adjacent to a vertex in a radius 1 cluster. We can thus assign each of those vertices to such a cluster arbitrarily, yielding a clustering of all vertices of with radius 2.
The number of clusters is at most because by construction every cluster contains at least vertices. This shows that the number of vertices in the contracted graph is at most .
This algorithm can be implemented in time by using an adjacency list representation in which the degree information is updated whenever an edge is removed from the graph. ∎
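The two-phase clustering from this proof can be sketched as follows. The sketch makes two assumptions that are not taken from the text: the degree bound for center vertices is passed as the hypothetical parameter `min_center_degree`, and degrees in the first phase are measured in the subgraph of still unclustered vertices. It is meant as an illustration of the construction, not as the implementation analyzed above.

```python
def two_phase_clustering(adj, min_center_degree):
    """Map every vertex to the center of its cluster (radius at most 2).

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    `min_center_degree` is a hypothetical name for the degree bound that
    a vertex must meet to become a center.
    """
    # Working copy: the subgraph induced by the unclustered vertices.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    center_of = {}

    # Phase 1: radius-1 clusters around vertices of large remaining degree.
    # One pass suffices, because remaining degrees only decrease.
    for c in list(adj):
        if c not in remaining or len(remaining[c]) < min_center_degree:
            continue
        members = remaining[c] | {c}
        for v in members:
            center_of[v] = c
            for w in remaining.pop(v):
                if w in remaining:
                    remaining[w].discard(v)

    # Phase 2: attach each leftover vertex to a neighboring radius-1 cluster.
    phase1 = dict(center_of)
    for v in remaining:
        host = next((w for w in adj[v] if w in phase1), None)
        if host is None:
            raise ValueError(f"vertex {v!r} has no neighbor in a radius-1 cluster")
        center_of[v] = phase1[host]
    return center_of
```

In this sketch every cluster created in the first phase contains its center and at least `min_center_degree` neighbors, which is the property the counting argument relies on. The second phase attaches leftover vertices only to vertices clustered in the first phase, so that no cluster grows beyond radius 2.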
To see that we cannot guarantee less than vertices, even with larger approximation error, consider the graph that consists of isolated -cliques. We now show that even if is connected, we cannot guarantee vertices in the contracted graph, even if we allow a larger (constant) approximation error.
Theorem 18**.**
Let and be integers. There exists an infinite family of -vertex graphs with minimum degree such that any -contraction results in a graph with vertices.
Proof.
Assume for simplicity that is divisible by . We construct the graph as follows. We partition the vertices into layers, with each layer containing exactly vertices. For , all vertices in layer receive an edge to all vertices in layer . Clearly all vertices in the resulting graph have degree at least . Let and be two vertices in layers and , respectively. Then clearly we have . Now let be any -contraction on , and consider the connected components of the graph . Applying (1) shows that holds for any two vertices and in the same component. Combining these two inequalities shows that every connected component contains vertices from at most layers. As there are layers, the contracted graph has at least vertices. ∎
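The layered construction can be checked on small instances with the following sketch. It assumes that edges join consecutive layers only, and the parameters `num_layers` and `layer_size` are hypothetical names for the quantities chosen in the proof. Vertices are encoded as pairs (layer, index).

```python
from collections import deque


def layered_graph(num_layers, layer_size):
    """Build the layered graph: consecutive layers are completely joined."""
    adj = {(i, j): set() for i in range(num_layers) for j in range(layer_size)}
    for i in range(num_layers - 1):
        for a in range(layer_size):
            for b in range(layer_size):
                adj[(i, a)].add((i + 1, b))
                adj[(i + 1, b)].add((i, a))
    return adj


def bfs_distances(adj, source):
    """Unit-length shortest-path distances from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist
```

For example, `bfs_distances(layered_graph(4, 3), (0, 0))` assigns to every vertex in layer `i` a distance of at least `i`, which is the distance bound between layers used in the proof.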
Acknowledgements
We thank Martin Skutella for stimulating discussions about the problems treated in this paper. We also thank the anonymous referees for their valuable suggestions that helped improve the presentation of the results.
References
- [AB16] A. Abboud and G. Bodwin. The 4/3 additive spanner exponent is tight. In STOC'16: Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 351-361. ACM, New York, 2016.
- [ACIM99] D. Aingworth, C. Chekuri, P. Indyk, and R. Motwani. Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J. Comput., 28(4):1167-1181, 1999.
- [ADD+93] I. Althöfer, G. Das, D. Dobkin, D. Joseph, and J. Soares. On sparse spanners of weighted graphs. Discrete Comput. Geom., 9(1):81-100, 1993.
- [Awe85] B. Awerbuch. Complexity of network synchronization. J. Assoc. Comput. Mach., 32(4):804-823, 1985.
- [BBV00] T. C. Biedl, B. Brejová, and T. Vinař. Simplifying flow networks. In Mathematical foundations of computer science 2000 (Bratislava), volume 1893 of Lecture Notes in Comput. Sci., pages 192-201. Springer, Berlin, 2000.
- [BC16] A. Bernstein and S. Chechik. Deterministic decremental single source shortest paths: beyond the O(mn) bound. In STOC'16: Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 389-397. ACM, New York, 2016.
- [BDD+18] A. Bernstein, K. Däubel, Y. Disser, M. Klimm, T. Mütze, and F. Smolny. Distance-preserving graph contractions. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 51:1-51:14, 2018. Preprint available at arXiv:1705.04544.
- [BKMP05] S. Baswana, T. Kavitha, K. Mehlhorn, and S. Pettie. New constructions of (α, β)-spanners and purely additive spanners. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 672-681. ACM, New York, 2005.
