Solving Vertex Cover in Polynomial Time on Hyperbolic Random Graphs
Thomas Bläsius, Philipp Fischbeck, Tobias Friedrich, Maximilian Katzmann

TL;DR
This paper demonstrates that the Vertex Cover problem can be solved in polynomial time on hyperbolic random graphs, providing insights into real-world network structures and improving approximation algorithms.
Contribution
It proves polynomial-time solvability of Vertex Cover on hyperbolic random graphs and links structural properties to practical algorithm performance.
Findings
Vertex Cover is solvable in polynomial time on hyperbolic random graphs, with high probability
The structural properties predicted by the model are also observed in real-world networks
An adaptive greedy algorithm yields better approximations than the standard greedy approach
| network | easy | dom | tw | greedy | 2-ad | 4-ad | comp |
|---|---|---|---|---|---|---|---|
| advogato | ✓ | 314 | 1.011 | 1.009 | 1.005 | 863 | |
| airlines | ✓ | 23 | 1.000 | 1.000 | 1.000 | 75 | |
| as-22july06 | ✓ | 3 | 1.002 | 1.001 | 1.001 | 46 | |
| as-caida20071105 | ✓ | 3 | 1.002 | 1.001 | 1.000 | 35 | |
| as-skitter | ✗ | 969794 | |||||
| as20000102 | ✓ | 2 | 1.003 | 1.001 | 1.001 | 18 | |
| bio-CE-HT | ✓ | 3 | 1.015 | 1.009 | 1.000 | 225 | |
| bio-CE-LC | ✓ | 2 | 1.003 | 1.003 | 1.003 | 39 | |
| bio-DM-HT | ✓ | 13 | 1.017 | 1.014 | 1.004 | 319 | |
| bio-yeast-protein-inter | ✓ | 4 | 1.013 | 1.006 | 1.002 | 147 | |
| bn-fly-drosophila-medulla-1 | ✓ | 38 | 1.018 | 1.013 | 1.009 | 142 | |
| bn-mouse-kasthuri-graph-v4 | ✓ | 1 | 1.006 | 1.000 | 1.000 | 12 | |
| ca-AstroPh | ✓ | 6 | 1.003 | 1.002 | 1.001 | 123 | |
| ca-cit-HepPh | ✓ | 151 | 1.003 | 1.003 | 1.002 | 533 | |
| ca-CondMat | ✓ | 4 | 1.003 | 1.002 | 1.001 | 53 | |
| ca-GrQc | ✓ | 2 | 1.004 | 1.002 | 1.001 | 44 | |
| ca-HepTh | ✓ | 13 | 1.005 | 1.004 | 1.001 | 174 | |
| cfinder-google | ✗ | 82 | |||||
| cit-HepTh | ✗ | 19737 | |||||
| citeseer | ✗ | 182372 | |||||
| com-amazon | ✓ | 2756 | 1.011 | 1.006 | 1.002 | 16209 | |
| com-dblp | ✓ | 7 | 1.002 | 1.001 | 1.000 | 69 | |
| cpan-authors | ✓ | 2 | 1.009 | 1.009 | 1.009 | 17 | |
| digg-friends | ✓ | 1649 | 1.008 | 1.006 | 1.004 | 179 | |
| ego-facebook | ✓ | -1 | 1.000 | 1.000 | 1.000 | 3 | |
| ego-gplus | ✓ | 1 | 1.000 | 1.000 | 1.000 | 5 | |
| email-Enron | ✓ | 41 | 1.003 | 1.002 | 1.001 | 141 | |
| EuroSiS | ✓ | 34 | 1.020 | 1.018 | 1.010 | 274 | |
| facebook-wosn-links | ✗ | 36694 | |||||
| flixster | ✗ | 122 | |||||
| hyves | ✓ | 1653 | 1.008 | 1.008 | 1.008 | 42 | |
| livemocha | ✓ | 24380 | 1.017 | 1.013 | 1.006 | 25300 | |
| loc-brightkite-edges | ✓ | 619 | 1.014 | 1.009 | 1.004 | 4658 |
| loc-gowalla-edges | ✗ | 3991 | |||||
| moreno-names | ✓ | 3 | 1.006 | 1.004 | 1.002 | 34 | |
| moreno-propro | ✓ | 4 | 1.014 | 1.006 | 1.002 | 153 | |
| munmun-twitter-social | ✓ | 12 | 1.000 | 1.000 | 1.000 | 5 | |
| OClinks | ✓ | 202 | 1.017 | 1.015 | 1.005 | 498 | |
| p2p-Gnutella04 | ✓ | 1352 | 1.019 | 1.017 | 1.016 | 970 | |
| p2p-Gnutella05 | ✓ | 1075 | 1.014 | 1.013 | 1.013 | 447 | |
| p2p-Gnutella06 | ✓ | 1142 | 1.023 | 1.022 | 1.021 | 820 | |
| p2p-Gnutella08 | ✓ | 414 | 1.008 | 1.008 | 1.008 | 45 | |
| p2p-Gnutella09 | ✓ | 419 | 1.005 | 1.005 | 1.005 | 63 | |
| p2p-Gnutella24 | ✓ | 525 | 1.006 | 1.005 | 1.005 | 70 | |
| p2p-Gnutella25 | ✓ | 464 | 1.006 | 1.005 | 1.005 | 77 | |
| p2p-Gnutella30 | ✓ | 604 | 1.005 | 1.005 | 1.004 | 62 | |
| p2p-Gnutella31 | ✓ | 732 | 1.011 | 1.010 | 1.010 | 65 | |
| petster-carnivore | ✓ | 149312 | 1.008 | 1.007 | 1.004 | 9238 | |
| petster-friendship-cat | ✗ | 14929 | |||||
| petster-friendship-dog | ✗ | 340634 | |||||
| petster-friendship-hamster | ✗ | 135 | |||||
| soc-Epinions1 | ✓ | 238 | 1.006 | 1.003 | 1.001 | 228 | |
| US-Air | ✓ | 4 | 1.013 | 1.000 | 1.000 | 23 | |
| web-Google | ✗ | 103939 | |||||
| wiki-Vote | ✓ | 384 | 1.054 | 1.052 | 1.050 | 726 | |
| wordnet-words | ✓ | 28 | 1.004 | 1.003 | 1.002 | 59 | |
| YeastS | ✓ | 39 | 1.013 | 1.012 | 1.005 | 244 | |
| youtube-links | ✓ | 1239 | 1.008 | 1.004 | 1.001 | 570 | |
| youtube-u-growth | ✗ | 59358 |
Affiliation (all authors): Hasso Plattner Institute, University of Potsdam, Potsdam
ORCID (Tobias Friedrich): https://orcid.org/0000-0003-0076-6308
Copyright: Thomas Bläsius, Philipp Fischbeck, Tobias Friedrich, Maximilian Katzmann
Subject classification: Theory of computation → Graph algorithms analysis; Theory of computation → Random network models; Mathematics of computing → Random graphs
Funding: This research was partially funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) – project number 390859508.
Published in: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020), March 10–13, 2020, Montpellier, France. Editors: Christophe Paul and Markus Bläser. Series volume 154, article no. 21.
Abstract
The VertexCover problem is proven to be computationally hard in different ways: It is NP-complete to find an optimal solution and even NP-hard to find an approximation with reasonable factors. In contrast, recent experiments suggest that on many real-world networks the run time to solve VertexCover is way smaller than even the best known FPT-approaches can explain. Similarly, greedy algorithms deliver very good approximations to the optimal solution in practice.
We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability.
The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that, on real instances, this leads to better approximations than the standard greedy approach within reasonable time.
Keywords: vertex cover, random graphs, hyperbolic geometry, efficient algorithm
1 Introduction
VertexCover is a fundamental NP-complete graph problem. For a given undirected graph $G$ on $n$ vertices the goal is to find the smallest vertex subset $S$ such that each edge in $G$ is incident to at least one vertex in $S$. Since, by definition, there can be no edge between two vertices outside of $S$, these remaining vertices form an independent set. Therefore, one can easily derive a maximum independent set from a minimum vertex cover and vice versa.
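The relationship between vertex covers and independent sets can be checked on a small instance. The following is a minimal sketch, not part of the paper: the helper names and the example graph are hypothetical, and the brute-force search is only meant for tiny graphs.

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """Return True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)

def is_independent_set(edges, vertices):
    """Return True if no edge has both endpoints in `vertices`."""
    return not any(u in vertices and v in vertices for u, v in edges)

def minimum_vertex_cover_bruteforce(vertices, edges):
    """Try all subsets by increasing size (exponential time, tiny graphs only)."""
    for size in range(len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)

# Hypothetical example: the path a-b-c-d plus the chord b-d.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]

cover = minimum_vertex_cover_bruteforce(vertices, edges)
independent = vertices - cover  # the complement of a cover is independent
```

Because the search proceeds by increasing size, the returned cover is a minimum one, so its complement is a maximum independent set.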
Due to its NP-completeness there is probably no polynomial time algorithm for solving VertexCover. The best known algorithm for IndependentSet runs in $1.1996^n \cdot \mathrm{poly}(n)$ time [22]. To analyze the complexity of VertexCover on a finer scale, several parameterized solutions have been proposed. One can determine whether a graph has a vertex cover of size $k$ by applying a branch-and-reduce algorithm. The idea is to build a search tree by recursively considering two possible extensions of the current vertex cover (branching), until a vertex cover is found or the size of the current cover exceeds $k$. Each branching step is followed by a reduce step in which reduction rules are applied to make the considered graph smaller. This branch-and-reduce technique yields a simple $\mathcal{O}(2^k \cdot \mathrm{poly}(n))$ algorithm, where the exponential portion comes from the branching. The best known FPT (fixed-parameter tractable) algorithm runs in $\mathcal{O}(1.2738^k + kn)$ time [7], and unless ETH (exponential time hypothesis) fails, there can be no $2^{o(k)} \cdot \mathrm{poly}(n)$ algorithm [6].
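A branch-and-reduce decision procedure of this kind can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the cited works: the function name is hypothetical, the graph is assumed to be simple (no loops or duplicate edges), and the only reduction used is the high-degree rule, chosen here because it is short to state.

```python
def has_vertex_cover(edges, k):
    """Decide whether the simple graph given by `edges` has a vertex cover of size <= k."""
    edges = [tuple(e) for e in edges]
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # edges remain but the budget is used up

    # Reduce step (illustrative choice): a vertex with more than k incident
    # edges must belong to every vertex cover of size at most k.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    for vertex, d in degree.items():
        if d > k:
            remaining = [e for e in edges if vertex not in e]
            return has_vertex_cover(remaining, k - 1)

    # Branch step: at least one endpoint of the first edge is in the cover.
    u, v = edges[0]
    without_u = [e for e in edges if u not in e]
    without_v = [e for e in edges if v not in e]
    return has_vertex_cover(without_u, k - 1) or has_vertex_cover(without_v, k - 1)
```

Each branch decreases the budget by one, so the search tree has at most $2^k$ leaves, and the work per node is polynomial in the graph size.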
While these FPT approaches promise relatively small running times if the considered network has a small vertex cover, the cover is large for many real-world networks. Nevertheless, it was recently observed that applying a branch-and-reduce technique on real instances is very efficient [1]. Some of the considered networks had millions of vertices, yet an optimal solution (also containing millions of vertices) was computed within seconds. Most instances were solved so quickly since the expensive branching was not necessary at all. In fact, the application of the reduction rules alone already yielded an optimal solution. Most notably, applying the dominance reduction rule, which eliminates vertices whose neighborhood contains a vertex together with its neighborhood, reduces the graph to a very small remainder on which the branching, if necessary, can be done quickly. We trace the effectiveness of the dominance rule back to two properties that are often observed in real-world networks: a heterogeneous degree distribution (the network contains many vertices of small degree and few vertices of high degree) and high clustering (the neighbors of a vertex are likely to be neighbors themselves).
We formalize these key properties using hyperbolic random graphs to analyze the performance of the dominance rule. Introduced by Krioukov et al. [17], hyperbolic random graphs are obtained by randomly distributing nodes in the hyperbolic plane and connecting any two that are geometrically close. The resulting graphs feature a power-law degree distribution and high clustering [14, 17] (the two desired properties), which can be tuned using parameters of the model. Additionally, the generated networks have a small diameter [13]. All of these properties have been observed in many real-world networks such as the internet, social networks, as well as biological networks like protein-protein interaction networks. Furthermore, Boguñá, Papadopoulos, and Krioukov showed that the internet can be embedded into the hyperbolic plane such that greedily routing packets between network participants works very well [5], indicating that this network naturally fits into the hyperbolic space.
By making use of the underlying geometry, we show that VertexCover can be solved in polynomial time on hyperbolic random graphs, with high probability. This is done by showing that even a single application of the dominance reduction rule reduces a hyperbolic random graph to a remainder with small pathwidth on which VertexCover can then be solved efficiently. Our analysis provides an explanation for why VertexCover can be solved efficiently on practical instances. We note that, while our analysis makes use of the underlying hyperbolic geometry, the algorithm itself is oblivious to it. Besides the running time the model predicts certain structural properties that also point us to an adapted greedy algorithm that is still very efficient and achieves better approximation ratios. We conducted experiments indicating that these predictions (concerning the structural properties and improved approximation) actually match the real world for a significant fraction of networks.
2 Preliminaries
Let $G = (V, E)$ be an undirected graph. We denote the number of vertices in $G$ with $n$. The neighborhood of a vertex $v$ is defined as $N(v) = \{w \in V \mid \{v, w\} \in E\}$ and the size of the neighborhood, called the degree of $v$, is denoted by $\deg(v)$. For a subset $S \subseteq V$, we use $G[S]$ to denote the induced subgraph of $G$ obtained by removing all vertices in $V \setminus S$. Furthermore, we use the shorthand notation $G_{\le d}$ to denote $G[\{v \in V \mid \deg(v) \le d\}]$.
The Hyperbolic Plane.
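After choosing a designated origin $O$ in the two-dimensional hyperbolic plane, together with a reference ray starting at $O$, a point $p$ is uniquely identified by its radius $r(p)$, denoting the hyperbolic distance to $O$, and its angle (or angular coordinate) $\varphi(p)$, denoting the angular distance between the reference ray and the line through $p$ and $O$. The hyperbolic distance between two points $p$ and $q$ is given by
$$\mathrm{dist}(p, q) = \operatorname{acosh}\bigl(\cosh(r(p))\cosh(r(q)) - \sinh(r(p))\sinh(r(q))\cos(\Delta_\varphi(p, q))\bigr),$$
where $\cosh(x) = (e^x + e^{-x})/2$, $\sinh(x) = (e^x - e^{-x})/2$ (both growing as $e^x/2 \pm o(1)$), and $\Delta_\varphi(p, q) = \pi - |\pi - |\varphi(p) - \varphi(q)||$ denotes the angular distance between $p$ and $q$. If not stated otherwise, we assume that computations on angles are performed modulo $2\pi$.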
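The distance between two points in polar coordinates follows the hyperbolic law of cosines. A minimal sketch is given below; the function names are hypothetical, and the clamp before `acosh` is an added guard against floating-point rounding, not part of the mathematical definition.

```python
import math

def angular_distance(phi_p, phi_q):
    """Angular distance in [0, pi] between two angles, computed modulo 2*pi."""
    diff = abs(phi_p - phi_q) % (2 * math.pi)
    return math.pi - abs(math.pi - diff)

def hyperbolic_distance(r_p, phi_p, r_q, phi_q):
    """Hyperbolic distance between the points (r_p, phi_p) and (r_q, phi_q)."""
    delta = angular_distance(phi_p, phi_q)
    argument = (math.cosh(r_p) * math.cosh(r_q)
                - math.sinh(r_p) * math.sinh(r_q) * math.cos(delta))
    # Rounding can push the argument slightly below 1, where acosh is undefined.
    return math.acosh(max(1.0, argument))
```

Two sanity checks follow directly from the formula: points on the same ray are at distance $|r(p) - r(q)|$, and points on opposite rays are at distance $r(p) + r(q)$.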
We use $B_p(r)$ to denote a disk of radius $r$ centered at $p$, i.e., the set of points with hyperbolic distance at most $r$ to $p$. Such a disk has an area of $2\pi(\cosh(r) - 1)$ and circumference $2\pi\sinh(r)$. Thus, the area and the circumference of a disk in the hyperbolic plane grow exponentially with its radius. In contrast, this growth is polynomial in Euclidean space. Therefore, representing hyperbolic shapes in the Euclidean geometry results in a distortion. In the native representation, used in our figures, circles can appear teardrop-shaped (see Figure 2).
Hyperbolic Random Graphs.
Hyperbolic random graphs are obtained by distributing $n$ points uniformly at random within the disk $B_O(R)$ and connecting any two of them if and only if their hyperbolic distance is at most $R$; see Figure 1. The disk radius $R$ (which matches the connection threshold) is defined as $R = 2\log(n) + C$, where $C \in \mathbb{R}$ is a constant describing the desired average degree of the generated network. The coordinates for the vertices are drawn as follows. For vertex $v$ the angular coordinate, denoted by $\varphi(v)$, is drawn uniformly at random from $[0, 2\pi]$ and the radius of $v$, denoted by $r(v)$, is sampled according to the probability density function $\alpha\sinh(\alpha r)/(\cosh(\alpha R) - 1)$ for $r \in [0, R]$ and $\alpha \in (1/2, 1)$. Thus,
$$f(r, \varphi) = \frac{1}{2\pi}\,\frac{\alpha\sinh(\alpha r)}{\cosh(\alpha R) - 1} = \frac{\alpha}{2\pi}\, e^{-\alpha(R - r)}\bigl(1 + \Theta(e^{-\alpha R} - e^{-2\alpha r})\bigr) \tag{1}$$
is their joint distribution function for $r \in [0, R]$. For $r > R$, $f(r, \varphi) = 0$. The constant $\alpha \in (1/2, 1)$ is used to tune the power-law exponent $\beta = 2\alpha + 1$ of the degree distribution of the generated network. Note that we obtain power-law exponents $\beta \in (2, 3)$. Exponents outside of this range are atypical for hyperbolic random graphs. On the one hand, for $\alpha < 1/2$ the average degree of the generated networks is divergent. On the other hand, for $\alpha > 1$ hyperbolic random graphs degenerate: They decompose into smaller components, none having a size linear in $n$. The obtained graphs have logarithmic tree width [4], meaning the VertexCover problem can be solved efficiently in that case.
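The sampling procedure can be sketched with inverse transform sampling for the radius: the cumulative distribution of the radial density is $(\cosh(\alpha r) - 1)/(\cosh(\alpha R) - 1)$, which can be inverted in closed form. The sketch below is a naive quadratic-time generator for small $n$; the function names and the seed parameter are hypothetical and not taken from the paper.

```python
import math
import random

def hyperbolic_distance(r_p, phi_p, r_q, phi_q):
    """Hyperbolic distance between two points given in polar coordinates."""
    diff = abs(phi_p - phi_q) % (2 * math.pi)
    delta = math.pi - abs(math.pi - diff)
    argument = (math.cosh(r_p) * math.cosh(r_q)
                - math.sinh(r_p) * math.sinh(r_q) * math.cos(delta))
    return math.acosh(max(1.0, argument))

def sample_hyperbolic_random_graph(n, alpha, C, seed=None):
    """Sample n points in the disk of radius R = 2 log(n) + C and connect
    every pair at hyperbolic distance at most R (naive O(n^2) loop)."""
    rng = random.Random(seed)
    R = 2 * math.log(n) + C
    points = []
    for _ in range(n):
        phi = rng.uniform(0.0, 2 * math.pi)
        # Invert the radial CDF (cosh(alpha r) - 1) / (cosh(alpha R) - 1).
        u = rng.random()
        r = math.acosh(1 + u * (math.cosh(alpha * R) - 1)) / alpha
        points.append((r, phi))
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if hyperbolic_distance(*points[i], *points[j]) <= R
    ]
    return R, points, edges

# Hypothetical parameters: alpha = 0.75 corresponds to power-law exponent 2.5.
R, points, edges = sample_hyperbolic_random_graph(200, alpha=0.75, C=0.0, seed=1)
```

Most sampled radii end up close to $R$, since the density grows exponentially with the radius; the few points near the origin become the high-degree vertices.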
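The probability for a given vertex to lie in a certain area $A$ of the disk is given by its probability measure $\mu(A) = \int_A f(r, \varphi)\,\mathrm{d}\varphi\,\mathrm{d}r$. The hyperbolic distance between two vertices $u$ and $v$ increases with increasing angular distance between them. The maximum angular distance such that they are still connected by an edge is bounded by [14, Lemma 6]
$$\theta(r(u), r(v)) = \arccos\left(\frac{\cosh(r(u))\cosh(r(v)) - \cosh(R)}{\sinh(r(u))\sinh(r(v))}\right) = 2e^{(R - r(u) - r(v))/2}\bigl(1 + \Theta(e^{R - r(u) - r(v)})\bigr). \tag{2}$$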
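The maximum connection angle can be obtained by solving the distance formula for the angle at which the distance equals $R$. The sketch below does this and compares the result with the leading-order term $2e^{(R - r(u) - r(v))/2}$; the function names and the numeric values are hypothetical, and the early return for $r(u) + r(v) \le R$ reflects that such points are adjacent at every angle.

```python
import math

def hyperbolic_distance(r_p, phi_p, r_q, phi_q):
    """Hyperbolic distance between two points given in polar coordinates."""
    diff = abs(phi_p - phi_q) % (2 * math.pi)
    delta = math.pi - abs(math.pi - diff)
    argument = (math.cosh(r_p) * math.cosh(r_q)
                - math.sinh(r_p) * math.sinh(r_q) * math.cos(delta))
    return math.acosh(max(1.0, argument))

def max_connection_angle(r_u, r_v, R):
    """Largest angular distance at which points with radii r_u and r_v
    are still at hyperbolic distance at most R."""
    if r_u + r_v <= R:
        return math.pi  # even opposite points are close enough
    cos_theta = (math.cosh(r_u) * math.cosh(r_v) - math.cosh(R)) / (
        math.sinh(r_u) * math.sinh(r_v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Hypothetical values for illustration.
R, r_u, r_v = 10.0, 7.0, 8.0
theta = max_connection_angle(r_u, r_v, R)
approximation = 2 * math.exp((R - r_u - r_v) / 2)
```

Placing the two points exactly `theta` apart yields distance $R$, which makes the function easy to validate against the distance formula.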
Interval Graphs and Circular Arc Graphs.
In an interval graph each vertex is identified with an interval on the real line and two vertices are adjacent if and only if their intervals intersect. The interval width of an interval graph $G$, denoted by $\mathrm{iw}(G)$, is its maximum clique size, i.e., the maximum number of intervals that intersect in one point. For any graph the interval width is defined as the minimum interval width over all of its interval supergraphs. Circular arc graphs are a superclass of interval graphs, where each vertex is identified with a subinterval of the circle called circular arc or simply arc. The interval width of a circular arc graph $G$ is at most twice the size of its maximum clique, since one obtains an interval supergraph of $G$ by mapping the circular arcs into the interval $[0, 2\pi]$ on the real line and replacing all intervals that were split by this mapping with the whole interval $[0, 2\pi]$. Consequently, for any graph $G$, if $k$ denotes the minimum over the maximum clique number of all circular arc supergraphs of $G$, then the interval width of $G$ is at most $2k$.
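The mapping from arcs to intervals can be illustrated with a sweep over interval endpoints. The sketch below is an illustration only: the function names and the example arcs are hypothetical, arcs are given as (start, end) pairs in $[0, 2\pi)$ traversed counter-clockwise, and an arc with start greater than end is one that wraps around. The circular count ignores the corner case of arcs that meet exactly at angle 0.

```python
import math

def max_overlap(intervals):
    """Maximum number of closed intervals [a, b] that share a common point."""
    events = []
    for a, b in intervals:
        events.append((a, 0))  # start; sorts before an end at the same coordinate
        events.append((b, 1))  # end
    events.sort()
    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best

def arcs_to_intervals(arcs):
    """Map arcs into [0, 2*pi]; arcs split by the mapping become the whole interval."""
    return [(s, e) if s <= e else (0.0, 2 * math.pi) for s, e in arcs]

def max_arcs_through_point(arcs):
    """Maximum number of arcs through a common angle (wrapping arcs are cut in two)."""
    pieces = []
    for s, e in arcs:
        if s <= e:
            pieces.append((s, e))
        else:
            pieces.extend([(s, 2 * math.pi), (0.0, e)])
    return max_overlap(pieces)

# Hypothetical arcs; the third one wraps around angle 0.
arcs = [(0.5, 1.5), (1.0, 2.0), (6.0, 0.7), (5.5, 6.1)]
on_circle = max_arcs_through_point(arcs)
on_line = max_overlap(arcs_to_intervals(arcs))
```

In this example the line representation has one more overlapping interval than the circle has arcs through any point, which stays within the factor of two: every interval through a point is either an unsplit arc through that point or a wrapping arc through angle 0.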
Treewidth and Pathwidth.
A tree decomposition of a graph $G$ is a tree $T$ where each tree node represents a subset of the vertices of $G$ called bag, and the following requirements have to be satisfied: Each vertex in $G$ is contained in at least one bag, all bags containing a given vertex in $G$ form a connected subtree of $T$, and for each edge in $G$, there exists a bag containing both endpoints. The width of a tree decomposition is the size of its largest bag minus one. The treewidth of $G$ is the minimum width over all tree decompositions of $G$. The path decomposition of a graph is defined analogously to the tree decomposition, with the constraint that the tree has to be a path. Additionally, as for the treewidth, the pathwidth of a graph $G$, denoted by $\mathrm{pw}(G)$, is the minimum width over all path decompositions of $G$. Clearly the pathwidth is an upper bound on the treewidth. It is known that for any graph $G$ and any $k \ge 0$, the interval width of $G$ is at most $k + 1$ if and only if its pathwidth is at most $k$ [8, Theorem 7.14]. Consequently, if $k'$ is the maximum clique size of a circular arc supergraph of $G$, then $2k'$ is an upper bound on the pathwidth of $G$.
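The three requirements are easy to check mechanically for a path decomposition, where the bags form a sequence and "connected subtree" means "consecutive bags". A minimal sketch with hypothetical function names and a hypothetical example:

```python
def is_path_decomposition(bags, vertices, edges):
    """Check the three requirements for a sequence of bags."""
    bags = [set(bag) for bag in bags]
    for v in vertices:
        positions = [i for i, bag in enumerate(bags) if v in bag]
        if not positions:
            return False  # vertex is in no bag
        if positions[-1] - positions[0] + 1 != len(positions):
            return False  # bags containing v are not consecutive
    # Every edge needs a bag that contains both endpoints.
    return all(any(u in bag and v in bag for bag in bags) for u, v in edges)

def decomposition_width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags) - 1

# Hypothetical example: the path a-b-c-d has a path decomposition of width 1.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
bags = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
```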
Probabilities.
Since we are analyzing a random graph model, our results are of probabilistic nature. To obtain meaningful statements, we show that they hold with high probability (for short whp.), i.e., with probability $1 - \mathcal{O}(n^{-1})$. The following Chernoff bound is a useful tool for showing that certain events occur with high probability.
**Theorem 2.1 (Chernoff Bound [11, A.1]).**
Let $X_1, \dots, X_n$ be independent random variables with $X_i \in \{0, 1\}$ and let $X$ be their sum. Let $f(n) = \Omega(\log(n))$. If $f(n)$ is an upper bound for $\mathbb{E}[X]$, then for each constant $c$ there exists a constant $c'$ such that $X \le c' f(n)$ holds with probability $1 - \mathcal{O}(n^{-c})$.
3 Vertex Cover on Hyperbolic Random Graphs
Reduction rules are often applied as a preprocessing step, before using a brute force search or branching in a search tree. They simplify the input by removing parts that are easy to solve. For example, an isolated vertex does not cover any edges and can thus never be part of a minimum vertex cover. Consequently, in a preprocessing step all isolated vertices can be removed, which leads to a reduced input size without impeding the search for a minimum.
The dominance reduction rule was previously defined for the IndependentSet problem [12], and later used for VertexCover in the experiments by Akiba and Iwata [1]. Formally, vertex $u$ dominates a neighbor $v$ if $(N(v) \setminus \{u\}) \subseteq N(u)$, i.e., all neighbors of $v$ are also neighbors of $u$. We say $u$ is dominant if it dominates at least one vertex. The dominance rule states that $u$ can be added to the vertex cover (and afterwards removed from the graph), without impeding the search for a minimum vertex cover. To see that this is correct, assume that $u$ dominates $v$ and let $S$ be a minimum vertex cover that does not contain $u$. Since $S$ has to cover all edges, it contains all neighbors of $u$. These neighbors include $v$ and all of $v$'s neighbors, since $u$ dominates $v$. Therefore, removing $v$ from $S$ leaves only the edge $\{u, v\}$ uncovered, which can be fixed by adding $u$ instead. The resulting vertex cover has the same size as $S$. When searching for a minimum vertex cover of $G$, it is thus safe to assume that $u$ is part of the solution and to reduce the search to $G[V \setminus \{u\}]$.
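The dominance rule can be sketched on an adjacency-set representation. This is a minimal illustration, not the implementation used in the cited experiments: the function names are hypothetical, and the rule is applied one vertex at a time with the graph updated after each removal.

```python
def find_dominant_vertex(adjacency):
    """Return a vertex u that dominates some neighbor v, or None.
    u dominates v if every neighbor of v other than u is also a neighbor of u."""
    for u, neighbors_u in adjacency.items():
        for v in neighbors_u:
            if adjacency[v] - {u} <= neighbors_u:
                return u
    return None

def apply_dominance_rule(adjacency):
    """Exhaustively move dominant vertices into the cover.
    Returns the partial cover and the remaining graph."""
    graph = {v: set(neighbors) for v, neighbors in adjacency.items()}
    cover = set()
    while True:
        u = find_dominant_vertex(graph)
        if u is None:
            return cover, graph
        cover.add(u)
        for v in graph.pop(u):
            graph[v].discard(u)

# Hypothetical example: on the path a-b-c-d the rule alone finds an optimal cover.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cover, remainder = apply_dominance_rule(path)
```

On a cycle of length four no vertex dominates a neighbor, so the rule leaves that graph untouched and branching would be needed.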
In the remainder of this section, we study the effectiveness of the dominance reduction rule on hyperbolic random graphs and conclude that VertexCover can be solved efficiently on these graphs. Our results are summarized in the following main theorem.
**Theorem 3.1.**
Let $G$ be a hyperbolic random graph on $n$ vertices. Then the VertexCover problem on $G$ can be solved in $\mathrm{poly}(n)$ time, with high probability.
The proof of Theorem 3.1 consists of two parts that make use of the underlying hyperbolic geometry. In the first part, we show that applying the dominance reduction rule once removes all vertices in the inner part of the hyperbolic disk with high probability, as depicted in Figure 1. We note that this is independent of the order in which the reduction rule is applied, as dominant vertices remain dominant after removing other dominant vertices. In the second part, we consider the induced subgraph containing the remaining vertices near the boundary of the disk (black vertices in Figure 1). We prove that this subgraph has a small pathwidth, by showing that there is a circular arc supergraph with a small interval width. Consequently, a tree decomposition of this subgraph can be computed efficiently. Finally, we obtain a polynomial time algorithm for VertexCover by first applying the reduction rules and afterwards solving VertexCover on the remaining subgraph using dynamic programming on the tree decomposition of small width.
3.1 Dominance on Hyperbolic Random Graphs
Recall that a hyperbolic random graph is obtained by distributing vertices in a hyperbolic disk $B_O(R)$ and that any two are connected if their distance is at most $R$. Consequently, one can imagine the neighborhood of a vertex $u$ as another disk $B_u(R)$. Vertex $u$ dominates another vertex $v$ if its neighborhood disk completely contains that of $v$ (both constrained to $B_O(R)$), as depicted in Figure 2 left. We define the dominance area $D(u)$ of $u$ to be the area containing all such vertices $v$. That is, $D(u) = \{p \in B_O(R) \mid B_p(R) \cap B_O(R) \subseteq B_u(R)\}$. The result is illustrated in Figure 2 right. We note that it is sufficient for a vertex $v$ to lie in $D(u)$ in order to be dominated by $u$; however, it is not necessary.
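Given the radius $r(u)$ of vertex $u$ we can now compute a lower bound on the probability that $u$ dominates another vertex, i.e., the probability that at least one vertex lies in $D(u)$, by determining the measure $\mu(D(u))$. To this end, we first define $\delta(r(u), r(v))$ to be the maximum angular distance between two nodes $u$ and $v$ such that $v$ lies in $D(u)$.
**Lemma 3.2.**
Let $u, v$ be vertices with $r(u) \le r(v)$. Then, $v \in D(u)$ if $\Delta_\varphi(u, v)$ is at most
$$\delta(r(u), r(v)) = 2\bigl(e^{-r(u)/2} - e^{-r(v)/2}\bigr) + \Theta\bigl(e^{-3r(u)/2}\bigr) - \Theta\bigl(e^{-3r(v)/2}\bigr).$$
**Proof 3.3.**
Without loss of generality we assume that $\varphi(u) = 0$. For now assume that $\varphi(v) = 0$. Since $r(u) \le r(v)$ we know that the intersections of the boundaries of $B_v(R)$ with $B_O(R)$ lie between those of $B_u(R)$ with $B_O(R)$, as is depicted in Figure 3. Now let $i_u$ denote one of these intersections for $B_u(R)$ and $B_O(R)$, and let $i_v$ denote the intersection for $B_v(R)$ and $B_O(R)$ that is on the same side of the ray through $O$ and $u$ as $i_u$. It is easy to see that the maximum angular distance between $u$ and $v$ such that $B_v(R) \cap B_O(R)$ is contained within $B_u(R)$ is given by the angular distance between $i_u$ and $i_v$. Therefore, $v$ lies in the dominance area of $u$ if $\Delta_\varphi(u, v) \le \Delta_\varphi(i_u, i_v)$.
Recall that $\theta(r(u), r(v))$ denotes the maximum angular distance such that $\mathrm{dist}(u, v) \le R$, as defined in Equation (2). Since $i_u$ and $i_v$ have radius $R$ and hyperbolic distance $R$ to $u$ and $v$, respectively, we know that their angular coordinates are $\theta(r(u), R)$ and $\theta(r(v), R)$, respectively. Consequently, the angular distance between $i_u$ and $i_v$ is given by
$$\theta(r(u), R) - \theta(r(v), R) = 2\bigl(e^{-r(u)/2} - e^{-r(v)/2}\bigr) + \Theta\bigl(e^{-3r(u)/2}\bigr) - \Theta\bigl(e^{-3r(v)/2}\bigr).$$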
Using Lemma 3.2 we can now compute the probability for a given vertex to lie in the dominance area of . We note that this probability grows roughly like , which is a constant fraction of the measure of the neighborhood disk of which grows as [14, Lemma 3.2]. Consequently, the expected number of nodes that dominates is a constant fraction of the expected number of its neighbors.
**Lemma 3.4.**
Let be a node with radius . The probability for a given node to lie in is given by
[TABLE]
**Proof 3.5.**
The probability for a given vertex to lie in is obtained by integrating the probability density (given by Equation (1)) over .
[TABLE]
Since and we have and . Due to the linearity of integration, constant factors within the integrand can be moved out of the integral, which yields
[TABLE]
The remaining integrals can be computed easily and we obtain
[TABLE]
As and , simplifying the error terms yields the claim.
The following lemma shows that, with high probability, all vertices that are not too close to the boundary of the disk dominate at least one vertex.
**Lemma 3.6.**
Let be a hyperbolic random graph with average degree . Then there is a constant , such that all vertices with are dominant, with high probability.
**Proof 3.7.**
Vertex is dominant if at least one vertex lies in . To show this for any with , it suffices to show it for , since increases with decreasing radius. To determine the probability that at least one vertex lies in , we use Lemma 3.4 and obtain
[TABLE]
By substituting , we obtain . The probability of at least one node falling into is now given by
[TABLE]
Consequently, for large enough we can choose such that the probability of a vertex at radius being dominant is at least , allowing us to apply union bound.
**Corollary 3.8.**
Let be a hyperbolic random graph and . With high probability, all vertices with radius at most are removed by the dominance rule.
By Corollary 3.8 the dominance rule removes all vertices of radius at most . Consequently, all remaining vertices have radius at least . We refer to this part of the disk as outer band. More precisely, the outer band is defined as . It remains to show that the pathwidth of the subgraph induced by the vertices in the outer band is small.
3.2 Pathwidth in the Outer Band
In the following, we use $G_r$ to denote the induced subgraph of $G$ that contains all vertices with radius at least $r$. To show that the pathwidth of $G_\rho$ (the induced subgraph in the outer band) is small, we first show that there is a circular arc supergraph $C_\rho$ of $G_\rho$ with a small maximum clique. We use $C$ to denote a circular arc supergraph of a hyperbolic random graph $G$, which is obtained by assigning each vertex $v$ an angular interval $I_v$ on the circle, such that the intervals of two adjacent vertices intersect. More precisely, for a vertex $v$, we set $I_v = [\varphi(v) - \theta(r(v), r(v)), \varphi(v) + \theta(r(v), r(v))]$. Intuitively, this means that the interval of a vertex contains a superset of all its neighbors that have a larger radius, as can be seen in Figure 4 left. The following lemma shows that $C$ is actually a supergraph of $G$.
**Lemma 3.9.**
Let $G$ be a hyperbolic random graph. Then $C$ is a supergraph of $G$.
**Proof 3.10.**
Let $\{u, v\}$ be any edge in $G$. To show that $C$ is a supergraph of $G$ we need to show that $u$ and $v$ are also adjacent in $C$, i.e., $I_u \cap I_v \neq \emptyset$. Without loss of generality assume $r(v) \le r(u)$. Since $u$ and $v$ are adjacent in $G$, the hyperbolic distance between them is at most $R$. It follows that their angular distance $\Delta_\varphi(u, v)$ is bounded by $\theta(r(u), r(v))$. Since $\theta(r(u), r(v)) \le \theta(r(v), r(v))$ for $r(v) \le r(u)$, we have $\Delta_\varphi(u, v) \le \theta(r(v), r(v))$. As $I_v$ extends by $\theta(r(v), r(v))$ from $\varphi(v)$ in both directions, it follows that $\varphi(u) \in I_v$.
It is easy to see that, after removing a vertex from $G$ and $C$, $C$ is still a supergraph of $G$. Consequently, $C_r$ is a supergraph of $G_r$. It remains to show that $C_r$ has a small maximum clique number, which is given by the maximum number of arcs that intersect at any angle. To this end, we first compute the number of arcs that intersect a given angle, which we set to $0$ without loss of generality. Let $A_r$ denote the area of the disk containing all vertices $v$ with radius $r(v) \ge r$ whose interval $I_v$ intersects $0$, as illustrated in Figure 4 right. The following lemma describes the probability for a given vertex to lie in $A_r$.
**Lemma 3.11.**
Let be a hyperbolic random graph and let . The probability for a given vertex to lie in is bounded by
[TABLE]
**Proof 3.12.**
We obtain the measure of by integrating the probability density function over . Due to the definition of we can conclude that includes all vertices with radius whose angular distance to [math] is at most , defined in Equation (2). We obtain,
[TABLE]
As before, we can conclude that , since . By moving constant factors out of the integral, the expression can be simplified to
[TABLE]
We split the sum in the integral and deal with the two resulting integrals separately.
[TABLE]
By placing outside of the brackets we obtain
[TABLE]
Simplifying the remaining error terms then yields the claim.
We can now bound the maximum clique number in and thus its interval width .
**Theorem 3.13.**
Let be a hyperbolic random graph and . Then there exists a constant such that, whp., if , and otherwise
[TABLE]
**Proof 3.14.**
We start by determining the expected number of arcs that intersect at a given angle, which can be done by computing the expected number of vertices in , using Lemma 3.11:
[TABLE]
It remains to show that this bound holds with high probability at every angle. To this end, we make use of a Chernoff bound (Theorem 2.1), by first showing that the bound on is . We start with the case where .
[TABLE]
Substituting we obtain
[TABLE]
Thus, for all radii smaller than , the resulting upper bound is lower bounded by , which lets us apply Theorem 2.1. Moreover, as decreases with increasing , is a pessimistic but valid upper bound for the case . Thus, we can also apply Theorem 2.1 to this case, using the bound.
By Theorem 2.1, we can choose such that in both cases the bound holds with probability for any at a given angle. In order to see that it holds at every angle, note that it suffices to show that it holds at all arc endings as the number of intersecting arcs does not change in between arc endings. Since there are exactly arc endings, we can apply union bound and obtain that the bound holds with probability for any at every angle. Since our bound on is an upper bound on the maximum clique size of , the interval width of is at most twice as large, as argued in Section 2.
Since the interval width of a circular arc supergraph of is an upper bound on the pathwidth of [8, Theorem 7.14] and since for , we immediately obtain the following corollary.
**Corollary 3.15.**
Let G be a hyperbolic random graph and let be the subgraph obtained by removing all vertices with radius at most . Then, .
We are now ready to prove our main theorem, which we restate for the sake of readability.
**Theorem 3.1.**
Let $G$ be a hyperbolic random graph on $n$ vertices. Then the VertexCover problem in $G$ can be solved in $\mathrm{poly}(n)$ time, with high probability.
**Proof 3.2.**
Consider the following algorithm that finds the minimum vertex cover of . We start with an empty vertex cover . Initially, all dominant vertices are added to , which is correct due to the dominance rule. By Lemma 3.6, this includes all vertices of radius at most , for some constant , with high probability. Obviously, finding all vertices that are dominant can be done in time. It remains to determine a vertex cover of . By Corollary 3.15, the pathwidth of is , with high probability. Since the pathwidth is an upper bound on the treewidth, we can find a tree decomposition of and solve the VertexCover problem in in time [8, Theorems 7.18 and 7.9].
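The final step of this algorithm, dynamic programming over a decomposition of small width, can be sketched for the special case of a path decomposition. This is an illustrative sketch under stated assumptions, not the procedure from [8]: the function name is hypothetical, a valid path decomposition is assumed to be given as a list of bags, and the running time is exponential only in the bag size.

```python
def min_vertex_cover_size(bags, edges):
    """Size of a minimum vertex cover, computed by dynamic programming
    over a given path decomposition (list of bags, in path order)."""
    edge_set = {frozenset(e) for e in edges}

    def covers_bag_edges(bag, chosen):
        # Every edge with both endpoints in the bag needs a chosen endpoint.
        members = list(bag)
        for i, u in enumerate(members):
            for v in members[i + 1:]:
                if (frozenset((u, v)) in edge_set
                        and u not in chosen and v not in chosen):
                    return False
        return True

    # table maps "cover vertices inside the current bag" to the minimum
    # number of cover vertices among all vertices seen so far.
    table = {frozenset(): 0}
    previous_bag = frozenset()
    for bag in map(frozenset, bags):
        members = list(bag)
        new_table = {}
        for mask in range(1 << len(members)):
            chosen = frozenset(v for i, v in enumerate(members) if mask >> i & 1)
            if not covers_bag_edges(bag, chosen):
                continue
            # Both states must agree on the vertices shared by the two bags;
            # vertices new to this bag are paid for when they are chosen.
            candidates = [
                cost + len(chosen - previous_bag)
                for previous_chosen, cost in table.items()
                if previous_chosen & bag == chosen & previous_bag
            ]
            if candidates:
                new_table[chosen] = min(candidates)
        table, previous_bag = new_table, bag
    return min(table.values())

# Hypothetical example: a cycle on five vertices with a width-2 path decomposition.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
cycle_bags = [{0, 1, 2}, {0, 2, 3}, {0, 3, 4}]
```

With bags of size $\mathcal{O}(\log n)$ the number of states per bag is polynomial in $n$, which is where the polynomial running time comes from.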
Moreover, linking the radius of a vertex in Theorem 3.13 with its expected degree leads to the following corollary, which is interesting in its own right. It links the pathwidth to the degree $d$ in the graph $G_{\le d}$. Recall that $G_{\le d}$ denotes the subgraph of $G$ induced by the vertices of degree at most $d$.
**Corollary 3.16.**
Let be a hyperbolic random graph and let . Then, with high probability, .
**Proof 3.17.**
Consider the radius for some constant , and the graph which is obtained by removing all vertices of radius at most . By substituting and using [14, Lemma 3.2] we can compute the expected degree of a vertex with radius as
[TABLE]
First assume that . We handle the other case later. Since we can choose large enough to apply Theorem 2.1 and conclude that this holds with high probability. Furthermore, since a smaller radius implies a larger degree, we know that, with high probability, all nodes with radius at most , have
[TABLE]
For large enough we can choose such that, with high probability, is a supergraph of . To prove the claim, it remains to bound the pathwidth of . If , we can apply the first part of Theorem 3.13 to obtain . Otherwise, we use part two to conclude that the interval width of is at most
[TABLE]
As argued in Section 2 the interval width of a graph is an upper bound on the pathwidth.
For (which we excluded above), consider for . As we already proved the corollary for , we obtain . As is a subgraph of , the same bound holds for .
4 Discussion
Our results show that a heterogeneous degree distribution as well as high clustering make the dominance rule very effective. This matches the behavior for real-world networks, which typically exhibit these two properties. However, our analysis actually makes more specific predictions: (I) vertices with sufficiently high degree usually have at least one neighbor they dominate and can thus safely be included in the vertex cover; and (II) the graph remaining after deleting the high degree vertices has simple structure, i.e., small pathwidth.
To see whether this matches the real world, we ran experiments on networks from several network datasets [2, 3, 18, 19, 20]. Although the focus of this paper is the theoretical analysis on hyperbolic random graphs, we briefly report on our experimental results; see the table in Section 5. Out of the instances, we can solve VertexCover for networks in reasonable time. We refer to these as easy, while the remaining are called hard. Note that our theoretical analysis aims at explaining why the easy instances are easy.
Recall from Lemma 3.6 that all vertices with radius at most probably dominate, which corresponds to an expected degree of . For more than half of the networks, more than of the vertices above this degree were in fact dominant. For more than a quarter of the networks, more than were dominant. Restricted to the easy instances, these numbers increase to and , respectively.
Experiments concerning the pathwidth of the resulting graph are much more difficult, due to the lack of efficient tools. Therefore, we used the tool by Tamaki et al. [21] to heuristically compute upper bounds on the treewidth instead. As in our analysis, we only removed vertices that dominate in the original graph instead of applying the reduction rule exhaustively. On the resulting subgraphs, the treewidth heuristic ran with a timeout. The resulting treewidth is at most for of the networks, at most for , and at most for . Restricted to easy instances, the values increase to , , and , respectively.
Hyperbolic random graphs are of course an idealized representation of real-world networks. However, these experiments indicate that the predictions derived from the model match the real world, at least for a significant fraction of networks.
Approximation.
Concerning approximation algorithms for VertexCover, there is a theory-practice gap similar to the one for exact solutions. In theory, there is a simple 2-approximation and the best known polynomial time approximation reduces the factor to [15]. However, it is NP-hard to approximate VertexCover within a factor of [10], and presumably it is even NP-hard to approximate within a factor of for all [16]. Moreover, the greedy strategy that iteratively adds the vertex with maximum degree to the vertex cover and deletes it is only a approximation. However, on scale-free networks this strategy performs exceptionally well with approximation ratios very close to 1 [9].
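The simple 2-approximation mentioned above is commonly obtained from a maximal matching: scan the edges and, whenever an edge is still uncovered, take both of its endpoints. The sketch below illustrates this textbook technique. The function name and the edge-list input format are assumptions made for this example and are not taken from the paper.

```python
def matching_vertex_cover(edges):
    """Return a vertex cover at most twice the size of an optimal one.

    Edges whose endpoints are both still outside the cover are pairwise
    disjoint, so they form a matching. Any vertex cover needs at least one
    endpoint per matching edge, while this routine takes both.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

On the path with edges `(0, 1), (1, 2), (2, 3)` this selects all four vertices, whereas `{1, 2}` already suffices, so the factor of 2 is attained on this instance.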
Our results for hyperbolic random graphs at least partially explain this good approximation ratio. Lemma 3.6 states that, with high probability, we do not make any mistake by taking all vertices below a certain radius , which corresponds to vertices of at least logarithmic degree. The same computation for larger values of no longer gives such strong guarantees. However, it still gives bounds on the probability of making a mistake. In fact, this error probability is sub-constant as long as the corresponding expected degree is super-constant.
Although this is not a formal argument, it still explains to a degree why greedy works so well on networks with a heterogeneous degree distribution and high clustering. Moreover, it indicates how the greedy algorithm should be adapted to obtain better approximation ratios: As the probability of making a mistake grows with growing radius and thus with shrinking vertex degree, most mistakes are made once all remaining vertices already have low degree. However, for hyperbolic random graphs, the subgraphs induced by vertices below a certain constant degree decompose into small components for . It thus seems to be a good idea to run the greedy algorithm only until all remaining vertices have low degree, say . The remaining small connected components of maximum-degree can then be solved with brute force in reasonable time. In the following we call the resulting algorithm -adaptive greedy.
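The adaptive greedy strategy just described can be sketched as follows. This is an illustrative reading of the paragraph, not the implementation used for the experiments. It assumes an adjacency-set dictionary as input and writes the degree threshold as a parameter `k`. All function names are hypothetical.

```python
from itertools import combinations


def components(adj):
    """Yield the vertex sets of the connected components of adj."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], {start}
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        yield comp


def brute_force_cover(adj, comp):
    """Smallest vertex cover of the subgraph induced by comp (exponential time)."""
    edges = [(u, v) for u in comp for v in adj[u] if v in comp]
    vertices = list(comp)
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen


def adaptive_greedy(adj, k):
    """Greedy by maximum degree until all degrees are at most k, then brute force."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    cover = set()
    while adj:
        u = max(adj, key=lambda v: len(adj[v]))
        if len(adj[u]) <= k:
            break  # every remaining vertex has degree at most k
        cover.add(u)
        for v in adj.pop(u):
            adj[v].discard(u)
    for comp in components(adj):
        cover |= brute_force_cover(adj, comp)
    return cover
```

With `k = 0` the brute-force phase only sees isolated vertices, so the sketch degenerates to the plain greedy algorithm. The brute-force phase is exponential in the component size, which is why the size of the largest remaining component matters in practice.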
We ran experiments on the easy real networks mentioned above (for the hard instances, we cannot measure approximation ratios). For these networks, we compare the normal greedy algorithm with 2- and 4-adaptive greedy. Note that 2-adaptive greedy is special, as VertexCover can be solved efficiently on graphs with maximum degree 2 (no brute-forcing is necessary). For 4-adaptive greedy, the size of the largest connected component is relevant.
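The reason no brute-forcing is needed for maximum degree 2 is that every connected component is then an isolated vertex, a path, or a cycle. A path on n vertices has a minimum vertex cover of size ⌊n/2⌋ and a cycle on n vertices one of size ⌈n/2⌉. The following sketch computes only the optimal size from these closed forms. The function name and the adjacency-set input are assumptions made for this example.

```python
def min_cover_size_max_degree_2(adj):
    """Size of a minimum vertex cover of a simple graph with maximum degree 2."""
    if any(len(nbrs) > 2 for nbrs in adj.values()):
        raise ValueError("maximum degree must be at most 2")
    seen = set()
    total = 0
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component containing start.
        stack, comp = [start], {start}
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        n = len(comp)
        m = sum(len(adj[v]) for v in comp) // 2  # number of edges
        # A component with as many edges as vertices is a cycle, else a path.
        total += (n + 1) // 2 if m == n else n // 2
    return total
```

An actual cover can be recovered in the same linear time by walking each path from one end and taking every second vertex, with one extra vertex on odd cycles.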
The median approximation ratio for greedy over all networks is . This goes down to for 2-adaptive and to for 4-adaptive greedy. Thus, the number of excess vertices selected beyond the optimum goes down by a factor of and , respectively. As mentioned above, the size of the largest connected component is relevant for 4-adaptive greedy. For of the networks, this was below (which is still a reasonable size for a brute-force algorithm). Restricted to these networks, normal greedy has a median approximation ratio of , while 4-adaptive again improves by a factor of 4 to . Moreover, the number of networks for which we actually obtain the optimal solution increases from to .
5 Experimental Data
The table in this section shows the raw data of our experiments, for which we reported aggregate values in the discussion in Section 4. The percentage of dominant vertices among those with high degree (over ) is rounded to whole percentages. The approximation ratios are rounded to three decimal digits. Treewidth indicates that the remaining graph after removing all dominant vertices contained no edge.
References
- [1] Takuya Akiba and Yoichi Iwata. Branch-and-reduce exponential/FPT algorithms in practice: A case study of vertex cover. Theor. Comput. Sci., 609:211–225, 2016. doi:10.1016/j.tcs.2015.09.023.
- [2] Alexandre Arenas, Albert-László Barabási, Vladimir Batagelj, Andrej Mrvar, Mark Newman, and Tore Opsahl. Gephi datasets. https://github.com/gephi/gephi/wiki/Datasets.
- [3] Vladimir Batagelj and Andrej Mrvar. Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/, 2006.
- [4] Thomas Bläsius, Tobias Friedrich, and Anton Krohmer. Hyperbolic Random Graphs: Separators and Treewidth. In 24th Annual European Symposium on Algorithms (ESA 2016), pages 15:1–15:16, 2016. doi:10.4230/LIPIcs.ESA.2016.15.
- [5] Marián Boguñá, Fragkiskos Papadopoulos, and Dmitri Krioukov. Sustaining the internet with hyperbolic mapping. Nat. Commun., 1:62, 2010. doi:10.1038/ncomms1063.
- [6] Liming Cai and David Juedes. On the existence of subexponential parameterized algorithms. J. Comput. Syst. Sci., 67:789–807, 2003. doi:10.1016/S0022-0000(03)00074-6.
- [7] Jianer Chen, Iyad A. Kanj, and Ge Xia. Improved upper bounds for vertex cover. Theor. Comput. Sci., 411(40):3736–3756, 2010. doi:10.1016/j.tcs.2010.06.026.
- [8] Marek Cygan, Fedor V. Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.
