Fast uniform generation of random graphs with given degree sequences
Andrii Arman, Pu Gao, Nicholas Wormald

TL;DR
This paper introduces a highly efficient algorithm for uniformly generating random graphs with specified degree sequences, significantly improving upon previous methods in terms of speed and applicability to various degree distributions.
Contribution
The authors present a novel algorithm that achieves expected linear time complexity for generating graphs with given degree sequences under certain conditions, advancing the state of the art.
Findings
Expected runtime is $O(m)$ for graphs with $\text{max degree}^4=O(m)$.
Algorithm outperforms previous $O(m^2\,\text{max degree}^2)$ methods.
Effective for power-law and $d$-regular degree sequences, reducing computational complexity.
An extended abstract of this paper appeared in the proceedings of FOCS 2019.
Andrii Arman
School of Mathematics
Monash University
Pu Gao
Department of Combinatorics and Optimization
University of Waterloo
Research supported by ARC DP160100835 and NSERC.
Nicholas Wormald
School of Mathematics
Monash University
Research supported by ARC DP160100835.
Abstract
In this paper we provide an algorithm that generates a graph with given degree sequence uniformly at random. Provided that $\Delta^4=O(m)$, where $\Delta$ is the maximal degree and $m$ is the number of edges, the algorithm runs in expected time $O(m)$. Our algorithm significantly improves the previously most efficient uniform sampler, which runs in expected time $O(m^2\Delta^2)$ for the same family of degree sequences. Our method uses a novel ingredient which progressively relaxes restrictions on an object being generated uniformly at random, and we use this to give fast algorithms for uniform sampling of graphs with other degree sequences as well. Using the same method, we also obtain algorithms with expected run time which is (i) linear for power-law degree sequences in cases where the previous best was $O(n^{4.081})$, and (ii) $O(nd+d^4)$ for $d$-regular graphs when $d=o(\sqrt n)$, where the previous best was $O(nd^3)$.
**Keywords:** randomised generation algorithms, random graphs, rejection sampling
1 Introduction
Sampling discrete objects from a specified probability distribution is a classical problem in computer science, both in theory and for practical applications. Uniform generation of random graphs with a specified degree sequence is one such problem that has frequently been studied. In this paper we consider only the task of generating simple graphs, i.e. graphs with no loops or multiple edges. An early algorithm was given by Tinhofer [tinhofer79], but with unknown run time. A simple rejection-based uniform generation algorithm is usually implicit in work on asymptotically enumerating graphs with a specified degree sequence, for example in the papers of Békéssy, Békéssy and Komlós [bekessy1972], Bender and Canfield [bender1978] and Bollobás [bollobas1980]. The run time of this algorithm is linear in $n$ but exponential in the square of the average degree. Hence it only works in practice when degrees are small.
A big increase in the permitted degrees of the vertices was achieved by McKay and Wormald [mckay90], and around the same time Jerrum and Sinclair [jerrum90] found an approximately uniform sampler using Markov Chain Monte Carlo (MCMC) methods. McKay and Wormald used the configuration model introduced in [bollobas1980] to generate a random (but not uniformly random) multigraph with a given degree sequence. Instead of repeatedly rejecting until finding a simple graph, McKay and Wormald used a switching operation to switch away multiple edges, reaching a simple graph in the end. The algorithm is rather efficient when the degrees are not too large. In particular, for $d$-regular graphs it runs in expected time $O(d^3n)$ when $d=O(n^{1/3})$. (Here and in the following we assume $n$ is the number of vertices.) Jerrum and Sinclair’s Markov chain mixes in time polynomial in $n$ provided that the degree sequence satisfies a condition phrased in terms of the numbers of graphs of given degree sequences. In particular, the mixing time is polynomial in the $d$-regular case for any function $d=d(n)$. These two benchmark papers led research in two different directions. More switching-based algorithms for exactly uniform generation were given which deal with new degree sequences permitting vertices of higher degrees. The regular case was treated by Gao and Wormald [gao17] for $d=o(\sqrt n)$ with time complexity again $O(d^3n)$, and very non-regular but still quite sparse degree sequences (such as power law) [gao18] were considered by the same authors. Various MCMC-based algorithms have been investigated for generating graphs with a distribution that is only approximately uniform, e.g. algorithms by Cooper, Dyer and Greenhill [cooper07], Greenhill [greenhill14], and Kannan, Tetali and Vempala [kannan99]. These algorithms can cope with a much bigger family of degree sequences than the switching-based algorithms.
That these do not produce the exactly uniform distribution might be irrelevant for practical purposes, if it were not for the fact that the theoretically provable mixing bounds are too big. For instance, the mixing time bound proved in [cooper07] for the regular case is a polynomial of high degree. We note that there have also been switching-based approximate samplers that run fast (in linear or sub-quadratic time); see, for instance, the papers of Bayati, Kim and Saberi [bayati10], Kim and Vu [kim03], Steger and Wormald [steger99] and Zhao [zhao13]. For those algorithms, the bounds on error in the output distribution are functions of $n$ which tend to 0 as $n$ grows, but cannot be reduced for any particular $n$ by running the algorithm longer. In this way they differ from the MCMC-based algorithms, which are fully-polynomial almost uniform generators in the sense of [jerrum90].
The goal of this paper is to introduce a new technique for exactly uniform generation. Using it to modify switching-based algorithms, we can obtain vastly reduced run times; specifically, we aim for linear-time algorithms. In the context of generating a random graph, this should be linear in the number of edges, i.e. $O(M)$, where we use $M$ to denote the sum of the degrees in the graph. In particular, we obtain a linear-time algorithm that works for the same family of degree sequences as the algorithm in [mckay90]. We first review the salient features of the latter algorithm.
The algorithm first generates an initial random multigraph in expected time that is linear in $M$. (We describe the algorithm here in terms of multigraphs, though it is presented in [mckay90] in terms of pairings occurring in the above-mentioned configuration model.) The initial multigraph contains no loops of multiplicity at least two, no multiple edges of multiplicity at least three, and has a sublinear number of loops and double edges. The algorithm then uses an operation called d-switching to sequentially “switch away” all the double edges (loops are treated similarly so we ignore them at present). Provided that a multigraph $G$ was uniform in the class of graphs with $i$ double edges, the result of applying a random d-switching to $G$ is a random multigraph $G'$ that is slightly non-uniformly distributed in a class of multigraphs with $i-1$ double edges. The following rejection scheme is used to equalise probabilities. Let $f(G)$ be the number of ways that a d-switching can be performed on $G$ and $b(G')$ be the number of d-switchings that can create $G'$. Assume that $\overline{f}(i)$ and $\underline{b}(i-1)$ are uniform upper and lower bounds for $f(G)$ and $b(G')$ respectively, over all multigraphs $G$ with $i$ double edges and all multigraphs $G'$ with $i-1$ double edges. If a switching that converts some multigraph $G$ to a multigraph $G'$ is selected by the algorithm, then the switching is accepted with probability $\frac{f(G)\,\underline{b}(i-1)}{\overline{f}(i)\,b(G')}$, and rejected otherwise. If the switching is accepted, it is applied to the multigraph, whereas rejection requires re-starting the algorithm from scratch. Computing $b(G')$ is the time-consuming step, and it dominates the time complexity of [mckay90].
The algorithm presented in this paper is obtained from the algorithm in [mckay90] by modifying the time-consuming rejection scheme. First, it was observed in [mckay90] that the rejection can be separated into two distinct steps, which are given the explicit names f- and b-rejection in [gao17]. The f-rejection step rejects the selected switching with probability $1-f(G)/\overline{f}(i)$, and the b-rejection step rejects it with probability $1-\underline{b}(i-1)/b(G')$. The overall probability of accepting the switching is the product of the two acceptance probabilities, which is the same as specified originally above. By a slick observation, there is essentially no computational cost for computing the probability of f-rejection (see the explanation in Section 4.4). The modification in the present paper is to further separate b-rejections into a sequence of sub-rejections by a scheme we will call incremental relaxation. This scheme will still maintain uniformity of the multigraphs created.
The basic idea of incremental relaxation, as used in the present paper, can be described as follows. Let $H$ be a (small) graph with each edge designated as positive or negative. We say that an $H$-anchoring of a graph $G$ is an injection $Q:V(H)\to V(G)$ that maps every positive edge of $H$ to an edge of $G$, and every negative edge to a non-edge of $G$. (This is a generalisation of rooting at a subgraph, which usually corresponds to the case that $H$ has positive edges only.)
Now assume that an -anchored graph is chosen u.a.r., i.e. each such ordered pair with in some given set , and , an -anchoring of , is equally likely. We can convert this to a random graph by finding the number of -anchorings of , and accepting with probability where is a lower bound on the number of -anchorings of any element . However, computing corresponds to computing as described above and can be time-consuming. The key idea of our new method is that we incrementally relax the constraints imposed on by , so that rejection is split into a sequence of sub-rejections. Set and let denote the restriction of to . With this definition, for each , is an -anchoring of . Thus determines some subset (increasing with ) of the constraints on corresponding to the edges of , and given that is uniformly random, we can obtain a uniformly random anchoring by applying a similar rejection strategy, but using only the number of ways that can be extended to an -anchoring of . This procedure of incremental relaxation of constraints can be highly advantageous if for each , can be computed much faster than . In this way, a sequence of uniformly random objects is obtained, involving anchorings at ever-smaller subgraphs of , until the empty subgraph is reached, corresponding to obtaining u.a.r.
To see that this idea applies to the problem at hand, we observe that the existence of a d-switching (defined in Section 4.2) from $G$ to $G'$ forces $G'$ to include a set of edges (the positive edges, forming two paths of length 2, in a copy of a certain graph $H$), and to exclude a set (the negative edges, forming a matching, in $H$). So $G'$ comes accompanied by an $H$-anchoring. (Refer to the right side of Figure 2 for a drawing of $H$.) To apply incremental relaxation we first compute the number of ways to complete such an anchoring given the first 2-path and use that to obtain a random 2-path-anchored graph, and then relax the 2-path anchoring in a similar manner. The details of applying this scheme to d-switchings are given in Section 4.2.
In Section 3 we present the incremental relaxation technique in a more general setting, avoiding injections but instead employing more arbitrary sets of constraints. We apply the incremental relaxation scheme in detail in the case $\Delta^4=O(M)$ (e.g. $d=O(n^{1/3})$ in the regular degree case) in Sections 4 – 4.4. The switchings we use are exactly the same as those in [mckay90]. When the incremental relaxation scheme is combined with the new techniques introduced in [gao17, gao18], it allows us to obtain fast uniform samplers of graphs for the family of degree sequences permitted in [gao17, gao18]. In particular, we obtain a linear-time algorithm to generate graphs with power-law degrees, and a sub-quadratic-time algorithm to generate $d$-regular graphs when $d=o(\sqrt n)$. We will discuss these algorithms in Sections 5 and 6.
2 Main results
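Let $\mathbf{d}=(d_1,\ldots,d_n)$ be specified where $M=\sum_{i=1}^n d_i$ is even. Let $\Delta=\max_i d_i$ and for positive integers $k$ define $M_k=\sum_{i=1}^n [d_i]_k$, where $[x]_k=x(x-1)\cdots(x-k+1)$. Note that $M_k\le \Delta M_{k-1}$ for all $k\ge 2$.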
We say that $\mathbf{d}$ is graphical if there exists a simple graph with degree sequence $\mathbf{d}$. For the rest of this paper we only consider graphical sequences $\mathbf{d}$. Our first result is that our algorithm INC-GEN uniformly generates a random graph with degree sequence $\mathbf{d}$ and runs in linear time provided that $\mathbf{d}$ is “moderately sparse”. The description of INC-GEN is given in Section 4. The proof of the uniformity will be presented in Section 4.3, and the time complexity is bounded in Section 4.4.
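The paper assumes a graphical input and does not prescribe a test for it. As a minimal sketch, one standard way to check the condition is the Erdős–Gallai characterisation; this is a swapped-in textbook technique, and the function name `is_graphical` is an illustrative choice rather than part of INC-GEN.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: True iff some simple graph has this degree sequence.

    Illustrative helper (not from the paper); naive O(n^2) evaluation.
    """
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # The k largest degrees cannot exceed what k vertices can absorb
        # among themselves plus what the remaining vertices can offer.
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `[3, 3, 3, 3]` (the complete graph on four vertices) passes the test, while `[3, 3, 1, 1]` does not.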
**Theorem 1.**
Let $\mathbf{d}$ be a graphical sequence. Algorithm INC-GEN uniformly generates a random graph with degree sequence $\mathbf{d}$. If $\Delta^4=O(M)$ then the expected run time of INC-GEN is $O(M)$. The space complexity of INC-GEN is $O(n^2)$.
Our second algorithm, INC-REG, described in Section 5, is an almost-linear-time algorithm to generate random regular graphs. The run time is $O(dn+d^4)$ when $d=o(\sqrt n)$. This improves the run time $O(d^3n)$ of the uniform sampler in [gao17].
**Theorem 2.**
Algorithm INC-REG uniformly generates a random $d$-regular graph. If $d=o(\sqrt n)$ then the expected run time of INC-REG is $O(dn+d^4)$.
Our third algorithm, INC-POWERLAW, described in Section 6, is a linear-time algorithm to generate random graphs with a power-law degree sequence. A degree sequence $\mathbf{d}$ is said to be power-law distribution-bounded with parameter $\gamma>1$, if the minimum component in $\mathbf{d}$ is at least 1, and there is a constant $C>0$ independent of $n$ such that the number of components that are at least $i$ is at most $Cni^{1-\gamma}$ for all $i\ge 1$. Note that the family of power-law distribution-bounded degree sequences covers the family of degree sequences arising from i.i.d. copies of a power-law random variable. Uniform generation of graphs with power-law distribution-bounded degree sequences with parameter $\gamma>21/10+\sqrt{61}/10\approx 2.881$ was studied in [gao18], where a uniform sampler was described with expected run time $O(n^{4.081})$. This was the first known uniform sampler for this family of degree sequences. With our new rejection scheme, we improve the time complexity to linear.
**Theorem 3.**
Let $\mathbf{d}$ be a power-law distribution-bounded degree sequence with parameter $\gamma>21/10+\sqrt{61}/10\approx 2.881$. Algorithm INC-POWERLAW uniformly generates a random graph with degree sequence $\mathbf{d}$, and the expected run time of INC-POWERLAW is $O(n)$.
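The distribution-bounded condition can be checked mechanically for one concrete sequence and one concrete constant. The sketch below assumes the tail bound has the form $Cni^{1-\gamma}$ (the number of components that are at least $i$ is at most $Cni^{1-\gamma}$); the function name and the idea of passing `C` explicitly are illustrative assumptions, since in the definition the constant only needs to exist and be independent of $n$.

```python
def is_power_law_bounded(degrees, gamma, C):
    """Check the tail condition for a single sequence and a given constant C.

    Assumed form of the bound: #{j : d_j >= i} <= C * n * i**(1 - gamma)
    for every i >= 1, together with min(degrees) >= 1.
    """
    n = len(degrees)
    if n == 0 or min(degrees) < 1:
        return False
    max_d = max(degrees)
    freq = [0] * (max_d + 1)
    for x in degrees:
        freq[x] += 1
    at_least = 0
    # Sweep i downwards so that at_least counts the components >= i.
    for i in range(max_d, 0, -1):
        at_least += freq[i]
        if at_least > C * n * i ** (1 - gamma):
            return False
    # For i > max_d the count is zero, so the bound holds trivially.
    return True
```

A single sequence always satisfies the bound for a large enough `C`, so the check is only informative when `C` is fixed in advance for a whole family of sequences.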
Algorithms INC-GEN and INC-REG can easily be modified if represents a bipartite graph’s degree sequence. As an example, we present algorithm INC-BIPARTITE in Section 7 as the bipartite version of INC-GEN.
**Theorem 4.**
Algorithm INC-BIPARTITE uniformly generates a random graph with bipartite degree sequence . If then the expected run time of INC-BIPARTITE is . The space complexity of INC-BIPARTITE is .
3 Uniform generation by incremental relaxation
We provide here a general description of the relaxation procedure, so it can be applied in different setups. Let and be given, where is a finite set and is a positive integer. We are also given , for , where each is a multiset consisting of subsets of . Let denote the Cartesian product, and let be any subset of such that each satisfies . Given , define for each . For each set and set .
For any and , define ; i.e. is the prefix of .
Later in our applications of relaxation, we will let be a set of multigraphs. Each element of can be identified with a multigraph that contains a specified substructure (determined by the -s) on a specified set of vertices. In terms of the notation introduced in Section 1, elements of will correspond to -anchorings of multigraphs for some graph and some sequence . Permitting multiple copies of elements in is useful in the case where two distinct constraints may correspond to the same subset of . This happens in our applications due to the symmetry of the substructures in .
Next we define a procedure Loosen, which takes an as input, and outputs an with a certain probability and otherwise ‘rejects’ it and terminates. Our Relaxation Lemma (Lemma 5 below) shows that if is uniformly distributed in then the output of Loosen is uniformly distributed in .
For and , let be the number of such that . In other words, is the number of ways to extend to an element of . Let be a lower bound on over all , and assume that for all , . For with we define the following procedure.
Procedure Relax is defined for . It repeatedly calls Loosen until reaching a . We say that procedure Relax performs incremental relaxation on .
**Lemma 5 (Relaxation Lemma).**
Assume that and . Provided that is chosen uniformly at random, the output of Loosen is uniform in assuming no rejection.
**Proof.** Let . For any , the probability that Loosen outputs is equal to
[TABLE]
where denotes the event that the input of Loosen is . The second probability above is the conditional probability that no rejection occurs in Loosen, given . By our assumption, the first probability above is always equal to . By the definition of Loosen, the second probability above is equal to . By definition, is exactly the number of , such that , so the sum has exactly terms, each of which is equal to . Hence, the probability for Loosen to output is equal to , for every .
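The counting argument in this proof can be checked exactly on a toy instance. In the sketch below, an input is a pair `(x, y)`, the relaxed object is its prefix `x`, and Loosen keeps `x` with probability (lower bound) / (number of extensions of `x`). The names `pairs`, `b` and `b_low` are illustrative and not the paper's notation.

```python
from collections import Counter
from fractions import Fraction


def loosen_output_distribution(pairs):
    """Exact output distribution of one Loosen step on a uniform input.

    pairs: the set of admissible (prefix, extension) objects.
    Returns {prefix: probability that Loosen outputs this prefix}.
    """
    pairs = list(pairs)
    b = Counter(x for x, _ in pairs)   # b[x] = number of extensions of x
    b_low = min(b.values())            # a valid lower bound on every b[x]
    p_input = Fraction(1, len(pairs))  # the input is uniform over pairs
    out = Counter()
    for x, _y in pairs:
        # Input (x, y) survives with probability b_low / b[x].
        out[x] += p_input * Fraction(b_low, b[x])
    return dict(out)
```

With `pairs = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("c", 1), ("c", 2)]`, the three prefixes have 3, 1 and 2 extensions respectively, and each prefix is output with probability 1/6; the remaining probability 1/2 is rejection.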
Recalling that , the Relaxation Lemma immediately yields the following corollary for the uniformity of Procedure Relax.
**Corollary 6.**
Assume that for all , , and assume is chosen uniformly at random. Then the output of Relax is uniform in , if there is no rejection.
The description of Relax as repeated calls of Loosen is useful for analysing the algorithm, but for practical implementations we refer to the following corollary.
**Corollary 7.**
*Procedure Relax, when applied to , outputs with probability
, and ends in rejection otherwise.*
In practice, we predefine the numbers . Once the numbers are computed, the b-rejection can be performed in one step using Corollary 7, and there is no need to perform Relax with its iterated calls to Loosen. As mentioned in Section 1, these numbers can be much faster to compute than the number of -anchorings of , which would be required using the scheme in [mckay90]. We also reiterate that, unlike the scheme in [mckay90], the rejection probability depends on the anchoring imposed by , as well as .
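As a rough sketch of the one-step b-rejection, suppose the extension counts along the relaxation and their predefined lower bounds are already known. Since each call to Loosen accepts with probability (lower bound) / (count), the overall acceptance probability is the product of these ratios. The function names and argument layout below are assumptions made for illustration.

```python
import random
from fractions import Fraction


def relax_acceptance_probability(extension_counts, lower_bounds):
    """Product of (lower bound / count) over all relaxation steps."""
    if len(extension_counts) != len(lower_bounds):
        raise ValueError("need one lower bound per extension count")
    p = Fraction(1)
    for b, b_low in zip(extension_counts, lower_bounds):
        if not 0 < b_low <= b:
            raise ValueError("each lower bound must satisfy 0 < bound <= count")
        p *= Fraction(b_low, b)
    return p


def survives_b_rejection(extension_counts, lower_bounds, rng=random):
    """Perform the whole b-rejection with a single random draw."""
    return rng.random() < relax_acceptance_probability(extension_counts, lower_bounds)
```

For instance, counts `[4, 10]` with lower bounds `[2, 5]` give acceptance probability 1/4.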
4 Algorithm INC-GEN
In this section we provide a description of INC-GEN. Let $\mathbf{d}$ be given. We will use the configuration model [bollobas1980] to generate a random pairing, defined as follows. For every $1\le i\le n$, represent vertex $i$ as a bin containing exactly $d_i$ points. Take a uniformly random perfect matching over the set of $M$ points in the bins. Call the resulting matching a pairing $P$ and call each edge in $P$ a pair. Finally identify the bins as vertices, and represent each pair in $P$ as an edge. This produces a multigraph from $P$, denoted by $G(P)$. If a set of pairs in $P$ forms a multiple edge or loop in $G(P)$ then this set of pairs is called a multiple edge in $P$ as well, with the same multiplicity as it has in $G(P)$. A loop is a pair with both ends contained in the same bin/vertex. If there is a set containing more than one pair with all ends contained in the same vertex, then this set of pairs forms a multiple loop. We always use loop to refer to a single loop with multiplicity equal to one. We call a multiple edge with multiplicity 2 or 3 a double or triple edge respectively. Let $\Phi$ denote the set of all pairings with degree sequence $\mathbf{d}$. Recall that $M=\sum_{i=1}^n d_i$, and $M_2=\sum_{i=1}^n d_i(d_i-1)$.
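The configuration model described above can be sketched in a few lines. This is a minimal illustration, assuming vertices are numbered from 0 and each point is recorded only by the bin that contains it; the function names are hypothetical.

```python
import random
from collections import Counter


def random_pairing(degrees, rng=random):
    """Uniformly random perfect matching on the points of the bins."""
    points = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(points) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    rng.shuffle(points)
    # Consecutive entries of the shuffled list form the pairs.
    return [(points[i], points[i + 1]) for i in range(0, len(points), 2)]


def multigraph_of(pairing):
    """Collapse a pairing to edge multiplicities; a key (v, v) is a loop."""
    return Counter((min(u, v), max(u, v)) for u, v in pairing)
```

Shuffling the points and pairing consecutive entries gives every perfect matching the same probability, because each matching arises from the same number of orderings.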
If define
[TABLE]
and define otherwise. The consideration of two cases is needed to ensure that certain parameters defined in Section 4.1.1 and Section 4.2.1 are positive, and thereby to ensure that the algorithm has finite expected runtime.
Let denote the set of pairings in where there are no multiple edges with multiplicity at least 3, and no multiple loops with multiplicity at least 2, and the number of loops and double edges are at most and respectively. The following result is essentially contained in [mckay90] so we only give a brief description of the proof.
**Lemma 8.**
Let be a graphical degree sequence with and be a uniformly random pairing in . Then there exists a constant such that for all sufficiently large .
**Proof.** We first note that if , then since is large enough and , we have . So we only need to consider the case when and are defined by (1).
If then the claim follows by [mckay90]*Lemmas 2 and . If then contains triple edges in expectation, whereas the expected number of multiple edges of higher multiplicity in the pairing is bounded by . Similarly, the expected number of loops of multiplicity at least 2 is . In the case that the expected number of triple edges is asymptotically a positive constant, the standard method of moments can be used to show that the joint distribution of the numbers of triple edges, double edges and loops are asymptotically independent Poisson variables. This implies our assertion. See also the discussion of this case in the proof of [mckay90]*Theorem 3.
The first step of our algorithm is to use the configuration model to generate a uniformly random pairing. Proceed if the pairing satisfies the constraints above on loops and multiple edges; otherwise, reject and restart the algorithm. This type of rejection is called initial rejection. By Lemma 8, this initial rejection stage takes only $O(1)$ rounds in expectation before successfully producing a multigraph whose numbers of double edges and loops are within the permitted bounds, and which has no multiple loops or edges of multiplicity higher than two. Then the algorithm calls two procedures, NoLoops and NoDoubles. Each of these is composed of a sequence of switching steps. In each switching step, a loop (in NoLoops) or a double edge (in NoDoubles) will be removed using the corresponding switching operation in the procedure.
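The initial rejection test amounts to one scan over the edge multiplicities. In the sketch below the multigraph is a `Counter` keyed by sorted vertex pairs, and `max_loops` and `max_doubles` are hypothetical parameters standing in for the bounds on the numbers of loops and double edges defined in (1).

```python
from collections import Counter


def passes_initial_rejection(multigraph, max_loops, max_doubles):
    """True iff the multigraph survives initial rejection.

    multigraph: Counter mapping (u, v) with u <= v to its multiplicity;
    a key (v, v) is a loop at v.
    """
    loops = doubles = 0
    for (u, v), mult in multigraph.items():
        if u == v:
            if mult >= 2:          # multiple loop
                return False
            loops += mult
        else:
            if mult >= 3:          # triple edge or worse
                return False
            if mult == 2:
                doubles += 1
    return loops <= max_loops and doubles <= max_doubles
```

In an implementation aiming at linear time, these counts would be maintained while the pairing is generated rather than recomputed by a separate scan.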
Various types of rejections may occur in procedures NoLoops and NoDoubles. In all cases, if a rejection occurs then the algorithm restarts from the first step.
Let and be the set of multigraphs with degree sequence , loops, double edges and no other types of multiple edges. The following lemma guarantees uniformity of the multigraph obtained after initial rejection.
**Lemma 9.**
Let be a uniformly random pairing in . Let where and . Conditional on the number of loops and double edges in being and , is uniformly distributed over .
**Proof.** This follows from the simple observation that every pairing in appears with the same probability, and every multigraph in corresponds to exactly distinct pairings.
Note that if , then , and so INC-GEN never calls NoLoops or NoDoubles. By Lemma 9, the output of INC-GEN is uniformly distributed in . Also, by Lemma 8, INC-GEN restarts a constant number of times in expectation before outputting a graph. Hence, in this case Theorem 1 is proved. For the rest of this section we assume .
In the next subsection we define the procedure NoLoops. This procedure uses the same switchings as in [mckay90] (but applied to multigraphs rather than pairings) to reduce the number of loops to 0.
4.1 NoLoops
**Definition 10 (ℓ-switching).**
For a graph $G$, choose five distinct vertices $v_1,\ldots,v_5$ such that
- there is a loop on $v_2$;
- $v_1v_5$ and $v_3v_4$ are single edges;
- there are no edges between $v_1$ and $v_2$, $v_2$ and $v_3$, and $v_4$ and $v_5$.

An ℓ-switching replaces the loop on $v_2$ and the edges $v_1v_5$, $v_3v_4$ by the edges $v_1v_2$, $v_2v_3$ and $v_4v_5$.
See Figure 1 for an illustration of an ℓ-switching. Note that this switching is the same as the one used in [mckay90], except performed on graphs, not pairings.
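A loop-removing switching of this shape can be sketched as follows. The vertex labelling is an assumption of this sketch: the loop sits on `v2`, the single edges removed are `v1v5` and `v3v4`, and the edges created are `v1v2`, `v2v3` and `v4v5`. The multigraph is a `Counter` keyed by sorted vertex pairs.

```python
from collections import Counter


def key(u, v):
    """Canonical key of the unordered pair {u, v}."""
    return (u, v) if u <= v else (v, u)


def degrees_of(g):
    """Degree of every vertex; each copy of a loop contributes 2."""
    deg = Counter()
    for (u, v), m in g.items():
        deg[u] += m
        deg[v] += m
    return deg


def l_switching(g, v1, v2, v3, v4, v5):
    """Return the switched multigraph, or None if the switching is invalid."""
    if len({v1, v2, v3, v4, v5}) != 5:
        return None
    if g[key(v2, v2)] != 1:
        return None                                   # need a loop on v2
    if g[key(v1, v5)] != 1 or g[key(v3, v4)] != 1:
        return None                                   # need two single edges
    if g[key(v1, v2)] or g[key(v2, v3)] or g[key(v4, v5)]:
        return None                                   # new edges must be absent
    h = Counter(g)
    for e in (key(v2, v2), key(v1, v5), key(v3, v4)):
        h[e] -= 1
    for e in (key(v1, v2), key(v2, v3), key(v4, v5)):
        h[e] += 1
    return +h                                         # drop zero multiplicities
```

The switching preserves every degree: `v2` trades its loop for two ordinary edges, and each of the other four vertices loses one edge and gains one.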
Let be the number of -switchings that can be performed on . We will specify a parameter such that
[TABLE]
In each switching step, a uniformly random switching converting to some is selected. An f-rejection occurs with probability . We will next describe how to use incremental relaxation to do b-rejection. If is neither f-rejected nor b-rejected, then will be performed in this switching step.
We first give some notation. In a multigraph, a (simple) ordered edge is an ordered pair of vertices such that is a (simple) edge in the multigraph. Similarly, a (simple) ordered -path is an ordered set of vertices such that forms a (simple) -path in the multigraph.
Define to be the number of simple ordered -paths in such that there is no loop on . For a simple ordered 2-path in define to be the number of simple ordered edges in that are vertex disjoint from and such that and are non-edges. For let and be lower bounds on and respectively over all and all simple ordered 2-paths in . Positive constants and will be defined in Section 4.1.1. Any switching that can be used to create a fixed multigraph from multigraphs in can be identified with the ordered set of vertices whose adjacencies were changed by . Set and .
Informally, each iteration of NoLoops starts with a multigraph and chooses a random -switching that converts to some . In terms of the notation defined in Section 3, each such switching can be viewed as an -anchoring of , where is a graph on the right side of Figure 1 (with positive signs on solid edges, and negative signs on dashed edges). NoLoops then performs f-rejection, after which every pair (denoting an -anchoring of ), where and is an -switching that creates , arises with the same probability. After that NoLoops sequentially relaxes constraints enforced by -anchoring of by performing a b-rejection. The following is the formal description of NoLoops.
In Section 4.3 we show that if is distributed uniformly at random in , the output of NoLoops(G) is uniform in . We do this by showing that the quantities and defined above coincide with the quantities and in an application of Corollary 7.
4.1.1 Parameters in NoLoops
We now specify the values of the parameters mentioned above, which will be shown in the following lemma to satisfy the required inequalities. Define
[TABLE]
Recall that we assumed and so and are positive constants. The following Lemma establishes necessary bounds on , and .
**Lemma 11.**
Let with and . For any simple ordered 2-path in , we have
[TABLE]
For forward -switchings
[TABLE]
The proof of Lemma 11 is postponed to Section 4.5. This completes the description of NoLoops.
4.2 NoDoubles
After NoLoops is finished, we have a multigraph . Next we describe how to reduce the number of double edges in .
**Definition 12 (d-switching).**
For a graph $G$, choose six distinct vertices $v_1,\ldots,v_6$ such that
- there is a double edge between $v_2$ and $v_5$;
- $v_1v_4$ and $v_3v_6$ are single edges;
- the following are non-edges: $v_1v_2$, $v_2v_3$, $v_4v_5$, $v_5v_6$.

A d-switching replaces the double edge between $v_2$ and $v_5$ and the edges $v_1v_4$, $v_3v_6$ by the edges $v_1v_2$, $v_2v_3$, $v_4v_5$, $v_5v_6$.
See Figure 2 for an illustration.
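The double-edge switching can be sketched in the same style. The labelling is again an assumption of this sketch: the double edge joins `v2` and `v5`, the single edges removed are `v1v4` and `v3v6`, and the result contains the two 2-paths `v1v2v3` and `v4v5v6`.

```python
from collections import Counter


def key(u, v):
    """Canonical key of the unordered pair {u, v}."""
    return (u, v) if u <= v else (v, u)


def d_switching(g, v1, v2, v3, v4, v5, v6):
    """Return the switched multigraph, or None if the switching is invalid."""
    if len({v1, v2, v3, v4, v5, v6}) != 6:
        return None
    if g[key(v2, v5)] != 2:
        return None                                   # need a double edge v2v5
    if g[key(v1, v4)] != 1 or g[key(v3, v6)] != 1:
        return None                                   # need two single edges
    new_edges = [key(v1, v2), key(v2, v3), key(v4, v5), key(v5, v6)]
    if any(g[e] for e in new_edges):
        return None                                   # new edges must be absent
    h = Counter(g)
    h[key(v2, v5)] -= 2
    h[key(v1, v4)] -= 1
    h[key(v3, v6)] -= 1
    for e in new_edges:
        h[e] += 1
    return +h                                         # drop zero multiplicities
```

After the switching, the pairs `v1v4`, `v2v5` and `v3v6` are non-edges and form a matching, which matches the description of the negative edges of the anchoring graph in Section 1.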
For a graph , we use notation for the number of ways to perform a -switching on . We will specify such that
[TABLE]
In each switching step, a uniformly random switching converting to some is selected. An f-rejection occurs with probability .
The incremental relaxation scheme for b-rejection is analogous to that in NoLoops. Define to be the number of simple ordered -paths in . For a simple ordered 2-path in define to be the number of simple ordered 2-paths that are vertex disjoint from such that , and are non-edges.
For let and be positive lower bounds (to be specified in Section 4.2.1) on and over all and simple ordered 2-paths in . For a -switching let be the vertices whose adjacencies were changed by . Set and .
As in the case of NoLoops, we show in Section 4.3 that the desired uniformity property holds for NoDoubles.
4.2.1 Parameters for NoDoubles
Define
[TABLE]
Note that and are positive constants, as in Section 4.1.1.
**Lemma 13.**
Let . Then for any simple ordered 2-path in we have
[TABLE]
The proof of Lemma 13 is postponed to Section 4.5.
4.3 Uniformity
**Theorem 14.**
INC-GEN generates graphs with degree sequence $\mathbf{d}$ uniformly at random.
**Proof.** We start the proof by showing that b-rejection in both NoLoops and NoDoubles can be performed as Relax for appropriate choice of . We deal here with NoDoubles only, as the issues with NoLoops are identical.
Let be the set of -switchings that convert a multigraph in to some multigraph in . Recall that switching can be identified with an ordered set of vertices whose adjacencies were changed by , and , .
Let and let be distinct vertices. Using the notation to denote a multiset, and to denote the set of simple edges in , define
[TABLE]
Recall that
[TABLE]
We now show that
[TABLE]
Indeed, for a given simple ordered 2-path in , the number of simple ordered 2-paths such that , and are non-edges is equal to and is at least one according to Lemma 13. So for every pair with there exists a simple ordered 2-path , such that , which establishes the desired claim for .
Similarly we have
[TABLE]
If is a switching from to , then and so belongs to . So every pair , where switching creates , can be identified with an element , hence we can apply Relax to . In this setup, the quantities and (as in Section 3) are equal to and respectively. (Recall the definitions for and in Section 4.2.) It remains to note that we can set for where .
According to Corollary 7, Relax outputs with probability
[TABLE]
which is exactly equal to the probability that is not b-rejected in NoDoubles.
Hence b-rejection in NoDoubles is just an effective implementation of Relax. In view of Corollary 6 we have the following.
**Corollary 15.**
Let be non-negative integers and let be chosen u.a.r from the class of all pairs , where and is an -switching (or -switching, if ) that creates . If is not b-rejected by NoLoops (or NoDoubles, respectively), then is uniform in .
Now we are ready to prove the theorem. Assume that we initially generated a graph for some and .
We say that a graph was reached in NoLoops if a switching creating was selected in a switching step, and was not rejected. Let denote the multigraph reached after switching steps of NoLoops, if no rejection occurred (let if a rejection occurs during the -th step or earlier). We will prove by induction on , that conditional on , is uniformly distributed in .
The base case holds by Lemma 9. Assume and is uniformly distributed in . Then, there exists such that the probability that is equal to , for every . Now, for every and every -switching that results in , the probability that was obtained during the -st iteration of NoLoops and not f-rejected is equal to
[TABLE]
So, is uniform in class of all pairs , where and is an -switching that creates . By Corollary 15, if is not b-rejected then is uniform in . Inductively, the output of NoLoops is uniform in provided no rejection. This holds as well for NoDoubles. Therefore, INC-GEN generates every graph in with the same probability.
4.4 Time and space complexity
**Lemma 16.**
The probability of an f- or b-rejection during a single run of INC-GEN is at most .
**Proof.** First, note that if or then both are smaller than 1 and NoLoops and NoDoubles are never called, since in these cases after initial rejection we obtain a uniformly random simple graph. Assume . We first deal with NoLoops.
By Lemma 11, the probability of no rejection in a single switching step of NoLoops is at least
[TABLE]
Since there are at most iterations of NoLoops, the probability of no rejection during NoLoops is at least
[TABLE]
Similarly, for NoDoubles, the probability that no rejection occurs in a single switching step, assuming no rejection occurring before, is at least
[TABLE]
As there are at most iterations of NoDoubles, the probability of no rejection during NoDoubles is at least
[TABLE]
Hence, the probability of any rejection during a single run of NoLoops, or NoDoubles is .
Now we complete the proof for Theorem 1, which follows from Theorem 14 and the following.
Theorem 17**.**
Provided , the expected run time of INC-GEN is . Space complexity of INC-GEN is .
**Proof. ** We start with estimating space complexity. By implementing appropriate data structures (uninitialised adjacency matrix and sorted arrays) we may assume that it takes constant time for checking adjacency of the vertices and to access the list of neighbours. We also store the positions of multiple loops and double edges. In total our space complexity is bounded by .
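The text does not spell out the "uninitialised adjacency matrix". One standard way to get constant-time adjacency tests without paying to initialise the matrix is the sparse-set trick, sketched below in Python. This is an assumption about what is meant, not the authors' stated implementation. The names `SparsePairSet` and `pair_key` are hypothetical. Python has to fill its lists, so the sketch only illustrates the membership logic, which never relies on the initial contents of `sparse`.

```python
class SparsePairSet:
    """Set of integer keys with O(1) add, remove and lookup.

    `sparse` may hold arbitrary garbage: key k is a member only if
    sparse[k] points into the used prefix of `dense` and dense[sparse[k]] == k.
    """

    def __init__(self, universe, capacity):
        # In C both arrays could be left uninitialised; Python must fill them.
        self.sparse = [0] * universe   # one slot per possible key
        self.dense = [0] * capacity    # one slot per stored key
        self.size = 0

    def __contains__(self, key):
        i = self.sparse[key]
        return i < self.size and self.dense[i] == key

    def add(self, key):
        if key not in self:
            self.sparse[key] = self.size
            self.dense[self.size] = key
            self.size += 1

    def remove(self, key):
        if key in self:
            i = self.sparse[key]
            last = self.dense[self.size - 1]
            # Move the last stored key into the freed slot.
            self.dense[i] = last
            self.sparse[last] = i
            self.size -= 1


def pair_key(u, v, n):
    """Map an unordered vertex pair on n vertices to a matrix index."""
    a, b = (u, v) if u <= v else (v, u)
    return a * n + b
```

With `n` vertices and at most `m` stored pairs, `SparsePairSet(n * n, m)` answers `pair_key(u, v, n) in s` in constant time.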
By Lemmas 8 and 16, INC-GEN restarts a constant number of times in expectation before outputting a graph. So we only need to estimate the run time for a single run of INC-GEN. The initial generation of takes time. The positions of all loops and multiple edges can be stored along with the generation of , so the detection of triple edges and double loops requires negligible time comparatively. Assuming the initial pairing survives initial rejection, the numbers of loops and double edges can be updated in constant time after each switching. We need to show that both NoLoops and NoDoubles can be implemented in time .
We first deal with the implementation of the f-rejection step. Instead of computing , we choose a random loop (on a vertex ) and then independently choose two uniformly random ordered edges and (this all can be done in time ). If on the corresponding we cannot perform an -switching due to some vertices colliding, forbidden edges being present, or single edges being actually loops or double edges, then we reject such (checking if a switching can be performed on can be done in constant time). There are ways to choose such a set , and the probability of accepting is exactly .
Similarly, for f-rejection in NoDoubles, we choose a random ordered double edge , and independently choose two uniformly random ordered edges (repetition allowed) to be and and reject the corresponding set if a -switching cannot be performed on it. There are exactly total choices for and probability of accepting it is exactly .
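The idea in the two paragraphs above, drawing a candidate from an easily sampled superset and rejecting it when no switching can be performed on it, can be sketched as follows for the loop case. This is a minimal illustration under assumptions: the predicate `is_valid` is a hypothetical stand-in for the constant-time validity check, and the representation of the multigraph as a list of edges is chosen only for the example.

```python
import random


def propose_switching(loops, edges, is_valid, rng=random):
    """Draw one candidate (loop vertex, e1, e2); return None on f-rejection.

    loops:    list of vertices carrying a loop
    edges:    list of edges (u, w), one entry per edge of the multigraph
    is_valid: hypothetical constant-time check that the switching
              can be performed on the candidate
    """
    v = rng.choice(loops)

    def ordered_edge():
        # An ordered edge is an edge together with one of its two orientations.
        u, w = rng.choice(edges)
        return (u, w) if rng.random() < 0.5 else (w, u)

    # The two ordered edges are drawn independently, repetition allowed.
    candidate = (v, ordered_edge(), ordered_edge())
    return candidate if is_valid(candidate) else None
```

Every candidate triple is equally likely, so the acceptance probability equals the number of valid candidates divided by the total number of candidates, and the number of valid switchings never has to be computed explicitly.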
Implementation of the b-rejection step is more complicated; this requires computing and . We start with computing , which we define to be the number of simple ordered -paths in with no loop on . We can do this initially in time by going through all vertices which have no loop on them and checking how many single edges are incident to . (We are counting paths from their middle vertex: if there are such edges, contributes to the count of paths.) After each -switching and -switching, can be updated in time . Indeed, each switching affects the adjacency of at most six pairs of vertices. For each adjacency change we can count the 2-paths it affects in time . has to be updated at most times, so the initial calculations and the update of can be done in time in total.
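The initial count of 2-paths from their middle vertex can be sketched as follows. The sketch assumes that a vertex with k incident single edges and no loop is the middle vertex of k(k-1) ordered simple 2-paths, and that only the middle vertex is required to be loop-free. The multigraph representation (a `Counter` keyed by `frozenset`) is a hypothetical choice made for the example.

```python
from collections import Counter


def count_ordered_two_paths(mult):
    """Count simple ordered 2-paths (u, v, w) whose middle vertex v has no loop.

    mult: Counter mapping frozenset({u, v}) (or frozenset({v}) for a loop)
          to the multiplicity of that edge in the multigraph.
    """
    has_loop = set()
    single_deg = Counter()  # number of single (multiplicity-one) edges at each vertex
    for pair, k in mult.items():
        if k <= 0:
            continue
        if len(pair) == 1:
            (v,) = pair
            has_loop.add(v)
        elif k == 1:
            u, v = pair
            single_deg[u] += 1
            single_deg[v] += 1
    # Each loop-free middle vertex with k single edges gives k * (k - 1) ordered paths.
    return sum(k * (k - 1) for v, k in single_deg.items() if v not in has_loop)
```

After a switching, only the vertices whose adjacencies changed need their contribution recomputed, which is what keeps the update cost per switching small.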
Now we prove that can be calculated in time . Indeed, for , is the number of simple ordered edges so that and there is no edge or . Thus can be estimated as minus the number of “bad” choices of , i.e. choices that violate at least one of the three conditions. This number of bad choices can be calculated in time by going through the 2-neighborhood of and . On the other hand, is already given by , and thus does not require additional computation.
For b-rejections in NoDoubles, we need to compute and . Again, the value of is already given by . We claim that can be calculated in time . Assume is given and is fixed and we are choosing . The number of simple ordered paths is given by , so we need to subtract from this the number of paths where some vertices collide with , or there is an edge (or double edge) between , or , or . Formally, let with and be the set of simple ordered -paths such that coincides with , and let , , be the sets of simple ordered -paths such that , , or is an edge (or a double edge), respectively. Then
[TABLE]
To estimate the size of we can use the inclusion-exclusion formula. It is easy to see that no more than three different can have non-empty intersection, and each of the terms involving at least one of the can be computed in time . Similarly, to estimate we use the formula
[TABLE]
We only show in detail how to calculate the size of in time , as the sizes of the other three sets can be computed similarly. We run through all possible choices of and show that, for each one, it takes constant time to count the vertices such that .
To start with, for each vertex of let denote the number of 2-paths such that vertex is different from and , is a single or double edge, and is a single edge. The values of can be pre-computed for all in time by going through in the neighborhood of and all in the neighborhood of . After that, to evaluate , we go through the choices of as a neighbor of (at most such choices), and as a neighbor of (again at most ). For each choice of we first check if it is a valid choice for , that is, if is an edge (or double edge) and if is a single edge. (This can be done in constant time.) If the given is a valid choice for , then there are exactly choices for so that , if is a non-edge, and there are exactly choices for so that , if is an edge (double or single). Since going through all possible choices of takes time, and moreover given it takes constant time to count the elements of of the form , the size of can be calculated in time .
Since NoDoubles runs for at most iterations, and each iteration can be performed in time , it takes at most time for a single run of .
In conclusion, the expected run time of INC-GEN is .
Alternatively, INC-GEN can be implemented (by using sorted adjacency listings for each vertex instead of adjacency matrix) so that the expected runtime is and space complexity is .
4.5 Proofs of Lemmas 11 and 13
The following lemma is used to estimate and .
Lemma 18**.**
Let be a graph with and . Then
[TABLE]
For with we have
[TABLE]
**Proof. ** Recall that is equal to the number of choices of simple ordered 2-path such that there is no loop on . The same is true for . We first deal with . In order to count the valid -paths we first choose the vertex and then two distinct edges and . There are at most ways to choose two adjacent edges in , and hence the upper bound. For the lower bound we have to subtract the choices for which either there is a loop on vertex (at most choices), or one of the edges and is a double edge (at most choices, noting that for every double edge we may choose an edge from it in 2 ways and order it in 2 ways). Hence the number of choices of that contribute to is at least
[TABLE]
from which the lower bound follows.
The bounds for follow by just setting . For the rest of this subsection set , where is a simple path in a multigraph with no loop on .
**Proof of Lemma 11. ** First we deal with . Lemma 18 implies that
[TABLE]
The inequalities required for are
[TABLE]
There are at most choices for an ordered edge without any restrictions, hence the upper bound of . Next, the choices of that do not contribute to consist of one of the following three cases:
- (i)
is a double edge or a loop;
- (ii)
or and not (i);
- (iii)
at least one of and is an edge and not (i), nor (ii).
There are at most edges that satisfy (i) (noting that loops count twice because the bound counts each edge once for each way to orient it); at most choices that satisfy (ii); at most choices that satisfy (iii). Hence the number of choices of that contribute to is at least
[TABLE]
from which the lower bound follows (noting that the hypotheses of the lemma imply and ).
Turning to the estimation of , we first choose a vertex with a loop (in ways), and then ordered edges and (in at most ways each). Therefore, . For the lower bound we need to subtract the following three cases: at least one of or is a loop or a double edge (at most ); some of the vertices coincide (at most such choices); or some of the edges , , and are present (at most choices). Hence, there are at least
[TABLE]
-switchings that can be applied to . Again using and , we obtain a lower bound for .
**Proof of Lemma 13. ** Again, we deal with and first. Lemma 18 implies that
[TABLE]
We need to show that
[TABLE]
Here is the number of simple ordered 2-paths that do not intersect with and , , are not edges. To choose we first choose the vertex and then different edges and . There are at most ways to choose two adjacent edges in , hence the upper bound. For the lower bound, we have to subtract choices where any of the following holds:
- (i)
at least one of and is a double edge;
- (ii)
some of the vertices coincide with some of vertices in and not (i);
- (iii)
at least one of edges , and is present and not (i), nor (ii).
There are at most choices for (i), at most choices for (ii), and at most choices for (iii). Hence the number of choices of that contribute to is at least
[TABLE]
from which the lower bound follows.
To estimate we first choose an ordered double edge , then consecutively ordered edges and so that for some switching . There are ways to choose and at most ways to choose each of the single edges. Therefore . For the lower bound we need to subtract the choices in each of the following three cases: at least one of and forms a double edge (there are at most such choices); some of the vertices coincide (at most choices); or some of the edges , , and are present (at most choices). Hence, there are at least
[TABLE]
-switchings that can be applied to . The lower bound for follows, using .
5 Regular degree sequences
In this section we aim at uniform generation of -regular graphs where . In [gao17], a uniform sampler called REG was given which runs in time in expectation. Similar to INC-GEN, REG first generates a uniformly random pairing which does not contain too many loops, double edges, or triple edges, and does not contain any multiple loops, or any multiple edges of multiplicity greater than three. Then REG goes through three “phases”, reducing the loops, triple edges, and finally all double edges. Our new algorithm INC-REG has exactly the same structure and employs the same switchings, but has a more efficient rejection scheme. The switchings in REG are defined on pairings instead of on multigraphs. These two definitions are equivalent and affect parameters such as and by a constant factor in the two definitions. We refer the reader to [gao17] for the description of REG, which we do not reproduce here due to its length and complicated structure. For consistency, we will also define switchings on pairings in this section.
Thus, we will choose points in the vertices (instead of choosing vertices) and switch pairs involving these points. Instead of giving a formal definition we will only present a figure of the switchings, as the figures are self-explanatory. The choices of points are always made so that only the designated multiple edge, or loop is removed, without deleting any other multiple edges. Certain adjacency requirements are enforced so that the switching does not cause the creation of other multiple edges, unless specified. This is the same as for -switchings and -switchings in Section 4.
The first phase reduces the number of loops. Our new algorithm INC-REG simply replaces that phase in [gao17] by procedure NoLoops (adapted to pairings).
The second phase reduces the number of triple edges. The switchings used in [gao17] are in Figure 3, and in INC-REG we use the same switchings, which we call -switchings.
Let and be the maximum numbers of double and triple edges permitted after the initial rejection. They were set in [gao17, eq. (36)]. We keep this same definition for INC-REG. In particular, and .
Given a pairing that contains exactly triple edges, let be the number of possible ways to perform a -switching on .
As before, for a switching from to , where designates the set of points involved in the switching, define ordered subsets of points , and (here “+” denotes concatenation of ordered sets). As in NoLoops and NoDoubles, we define to be the number of ordered such that for some -switching that creates .
Regarding complexity, after generating the pairing we can make an initial computation to locate all multiple edges in time . Similar to the argument in Section 4.4, there is no need to compute for f-rejections. Each can also be computed initially in time and updated in time . We can do this by additionally keeping track of the quantity , the number of simple 3-stars in , which requires for the initial computation and time for updating after each -switching. Then is given by and can be computed as , where is the number of bad choices of 3-stars due to vertex collisions or the presence of forbidden edges. Similar to the argument in Theorem 17, can be calculated in time . Since , the total run time is bounded by in this phase.
To complete the definition of the b-rejection scheme, we specify the following upper bound for , and lower bounds for , for containing exactly triple edges. These bounds are easy to verify with straightforward inclusion-exclusion arguments.
[TABLE]
Now, after a uniformly random -switching converting a pairing to is selected, the switching is performed with probability
[TABLE]
and is rejected with the remaining probability.
Finally, the last phase reduces the number of double edges. In REG, this phase uses two types of switchings, type I and type II. They are drawn in Figures 4 and 5. In a type I switching, along with the removal of the designated double edge, it is allowed to simultaneously create a new double edge, but no more than one. If no new double edge is created, the switching is said to be in class A, otherwise, it is in class B. See Figure 6 for an example of a type I, class B switching. A type II switching always deletes a designated double edge, and simultaneously creates exactly two double edges, and a type II switching is always in class B. In each switching step, for a pairing with double edges, REG first chooses a switching type with a specified distribution over , then uniformly selects a random type switching that can be performed on and obtains a pairing . An f-rejection may occur at this point. If the selected switching is of class , REG performs a b-rejection based on the number of class switchings that can produce the resulting pairing . We refer the interested readers to [gao17, Sections 2, 5] for the rationale of the uses of different types of switchings and the classification of switchings into multiple classes.
The last phase runs as a Markov chain, occasionally increasing or not changing, but usually decreasing, the number of double edges. (The steps that do not increase the number of double edges are only chosen with very small probabilities.) Once it reaches a pairing with no double edges, it outputs this pairing. In REG the parameters are chosen so that
- (i)
the expected number of times a switching appears in the algorithm after f-rejection depends only on the class of and the number of double edges in .
- (ii)
the expected number of times a pairing is reached in REG depends only on the number of double edges in .
The critical property of b-rejection that is used in the derivation of the parameters is property (iii) described below. For a pairing and a class , let be the set of class switchings that result in . In REG, b-rejection satisfies the following
- (iii)
for all with double edges and all
[TABLE]
for some constants that were specified in [gao17]. We note here that as long as property (iii) is satisfied for the same set of constants , we can replace b-rejection in REG with any other rejection scheme and the modified algorithm that we obtain would still satisfy (i) and (ii).
Our new algorithm INC-REG uses the same set of switchings as in [gao17]. The only nontrivial modifications are related to b-rejection, namely we obtain INC-REG by replacing b-rejection in REG for class switching with a corresponding version of incremental relaxation. Let be the number of class switchings that produce . The parameters are defined in [gao17] as certain uniform lower bounds for , for pairings containing exactly double edges. Instead of computing as in [gao17], we will now compute the quantity
[TABLE]
which depends on the switching that converts some pairing to . (Sets and are defined at the end of this section for each of the two classes of switchings.) In REG, a graph is not b-rejected with probability , while in INC-REG we set the probability of performing incremental relaxation without rejection to be . It is straightforward to check that the constants are still lower bounds for . In this context, the ideas in the proof of Lemma 5 can be used to show that
[TABLE]
for all and , and so the relaxation scheme in INC-REG satisfies property (iii) with the same constants . Hence, INC-REG also satisfies properties (i) and (ii) and so every simple pairing is generated with the same probability. Once again we only need to compute to run the last phase of the algorithm.
To complete the description of incremental relaxation we need to consider two different anchorings for . For , only type I switchings can be in class A, and the type-I-class-A switchings are exactly the -switchings defined in Section 4. Thus, we will define exactly the same as in Section 4 and the lower bound for is defined to be (as in [gao17])
[TABLE]
The total run time with contributions from computing b-rejection probabilities for class A switchings is then .
Next, consider . Every class-B switching can be identified with its image, that is with an ordered set of points , being a permutation of , such that is in a double edge, corresponds to vertex or , and such that a 2-path containing has either one or two double edges (in the latter case, the point belongs to the same vertex as ).
To be more precise, assuming is a switching of class B, we define , to be an ordered set of six points involved in a non-simple 2-path (ordered by natural order), and . There are essentially four possible places where edge can be, all resulting in formally different sets and , for example, if is in vertex and is in , then and . Similar to Section 4.2, for let denote the number of ordered such that for some class B switching that creates . For a switching we set
[TABLE]
A uniform lower bound for which depends only on , the number of double edges in , can be defined by
[TABLE]
where
[TABLE]
Note that the value is always equal to , as there are possibilities to choose edge as one of the edges in a double edge , four possibilities to label pair of different from (this pair can be labelled as , or ) and for each such choice there are exactly choices for a second edge in a -path involving . For the value of , we use the same procedure as we used to calculate in Theorem 17, so can be calculated in time . It now follows from the proof in [gao17] that the total run time, including the contributions from computing b-rejection probabilities for class B switchings, is .
6 Power-law degree sequences
Our approach can be implemented to accelerate an existing algorithm for the uniform sampling of graphs with a degree sequence whose degree frequencies approximately follow a given power-law. The degree sequences being addressed can contain much larger degrees than permitted in the algorithms described so far in this paper. In the extended abstract of the present paper [agw20], the authors presented an algorithm INC-POWERLAW for this purpose, that uses exactly the same switchings as in [gao18] and claimed linear expected run-time. Unfortunately, there was a glitch in that algorithm since one step required super-linear run-time. The algorithm was repaired by Allendorf [allendorf2020] in consultation with the authors of the present paper, to maintain linear time, at the expense of introducing many more kinds of switching operations.
The main difficulty with such degree sequences is that the multiplicities of edges between vertices in a random pairing can be very large. The algorithm in [gao18] consists of two stages. In the first stage, multiple edges and loops of high multiplicities are switched away. By the end of the first stage, the only remaining multiple edges are single loops, double edges or triple edges. The time complexity for the first stage is already only in expectation (see Lemma 11 in [gao18]); this is quick because the expected number of edges involved in multiple edges is quite small. The second stage contains three phases during which loops, triple edges and double edges in turn are removed using switchings. The most complicated phase is for the removal of the double edges. This involves six different types of switchings.
INC-POWERLAW is identical to the algorithm in [gao18] for the first stage. In the second stage, INC-POWERLAW uses the same switchings and rejection scheme as in INC-REG for the deletion of loops and triple edges. For the third phase (elimination of double edges), INC-POWERLAW uses the same types of switchings as in [gao18], and the modified version in [allendorf2020] uses 18 kinds of switchings. This phase uses incremental relaxation for b-rejection in the same way as described for INC-REG in Section 5. We omit a detailed proof of Theorem 3 since the full story is given in [allendorf2020].
7 Bipartite graphs
With some minor modification our algorithm can be adjusted for generation of bipartite graphs with one part having degrees and the other part having degrees . Define
[TABLE]
The algorithm INC-BIPARTITE first uses the configuration model to generate a uniformly random pairing with bipartite degree sequence . The configuration model for a bipartite degree sequence is similar to the one for a general degree sequence, except that points in vertices of are restricted to be matched to points in vertices of . Let denote the set of pairings with bipartite degree sequence , and be those containing at most double edges and no other types of multiple edges. An initial rejection is applied if .
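The bipartite configuration model described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and `max_doubles` is a placeholder for the bound on the number of double edges used in the initial rejection, whose value is not reproduced here.

```python
import random
from collections import Counter


def bipartite_pairing(s, t, rng=random):
    """Uniformly random pairing for bipartite degrees s (one part) and t (other part).

    Returns a list of pairs (i, j): a point of vertex i in the first part
    matched to a point of vertex j in the second part.
    """
    if sum(s) != sum(t):
        raise ValueError("both parts must have the same total degree")
    # One point per unit of degree.
    left = [i for i, d in enumerate(s) for _ in range(d)]
    right = [j for j, d in enumerate(t) for _ in range(d)]
    # Shuffling one side gives a uniformly random bijection between the point sets.
    rng.shuffle(right)
    return list(zip(left, right))


def count_multi_edges(pairs):
    """Return (number of double edges, number of edges of multiplicity >= 3)."""
    mult = Counter(pairs)
    doubles = sum(1 for k in mult.values() if k == 2)
    higher = sum(1 for k in mult.values() if k >= 3)
    return doubles, higher


def survives_initial_rejection(pairs, max_doubles):
    """Accept only pairings with at most max_doubles double edges and nothing worse."""
    doubles, higher = count_multi_edges(pairs)
    return higher == 0 and doubles <= max_doubles
```

Loops cannot occur in this model, since every pair joins a point in one part to a point in the other, so only multiple edges need to be checked.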
The following lemma, which is based on Lemmas 2B and 3B′ from [mckay90], guarantees that the probability of an initial rejection is bounded away from , provided .
Lemma 19**.**
Let be a uniformly random pairing in . There exists a constant such that for all sufficiently large .
To remove the double edges, Algorithm INC-BIPARTITE uses the bipartite version of the -switching operation in Section 4, in which vertices are in and vertices are in .
We define as before and we redefine
[TABLE]
Following a similar proof we have the following bipartite version of Lemma 13.
Lemma 20**.**
Let with . Then for any simple ordered 2-path in we have
[TABLE]
Now we modify NoDoubles in Section 4 by using the bipartite version of the -switching operation, and the new definition of the parameters . Algorithm INC-BIPARTITE is given as follows.
Theorem 4 follows by a proof almost identical to that of Theorem 1.
References
