Bipartitioning of directed and mixed random graphs
Adam Lipowski, António Luis Ferreira, Dorota Lipowska, Manuel A. Barroso

TL;DR
This paper extends the understanding of bipartitioning in random graphs to directed and mixed types, revealing that key properties depend mainly on the total number of directed links, with implications for phase transitions.
Contribution
It demonstrates that the relation between cluster properties and optimal bipartitions in undirected graphs also applies to directed and mixed graphs, highlighting the role of total directed links.
Findings
The satisfiability threshold coincides with the relative size of the giant OUT component reaching 1/2.
Partition cost and cluster properties depend mainly on total directed links.
Location of replica symmetry breaking transition is primarily influenced by total directed links.
Abstract
We show that an intricate relation between cluster properties and optimal bipartitions, which takes place in undirected random graphs, extends to directed and mixed random graphs. In particular, the satisfiability threshold coincides with the relative size of the giant OUT component reaching 1/2. Moreover, when counting undirected links as two directed ones, the partition cost and cluster properties, as well as the location of the replica symmetry breaking transition for these random graphs, depend primarily on the total number of directed links and not on their specific distribution.
Adam Lipowski
Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
António Luis Ferreira
Departamento de Física, I3N, Universidade de Aveiro, Portugal
Dorota Lipowska
Faculty of Modern Languages and Literature, Adam Mickiewicz University, Poznań, Poland
Manuel A. Barroso
Departamento de Física, I3N, Universidade de Aveiro, Portugal
I Introduction
Statistical mechanics methodology is frequently used in studies of various optimization problems hartmann2006. The presence of quenched disorder, energy barriers, or various phase transitions in such problems implies interesting analogies to some glassy or magnetic systems, and the use of methods developed in the physical sciences turns out to be remarkably successful krzakala2016.
Graph bipartitioning is an optimization problem that appears in various contexts such as VLSI circuit design karypis, parallel computing pothen, or computer vision kolmogorov. Statistical mechanics approaches are particularly fruitful in the undirected random graph version of this problem. This version was studied numerically using simulated annealing banavar ; martin or extremal optimization boetcher, but important analytical results were also obtained using the replica method fu ; liao ; mezard, a technique primarily developed for studying disordered systems. In more recent work, in which the structure of nearly optimal partitions was analyzed, some predictions concerning replica symmetry breaking in this problem were made percus and subsequently verified using the belief propagation method zdeborova.
In contrast to undirected random graphs, the bipartitioning problem in other classes of graphs is less understood. Since in most systems links are only approximately symmetric, it would be desirable to examine the partitioning problem on random graphs with directed links, and that was our main motivation. Let us note that, despite the apparently simple modification, directed graphs are much more difficult to analyze. For example, the asymmetry of the adjacency matrix and of the Laplacian considerably complicates their spectral analysis chung. Similar complications also affect the closely related problem of community detection in directed graphs malliaros.
II Model
In graph bipartitioning, one has to divide the vertices into two classes of equal size, here marked as + and −, so that the partition cost is minimal. For undirected graphs, the partition cost is equal to the number of links joining vertices of opposite signs (Fig. 1a). In the present paper, we examine an extension of this problem to directed or mixed Erdős–Rényi graphs erdos, where both directed and undirected links are present. To construct such graphs, we place links among randomly selected pairs of vertices. Each link is undirected with a fixed probability and directed otherwise, and in the latter case its orientation is chosen randomly. As limiting cases, in such a way we can generate undirected (this probability equal to 1) or directed (this probability equal to 0) random graphs.
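The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `mixed_random_graph` and its parameters are hypothetical, repeated pairs and self-loops are rejected (the text does not say how they are treated), and an undirected link is stored as two opposite arcs, in line with the counting convention used later in the paper.

```python
import random

def mixed_random_graph(n_vertices, n_links, p_undirected, seed=None):
    """Return a list of arcs (origin, end) of a mixed random graph.

    Each link joins a distinct, randomly chosen pair of vertices. With
    probability p_undirected it is undirected and stored as two opposite
    arcs; otherwise it is one arc with a randomly chosen orientation.
    """
    max_pairs = n_vertices * (n_vertices - 1) // 2
    if n_links > max_pairs:
        raise ValueError("more links requested than distinct vertex pairs")
    rng = random.Random(seed)
    used_pairs = set()
    arcs = []
    while len(used_pairs) < n_links:
        i, j = rng.sample(range(n_vertices), 2)  # two distinct vertices
        pair = (min(i, j), max(i, j))
        if pair in used_pairs:
            continue                             # this pair is already linked
        used_pairs.add(pair)
        if rng.random() < p_undirected:
            arcs.extend([(i, j), (j, i)])        # undirected link
        else:
            arcs.append((i, j))                  # directed link, random orientation
    return arcs
```

With `p_undirected=1.0` every link contributes two arcs (an undirected graph), and with `p_undirected=0.0` every link contributes one arc (a directed graph).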
Bipartitioning of undirected graphs bears some analogy to the Ising model, in which there is a spin variable on each vertex and the system is described by the following Hamiltonian
[TABLE]
In the above equation, the summation is over pairs of vertices connected by a link, and the system is subject to the constraint that the numbers of + and − spins are equal, i.e., the total magnetization vanishes. In terms of spin variables, the normalized partition cost can be written as
[TABLE]
Finding an optimal partition thus becomes equivalent to finding the lowest energy of the Ising model subject to the constraint of zero magnetization. A number of approaches to graph bipartitioning that exploit this analogy with the Ising model have been developed. However, for directed or mixed graphs, the analogy becomes more intricate. One can retain the spin variables, but the spin dynamics loses detailed balance and a description in terms of a Hamiltonian like (1) is less obvious godreche ; sanchez. Neglecting the fact that partitioning of directed graphs involves a number of mathematical subtleties charikar ; feige, we use a partition cost that counts only those directed links whose origin and end carry prescribed opposite signs (Fig. 1b). With such a definition, the partition cost for directed graphs in terms of spin variables might be written as
[TABLE]
where the summation is over directed links, each running from its origin to its end. Actually, definition (3) can also be applied to undirected graphs and is then equivalent to (2): an undirected link can be considered as composed of two opposite directed links, and if it joins vertices of different signs, it is counted exactly once.
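A small example may make the directed cost concrete. The sketch below assumes, as a convention chosen here for illustration, that the cost counts arcs running from a +1 vertex to a −1 vertex; the function name is hypothetical and the count is left unnormalized.

```python
def directed_partition_cost(arcs, spins):
    """Count arcs that start at a +1 vertex and end at a -1 vertex.

    The choice of which sign sits at the origin is an assumption of this
    sketch; the opposite convention is related to it by a global spin flip.
    """
    return sum(1 for origin, end in arcs
               if spins[origin] == 1 and spins[end] == -1)

# An undirected link 0-1 stored as two opposite arcs, plus arcs 1->2 and 3->0.
arcs = [(0, 1), (1, 0), (1, 2), (3, 0)]
spins = [1, -1, -1, 1]
cost = directed_partition_cost(arcs, spins)
```

Here only the arc 0→1 contributes, so the undirected link between vertices of different signs is counted once, as stated above, and the arc 3→0 between equal signs costs nothing.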
The cost function defined in Eq. (3) was used, for example, in some studies on the complexity of partition algorithms feige. Let us note that, in principle, for directed graphs we could use the symmetric cost function (2). In such a way, however, the directedness of the graph is lost, since the problem becomes basically equivalent to the undirected case. The only difference would come from possibly existing pairs of vertices joined by two oppositely oriented links. For random graphs with a finite average vertex degree, such pairs of vertices appear with a negligibly small probability.
III Cluster properties
Partition of random graphs relates to their percolative behavior. In particular, for undirected random graphs, it is known that as long as the giant cluster occupies less than half of the system, one typically finds partitions with zero cost luczak ; mezard. Indeed, if the entire giant cluster is set as (for example) +, then the remaining + and all − can most likely be distributed among small isolated clusters, so that the partition cost is zero. Since the relative size of the giant cluster for random undirected graphs obeys the equation
[TABLE]
the satisfiability regime with persists in undirected graphs up to . Let us note that distributing links among vertices, we generate a random graph of the average vertex degree and hence marks the percolation threshold in undirected graphs.
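The condition that the giant cluster reaches one half of the vertices can be checked numerically. The sketch below assumes the standard self-consistency relation for the giant cluster of an Erdős–Rényi graph, S = 1 − exp(−cS), with c the average vertex degree. The function name and the parameterization by c are choices made here, and the symbols used in Eq. (4) may differ.

```python
import math

def giant_cluster_fraction(mean_degree, tol=1e-12, max_iter=100000):
    """Largest solution S of S = 1 - exp(-mean_degree * S).

    Fixed-point iteration started from S = 1 decreases monotonically to
    the largest root, which is 0 below the percolation threshold.
    """
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-mean_degree * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

# Average degree at which the giant cluster occupies half of the vertices.
threshold_degree = 2.0 * math.log(2.0)
half = giant_cluster_fraction(threshold_degree)
```

Under this relation the giant cluster occupies half of the vertices at an average degree of 2 ln 2 ≈ 1.386, and it vanishes for average degree below 1.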
For , the giant cluster is greater than and some of its vertices must be , which implies a positive partition cost: . When is not much greater than , optimal or nearly optimal partitions might be constructed based on the cluster structure. In particular, an efficient strategy is to turn into some tree-like decorations of the giant component percus ; benjamini .
Such a behavior of undirected graphs prompts a search for a similar scenario in directed and mixed graphs. Let us note, however, that the percolative properties of such graphs are more subtle. Directedness of links implies that clusters are now defined as collections of vertices that can be reached from a given vertex (OUT components) or of those from which a given vertex can be reached (IN components). One can also distinguish strongly connected components, in which each vertex can be reached from any other vertex of the component, as well as the so-called tendrils or tubes bowtie ; timar. The satisfiability threshold for undirected graphs results from the condition that the largest cluster that can be uniformly magnetized reaches half of the system. It seems that in the presence of directed links, the uniformly magnetized largest OUT component might play such a role. Indeed, any vertex that does not belong to the OUT component can be connected to it only via a link incoming to the OUT component. Such a link will not increase the partition cost even if that vertex carries the opposite sign. The above strategy would require some modifications when the average sizes of IN and OUT components are different, but for the random graphs analyzed in the present paper, this is not the case. Moreover, since the partition cost (3) is invariant with respect to simultaneous flipping of spins and link directions, we can equally well expect that the satisfiability threshold appears when the IN component, magnetized with the opposite sign, reaches half of the system.
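The OUT and IN components defined above can be found by a breadth-first search. The following is a brute-force sketch for small graphs, not the authors' procedure: the function names are hypothetical, the starting vertex is counted as part of its own component, and the largest component is found by trying every vertex as a starting point.

```python
from collections import deque

def out_component(start, successors):
    """Set of vertices reachable from `start` along arc directions."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in successors[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def largest_out_component(n_vertices, arcs):
    """Largest OUT component over all starting vertices (brute force)."""
    successors = [[] for _ in range(n_vertices)]
    for origin, end in arcs:
        successors[origin].append(end)
    best = set()
    for v in range(n_vertices):
        component = out_component(v, successors)
        if len(component) > len(best):
            best = component
    return best

arcs = [(3, 0), (0, 1), (1, 2), (4, 2)]
largest_out = largest_out_component(5, arcs)
# The largest IN component follows from the same search on reversed arcs.
largest_in = largest_out_component(5, [(end, origin) for origin, end in arcs])
```

In this example vertex 4 lies outside the largest OUT component and touches it only through the arc 4→2, which points into the component.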
For directed graphs, the size of OUT and IN components can be calculated using the generating function approach dorog. Placing directed links is then equivalent to the following method: for each ordered pair of vertices, place a link from the first vertex to the second with a fixed probability. For such a case, an explicit calculation of the generating function is straightforward lipgont, and one obtains that the average relative size of the largest OUT component satisfies the equation
[TABLE]
Let us notice that the above equation, except for the factor 2, is the same as Eq. (4), which implies that for directed graphs the percolation transition takes place at and at . More generally, the size of the largest OUT component for a directed graph at a given (above a percolation threshold ) is exactly the same as the size of the giant cluster for undirected graph at twice smaller .
For mixed graphs, analytical calculations seem to be less straightforward and we resort to numerical calculations. We calculated the largest OUT component for mixed graphs with , but in Fig. 2 we present also the results for undirected and directed graphs (for undirected graphs, the OUT component is the same as the giant cluster). Numerical results agree with the expectation that for undirected graphs, the percolation transition takes place at and for directed ones at . For mixed random graphs, some heuristic arguments suggest that two vertices connected with a directed link with probabilities 1/2 might be considered as connected via an undirected link or as disconnected herrmann . It means that a mixed random graph having links, with of them directed, is equivalent to the random graph with undirected links. One can thus expect a percolation transition taking place at that obeys the equation
[TABLE]
which for gives , a value which is consistent with numerical simulations (Fig. 2).
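The heuristic counting argument above amounts to simple arithmetic, sketched below. The function names and the parameter `p_undirected` (the probability that a link is undirected) are labels introduced here. The sketch assumes that a directed link counts as half an undirected one and that undirected random graphs percolate at average vertex degree 1, i.e. at one half of an undirected link per vertex.

```python
def effective_undirected_links(n_undirected, n_directed):
    """Each directed link is counted as half an undirected link."""
    return n_undirected + 0.5 * n_directed

def percolation_links_per_vertex(p_undirected):
    """Links per vertex at which the effective undirected graph percolates.

    With a fraction p of undirected links, the effective number of
    undirected links per vertex is (1 + p) / 2 times the number of links
    per vertex, and it must equal 1/2 at the transition.
    """
    return 1.0 / (1.0 + p_undirected)
```

For example, `percolation_links_per_vertex(1.0)` gives 0.5 (undirected graphs) and `percolation_links_per_vertex(0.0)` gives 1.0 (directed graphs), with mixed graphs falling in between.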
It follows from Eq. (6) that when plotted as a function of rather than , our numerical results should all exhibit a percolation transition at the same point. Moreover, according to Eqs. (4) and (5), the size of OUT components for directed and undirected graphs should be the same for any . In Fig. 3, we present the size of OUT components as a function of . As expected, the three kinds of graphs have the percolation transition at , but what is more, not only the data for directed and undirected graphs collapse on a single curve but also the data for mixed graphs collapse onto this curve as well. It suggests that the OUT component satisfies (or nearly satisfies) an equation similar to Eq. (5) also for mixed graphs, with replaced by . We will not analyze this issue further but we hope that a suitable extension of the generating function approach could explain these numerical findings.
IV Simulated annealing
Having examined the percolative behavior of directed and mixed random graphs, we can develop a method of their partitioning. Similarly to the undirected graphs, we expect that as long as the relative size of the OUT component is smaller than 1/2, we may magnetize it uniformly and hope that the rest of the vertices can be arranged with zero partition cost. More challenging is the case when the relative size of the OUT component exceeds 1/2. In such a case, a uniform magnetization of the OUT component is excluded and some of its vertices must carry the opposite sign, which will inevitably generate a partition cost. To make it optimal for undirected graphs, Percus et al. percus developed a combinatorial analysis of some outer tree-like decorations of a giant component.
Taking into account the more complex structure of directed and mixed graphs, we give up analytical manipulations and resort to simulated annealing that takes the structure of the cluster into account. More precisely, for a given graph, we find the largest OUT component, magnetize it uniformly, and classify its vertices according to their distance from its boundary. Then, depending on the size of the OUT component (we assume that its relative size exceeds 1/2), we set the required number of its vertices to the opposite sign. We choose them taking into account their distance from the boundary: vertices close to the boundary are preferably flipped, similarly to the decorations of undirected graphs of Percus et al. percus. We thus expect that such a procedure allows the insertion of a relatively large number of oppositely magnetized vertices at low cost. The configuration determined in this way becomes the initial one, from which simulated annealing reshuffles the oppositely magnetized vertices in the OUT component so that the minimal cost is reached. The annealing algorithm selects a pair of oppositely magnetized vertices and exchanges them according to the Metropolis update, which accepts a partition-cost increase with probability . During the run, the temperature is reduced as , where is the cooling rate and is the simulation time (unit of time is defined as an update of pairs of vertices). Because we expect that the optimal configuration will to some extent resemble the initial one, we do not want to destroy its structure during the run, especially in the high-temperature regime. That is why the high-temperature regime should not last too long and the initial temperature should not be too large. We found that the initial temperature , and the cooling rate usually lead to satisfactory partitions, and the numerical results presented in the following are obtained for such a choice of parameters.
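The pair-exchange Metropolis dynamics described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation, and several ingredients are assumptions: the run starts from a random balanced configuration instead of the cluster-based one, the cooling schedule is taken to be exponential, the cost counts arcs from +1 to −1 vertices, one time unit is one exchange attempt per vertex, and the default parameter values are arbitrary. All names are hypothetical.

```python
import math
import random

def partition_cost(arcs, spins):
    """Number of arcs going from a +1 vertex to a -1 vertex (assumed convention)."""
    return sum(1 for a, b in arcs if spins[a] == 1 and spins[b] == -1)

def anneal_bipartition(n_vertices, arcs, t_initial=1.0, cooling_rate=0.02,
                       n_sweeps=200, seed=None):
    """Pair-exchange simulated annealing at fixed zero magnetization."""
    if n_vertices % 2:
        raise ValueError("n_vertices must be even for an equal-size bipartition")
    rng = random.Random(seed)
    spins = [1] * (n_vertices // 2) + [-1] * (n_vertices // 2)
    rng.shuffle(spins)                      # random balanced start (assumption)
    plus = [v for v in range(n_vertices) if spins[v] == 1]
    minus = [v for v in range(n_vertices) if spins[v] == -1]
    current = initial_cost = partition_cost(arcs, spins)
    best_cost, best_spins = current, spins[:]
    for sweep in range(n_sweeps):
        # Exponential cooling is an assumption of this sketch.
        temperature = t_initial * math.exp(-cooling_rate * sweep)
        for _ in range(n_vertices):         # one sweep = n_vertices pair updates
            a, b = rng.randrange(len(plus)), rng.randrange(len(minus))
            u, v = plus[a], minus[b]
            spins[u], spins[v] = -1, 1      # tentatively exchange the two spins
            delta = partition_cost(arcs, spins) - current
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current += delta
                plus[a], minus[b] = v, u    # accept: update the bookkeeping
                if current < best_cost:
                    best_cost, best_spins = current, spins[:]
            else:
                spins[u], spins[v] = 1, -1  # reject: undo the exchange
    return best_spins, best_cost, initial_cost
```

Exchanging a + and a − vertex keeps the magnetization at zero throughout the run. The cost is recomputed from scratch at every step for clarity; a practical implementation would update it locally from the arcs touching the two exchanged vertices.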
In the regime where the relative size of the OUT component is below 1/2, slightly different simulations were made. We find the OUT component, magnetize it uniformly, and distribute the remaining spins of both signs randomly. Then we run simulated annealing that reshuffles spins only on vertices that do not belong to the OUT component (which is kept unchanged). As expected, in this regime partitions with zero (or very small) cost are easily found.
IV.1 Partition cost
For undirected graphs, our numerical results (Fig. 4) reproduce the already known results and show that the partition cost becomes positive for . Analogous results are obtained for directed and mixed graphs, and the emergence of a positive partition cost coincides with , namely, it takes place at for directed graphs and for mixed graphs. What is rather surprising to us is the overlap of the cost-function data when plotted as a function of (Fig. 5). Similarly to the behavior of the size of the OUT component, such a collapse indicates that the partition cost depends solely on the total number of directed bonds in the system and not on their specific distribution.
IV.2 Replica symmetry
An interesting aspect of graph partitioning is related to the nature of the solution space and a possible breakdown of the so-called replica symmetry. A prediction was made by Percus et al. that in the partitioning problem of undirected random graphs, replica symmetry breaking should occur inside the positive-cost regime, while in most other optimization problems it takes place inside the costless phase percus. Recently, using a belief propagation method, it was shown that the replica symmetry breaking occurs for , in agreement with the Percus et al. conjecture zdeborova.
The replica symmetry is related to the similarity of different ground state configurations. In a replica symmetric phase, such configurations are to a large extent similar, while in a replica symmetry-broken phase, they are much different. Such a symmetry was intensively studied in the hope to clarify the nature of ordering in spin glasses marinari ; young as well as in various optimization contexts krzakala ; monasson . To examine this symmetry, we generate a graph and run the simulated annealing for two different replicas A and B. Assuming that at the end of the run these replicas are specified by their spin configurations and , respectively, we calculate the overlap defined as follows
[TABLE]
Actually, since a graph structure to some extent determines the initial configuration, the replicas differ only in the distribution of that have to be placed in the OUT component. For a given graph, specified by and , to calculate we average over pairs of replicas and we also average over different graphs.
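The overlap of two replicas can be computed as sketched below. The sketch assumes the usual normalized overlap, i.e. the average over vertices of the product of the two spins, which is presumably what Eq. (7) defines; the function name is hypothetical.

```python
def overlap(spins_a, spins_b):
    """Normalized overlap of two spin configurations on the same graph."""
    if len(spins_a) != len(spins_b):
        raise ValueError("replicas must have the same number of vertices")
    return sum(a * b for a, b in zip(spins_a, spins_b)) / len(spins_a)

replica_a = [1, 1, -1, -1]
replica_b = [1, -1, 1, -1]
q_same = overlap(replica_a, replica_a)   # identical replicas
q_mixed = overlap(replica_a, replica_b)  # replicas agreeing on half of the vertices
```

Identical replicas give an overlap of 1, while replicas that agree on exactly half of the vertices give 0. A histogram of this quantity over many replica pairs and graphs gives the distributions discussed in the following paragraphs.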
On general grounds, one expects that in the replica symmetric phase, the distribution of is strongly peaked at a value close to , which corresponds to a single-valley structure of the ground state. In the replica broken-symmetry phase, much broader distribution is expected, which in some systems even at remains positive.
The distribution of in our simulations for undirected random graphs (Fig. 6) clearly shows two regimes: for and 0.72, the distribution is strongly peaked around , while for larger values and 0.8, the peak is smaller and tails at small are much heavier. The dependence on the system size (Fig.7) suggests that such a difference in the behaviour of is likely to persist also for larger .
Let us notice that our algorithm starts from configurations that are only partially random and thus even in the replica broken-symmetry phase, the replicas are not independent. Consequently, we cannot expect that will remain positive at . In our opinion, the numerical results support, albeit with a smaller precision, the previous estimation as a replica symmetry breaking transition zdeborova .
A similar behavior is observed for directed graphs (Fig. 8-9) and we estimate that the transition takes place at , and this is nearly twice the value of for undirected graphs. Less convincing are the results for mixed graphs but the regime with a fast decay at small (, 1.08) and a slow decay (, 1.2) can be also distinguished. Rescaling the undirected graph transition at zdeborova with the factor , we obtain , which clearly falls within the range (1.08, 1.15). It indicates that when expressed in terms of , replica symmetry breaking transitions for undirected, directed, and mixed random graphs take place at (nearly) the same value. Similarly to the percolation transitions, in the examined class of random graphs, replica symmetry breaking transitions seem to depend only on the total number of directed links in the graph.
The system size used in the calculation of the overlap seems to be sufficiently large to detect the different regimes and sufficiently small to allow extensive averaging. For larger systems we expect that the peaks in the replica symmetric regime get sharper, which would indicate a genuine phase transition. Of course, a more detailed analysis of finite-size effects would be desirable, but that would require much longer simulations. Let us note that for spin glasses the overlap is often calculated using rather small systems katz.
V Conclusions
In undirected random graphs, when the size of the giant cluster is smaller than half of the system, zero-cost bipartitions can usually be found. The idea is to keep the giant cluster uniformly magnetized; the rest of the vertices can then usually be marked without any partition cost. When the number of links increases and the size of the giant cluster slightly exceeds half of the system, a partition cost is unavoidable; however, the optimal partitions still contain the footprint of the cluster structure. It means that for a given graph, they are all similar or, in other words, the system preserves the replica symmetry. Upon a further increase in the number of links, the relation of optimal partitions to the giant cluster weakens and the replica symmetry gets broken.
As our main result, we show that to a large extent the above scenario also takes place in directed and mixed random graphs, with the giant component replaced by the giant OUT component. What is more, the partition cost, the satisfiability threshold, and the replica symmetry breaking transition in undirected, directed, and mixed random graphs seem to depend only on the total number of directed links in the graph (counting each undirected link as two directed ones). Our simulations show a similar behavior of the cluster properties of these graphs, where the percolation threshold and the size of the giant component exhibit an analogous dependence on the total number of directed links. The simple idea that in percolation problems an undirected link might be equivalent to two directed links finds strong support in some models on regular lattices herrmann. Our work shows that it also extends to some partitioning problems.
Acknowledgements: This work was partially funded by FEDER funds through the COMPETE 2020 Programme and National Funds through FCT - Portuguese Foundation for Science and Technology under the project UID/CTM/50025/2013.
References
- (1) A. K. Hartmann and M. Weigt, Phase Transitions in Combinatorial Optimization Problems: Basics, Algorithms and Statistical Mechanics (John Wiley & Sons, 2006).
- (2) F. Krzakala, F. Ricci-Tersenghi, L. Zdeborová, R. Zecchina, E. W. Tramel, and L. F. Cugliandolo (Eds.), Statistical Physics, Optimization, Inference, and Message-Passing Algorithms: Lecture Notes of the Les Houches School of Physics-Special Issue, October 2013 (No. 2013) (Oxford University Press, 2016).
- (3) G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, Multilevel hypergraph partitioning: applications in VLSI domain, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 7(1), 69–79 (1999).
- (4) A. Pothen, Graph partitioning algorithms with applications to scientific computing, Technical Report, Norfolk, VA, USA (1997).
- (5) V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26, 147 (2004).
- (6) J. R. Banavar, D. Sherrington, and N. Sourlas, Graph bipartitioning and statistical mechanics, J. Phys. A Lett. 20, L1 (1987).
- (7) G. R. Schreiber and O. C. Martin, Cut size statistics of graph bisection heuristics, SIAM J. Optim. 10, 231 (1999).
- (8) S. Boettcher and A. G. Percus, Extremal optimization for graph partitioning, Phys. Rev. E 64, 026114 (2001).
