Communication complexity of approximate maximum matching in the message-passing model
Zengfeng Huang, Bozidar Radunovic, Milan Vojnovic, Qin Zhang

TL;DR
This paper investigates the communication complexity of approximating maximum matchings in distributed graphs, establishing tight bounds and extending results to related graph problems in a multi-party message-passing model.
Contribution
It provides a tight lower bound on communication complexity for approximate maximum matching, applicable to other graph problems in the message-passing model.
Findings
Lower bound of $\Omega(\alpha^2 k n)$ bits for approximate maximum matching.
Matching upper bounds are constructed, showing tightness up to a log n factor.
Lower bounds extend to max-flow and graph sparsification problems.
Zengfeng Huang University of New South Wales, Sydney, Australia. Email: [email protected]
Bozidar Radunovic Microsoft Research, Cambridge, United Kingdom. Email: [email protected]
Milan Vojnovic Department of Statistics, London School of Economics (LSE), London, United Kingdom. Email: [email protected]
Qin Zhang Computer Science Department, Indiana University, Bloomington, USA. Email: [email protected]
Abstract
We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications.
The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approximation ratio parameter $\alpha$. The output is required to be a matching in $G$ that has to be reported by one of the sites, whose size is at least factor $\alpha$ of the size of a maximum matching in $G$.
We show that the communication complexity of this problem is $\Omega(\alpha^2 k n)$ information bits. This bound is shown to be tight up to a $\log n$ factor, by constructing an algorithm, establishing its correctness, and an upper bound on the communication cost. The lower bound also applies to other graph combinatorial problems in the message-passing communication model, including max-flow and graph sparsification.
1 Introduction
Processing complex and massive volumes of data requires scaling out to parallel and distributed computation platforms. Scalable distributed computation algorithms are needed that make efficient use of scarce system resources, such as communication bandwidth between compute nodes, in order to avoid the communication network becoming a bottleneck. Particular interest has been devoted to studying scalable computation methods for graph data, which arises in a variety of applications including online services, online social networks, biological, and economic systems.
In this paper, we consider the distributed computation problem of finding an approximate maximum matching in an input graph whose edges are partitioned over different compute nodes (which we refer to as sites). Several performance measures are of interest, including the communication complexity in terms of the number of bits or messages, the time complexity in terms of the number of rounds, and the storage complexity in terms of the number of bits. In this paper we focus on the communication complexity. Our main result is a tight lower bound on the communication complexity for approximate maximum matching.
We assume a multi-party message-passing communication model [11, 32], which we refer to as the message-passing model, defined as follows. The message-passing model consists of $k$ sites $p^1$, $p^2$, $\ldots$, $p^k$. The input is partitioned across the $k$ sites, with sites $p^1$, $p^2$, $\ldots$, $p^k$ holding pieces of input data $x^1$, $x^2$, $\ldots$, $x^k$, respectively. The goal is to design a communication protocol for the sites to jointly compute the value of a given function $f$ at point $(x^1, x^2, \ldots, x^k)$. The sites are allowed to have point-to-point communications between each other. At the end of the computation, at least one site should return the answer. The goal is to find a protocol that minimizes the total communication cost between the sites.
For technical convenience, we introduce another special party called the coordinator. The coordinator does not have any input. We require that all sites can only talk with the coordinator, and at the end of the computation, the coordinator should output the answer. We call this model the coordinator model. See Figure 1 for an illustration. Note that we have essentially replaced the clique communication topology with a star topology, which increases the total communication cost only by a factor of $2$ and thus does not affect the order of the asymptotic communication complexity.
The edge partition of an input graph $G = (V, E)$ over $k$ sites is defined by a partition of the set of edges $E$ into $k$ disjoint sets $E^1$, $E^2$, $\ldots$, $E^k$, and assigning each set of edges $E^i$ to site $p^i$. For bipartite graphs with a set of left vertices and a set of right vertices, we define an alternative way of an edge partition, referred to as the left vertex partition, as follows: the set of left vertices is partitioned into $k$ disjoint parts, and all the edges incident to one part are assigned to a unique site. Note that the left vertex partition is more restrictive, in the sense that any left vertex partition is an instance of an edge partition. Thus, lower bounds that hold in this model are stronger, since designing algorithms might be easier in this restrictive setting. Our lower bound is proved for the left vertex partition model, while our upper bound holds for an arbitrary edge partition of any graph.
1.1 Summary of results
We study the approximate maximum matching problem in the message-passing model, which we refer to as Distributed Matching Reporting (DMR), defined as follows: the input is a graph $G = (V, E)$ with $n$ vertices and a parameter $\alpha$; the set of edges $E$ is arbitrarily partitioned into $k$ subsets $E^1$, $E^2$, $\ldots$, $E^k$ such that $E^i$ is assigned to site $p^i$; the coordinator is required to report an $\alpha$-approximation of the maximum matching in graph $G$.
In this paper, we show the following main theorem.
Theorem 1.1.
For every and the number of sites , any $\alpha$-approximation randomized algorithm for DMR in the message-passing model with the error probability of at most has the communication complexity of $\Omega(\alpha^2 k n)$ bits. Moreover, this communication complexity holds for an instance of a bipartite graph.
In this paper we are more interested in the case when , since otherwise the trivial lower bound of bits (the number of bits to describe a maximum matching) is already near-optimal.
For DMR, a seemingly weaker requirement is that, at the end of the computation, each site outputs a set of edges such that is a matching of size that is at least factor of a maximum matching. However, given such an algorithm, each site might just send to the coordinator after running the algorithm, which will increase the total communication cost by at most an additive term of . Therefore, our lower bound also holds for this setting.
A simple greedy distributed algorithm solves DMR for $\alpha = 1/2$ with the communication cost of $O(kn \log n)$ bits. This algorithm is based on computing a maximal matching in graph $G$. A maximal matching is a matching that cannot be enlarged by adding one or more edges. A maximal matching is computed using a greedy sequential procedure defined as follows. Let $G(E')$ be the graph induced by a subset of edges $E' \subseteq E$. Site $p^1$ computes a maximal matching $M^1$ in $G(E^1)$, and sends it to $p^2$ via the coordinator. Site $p^2$ then computes a maximal matching $M^2$ in $G(E^1 \cup E^2)$ by greedily adding edges in $E^2$ to $M^1$, and then sends $M^2$ to site $p^3$. This procedure continues until site $p^k$ has computed $M^k$ and sent it to the coordinator. Notice that $M^k$ is a maximal matching in graph $G$, hence it is a $1/2$-approximation of a maximum matching in $G$. The communication cost of this protocol is $O(kn \log n)$ bits because each $M^i$ contains at most $n$ edges and each edge's identifier can be encoded with $O(\log n)$ bits. This shows that our lower bound is tight up to a $\log n$ factor. This protocol is essentially sequential and takes $O(k)$ rounds in total. We show that Luby's classic parallel algorithm for maximal matching [29] can be easily adapted to our model with $O(\log n)$ rounds of computation and $O(kn \log^2 n)$ bits of communication.
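The greedy sequential procedure described above can be illustrated with a minimal Python sketch. It is a single-process simulation and not the authors' implementation: the names `extend_matching`, `greedy_protocol` and `is_maximal` are hypothetical, vertices are assumed to be integers, and the communication is counted in forwarded edges (one hop per forwarded matching) rather than in bits.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def extend_matching(matching: List[Edge], local_edges: Iterable[Edge]) -> List[Edge]:
    """Greedily add local edges whose endpoints are both still unmatched."""
    matched: Set[int] = {v for edge in matching for v in edge}
    result = list(matching)
    for u, v in local_edges:
        if u != v and u not in matched and v not in matched:
            result.append((u, v))
            matched.update((u, v))
    return result


def greedy_protocol(edge_partition: List[List[Edge]]) -> Tuple[List[Edge], int]:
    """Simulate the sequential protocol over the sites, in order.

    Returns the final matching and the total number of edges forwarded.
    Assumption: each site forwards its whole current matching once.
    """
    matching: List[Edge] = []
    edges_forwarded = 0
    for local_edges in edge_partition:
        matching = extend_matching(matching, local_edges)
        edges_forwarded += len(matching)
    return matching, edges_forwarded


def is_maximal(matching: List[Edge], all_edges: Iterable[Edge]) -> bool:
    """A matching is maximal if every edge touches a matched vertex."""
    matched = {v for edge in matching for v in edge}
    return all(u in matched or v in matched for u, v in all_edges)


if __name__ == "__main__":
    # Three sites holding the edges of a 6-cycle, two edges each.
    partition = [[(0, 1), (1, 2)], [(2, 3), (3, 4)], [(4, 5), (0, 5)]]
    final_matching, cost = greedy_protocol(partition)
    print(final_matching, cost)
```

Since every forwarded matching has at most $n/2$ edges, the forwarded volume in this sketch grows at most linearly in the number of sites times $n$, which is in line with the cost of the sequential protocol up to the per-edge encoding.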
In Section 4, we show that our lower bound is also tight with respect to the approximation ratio parameter for any up to a factor. It was shown in [36] that many statistical estimation problems and graph combinatorial problems require bits of communication to obtain an exact solution. Our lower bound shows that for DMR even computing a constant approximation requires this amount of communication.
The lower bound established in this paper applies also more generally for a broader range of graph combinatorial problems. Since a bipartite maximum matching problem can be found by solving a max-flow problem, our lower bound also holds for approximate max-flow. Our lower bound also implies a lower bound for graph sparsification problem; see [4] for definition. This is because in our lower bound construction (see Section 3), the bipartite graph under consideration contains many cuts of size which have to be included in any sparsifier. By our construction, these edges form a good approximate maximum matching, and thus any good sparsifier recovers a good matching. In [4], it was shown that there is a sketch-based -approximate graph sparsification algorithm with the sketch size of bits, which directly translates to an approximation algorithm of communication in our model. Thus, our lower bound is tight up to a poly-logarithmic factor for the graph sparsification problem.
We briefly discuss the main ideas and techniques of our proof of the lower bound for DMR. As a hard instance, we use a bipartite graph with . Each site holds a set of vertices which is a partition of the set of left vertices . The neighbors of each vertex in is determined by a two-party set-disjointness instance (DISJ, defined formally in Section 3.2). There are in total DISJ instances, and we want to perform a direct-sum type of argument on these DISJ instances. We show that due to symmetry, the answer of DISJ can be recovered from a reported matching, and then use information complexity to establish the direct-sum theorem. For this purpose, we use a new definition of the information cost of a protocol in the message-passing model.
We believe that our techniques would prove useful to establish the communication complexity for other graph combinatorial problems in the message-passing model. The reason is that for many graph problems whose solution certificates "span" the whole graph (e.g., connected components, vertex cover, dominating set, etc.), it is natural that a hard instance would be like the one for the maximum matching problem, i.e., each of the $k$ sites would hold roughly $n/k$ vertices and the neighbourhood of each vertex would define an independent instance of a two-party communication problem.
1.2 Related work
The problem of finding an approximate maximum matching in a graph has been studied for various computation models, including the streaming computation model [5], the MapReduce computation model [21, 16], and a traditional distributed computation model known as the LOCAL computation model.
In [31], the maximum matching was presented as one of the open problems in the streaming computation model. Many results have been established since then by various authors [1], [2], [3], [7], [15], [24], [23], [19], [20], [30], and [37]. Many of the studies were concerned with a streaming computation model that allows for $\tilde{O}(n)$ space, referred to as the semi-streaming computation model. The algorithms developed for the semi-streaming computation model can be directly applied to obtain a constant-factor approximation of maximum matching in a graph in the message-passing model that has a communication cost of $\tilde{O}(kn)$ bits.
For approximate maximum matching problem in the MapReduce model, [26] gave a -approximation algorithm, which requires a constant number of rounds and uses bits of communication, for any input graph with edges.
The approximate maximum matching has been studied in the LOCAL computation model by various authors [17, 27, 28, 33]. In this computation model, each processor corresponds to a unique vertex of the graph and edges represent bidirectional communications between processors. The time advances over synchronous rounds. In each round, every processor sends a message to each of its neighbours, and then each processor performs a local computation using as input its local state and the received messages. Notice that in this model, the input graph and the communication topology are the same, while in the message-passing model the communication topology is essentially a complete graph which is different from the input graph and, in general, sites do not correspond to vertices of the topology graph.
A variety of graph and statistical computation problems have been recently studied in the message-passing model [22], [32], [34], [36], [35]. A wide range of graph and statistical problems has been shown to be hard in the sense of requiring $\Omega(nk)$ bits of communication, including graph connectivity [32, 36], exact counting of distinct elements [36], and $k$-party set-disjointness [11]. Some of these problems have been shown to be hard even for random order inputs [22].
In [11], it has been shown that the communication complexity of the $k$-party set-disjointness problem in the message-passing model is $\Omega(nk)$ bits. This work was independent and concurrent to ours. Incidentally, it uses a similar but different input distribution to ours. Similar input distributions were also used in previous work such as [32] and [34]. This is not surprising because of the nature of the message-passing model. There may exist a reduction between the $k$-party set-disjointness and DMR, but showing this is non-trivial and would require a formal proof. The proof of our lower bound is different in that we use a reduction of the $k$-party DMR to a $2$-party set-disjointness using a symmetrisation argument, while [11] uses a coordinate-wise direct-sum theorem to reduce the $k$-party set-disjointness to a $k$-party $1$-bit problem.
The approximate maximum matching has been recently studied in the coordinator model under additional condition that the sites send messages to the coordinator simultaneously and once, referred to as the simultaneous-communication model. The coordinator then needs to report the output that is computed using as input the received messages. It has been shown in [7] that for the vertex partition model, our lower bound is achievable by a simultaneous protocol for any up to a poly-logarithmic factor.
The communication/round complexity of approximate maximum matching has been studied in the context of finding efficient economic allocations of items to agents, in markets that consist of unit-demand agents in a distributed information model where agents' valuations are unknown to a central planner, which requires communication to determine an efficient allocation. This amounts to studying the communication or round complexity of approximate maximum matching in a bipartite graph that defines preferences of agents over items. In a market with agents and items, this amounts to approximate maximum matching in the -party model with a left vertex partition. [14] and [6] studied this problem in the so-called blackboard communication model, where messages sent by agents can be seen by all agents. For one-round protocols, [14] established a tight trade-off between message size and approximation ratio. As indicated by the authors in [14], their randomized lower bound is actually a special case of ours. In a follow-up work, [6] obtained the first non-trivial lower bound on the number of rounds for general randomized protocols.
1.3 Roadmap
In Section 2 we present some basic concepts of probability and information theory, communication and information complexity that are used throughout the paper. Section 3 presents the lower bound and its proof, which is the main result of this paper. Section 4 establishes the tightness of the lower bound up to a poly-logarithmic factor. Finally, in Section 5, we conclude.
2 Preliminaries
2.1 Basic facts and notation
Let $[q]$ denote the set $\{1, 2, \ldots, q\}$, for given integer $q \ge 1$. All logarithms are assumed to have base $2$. We use capital letters $X, Y, \ldots$ to denote random variables and the lower case letters $x, y, \ldots$ to denote specific values of respective random variables $X, Y, \ldots$.
We write $X \sim \mu$ to mean that $X$ is a random variable with distribution $\mu$, and $x \sim \mu$ to mean that $x$ is a sample from distribution $\mu$. For a distribution $\mu$ on a domain $\mathcal{X} \times \mathcal{Y}$, and $y \in \mathcal{Y}$, we write $\mu(x \mid y)$ to denote the conditional distribution of $x$ given $y$.
For any given probability distribution $\mu$ and positive integer $t$, we denote with $\mu^t$ the $t$-fold product distribution of $\mu$, i.e. the distribution of $t$ independent and identically distributed random variables according to distribution $\mu$.
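We will use the following distances between two probability distributions $\mu$ and $\nu$ on a discrete set $\mathcal{X}$: (a) the total variation distance defined as
$$d(\mu, \nu) = \frac{1}{2} \sum_{x \in \mathcal{X}} |\mu(x) - \nu(x)|,$$
and, (b) the Hellinger distance defined as
$$h(\mu, \nu) = \sqrt{\frac{1}{2} \sum_{x \in \mathcal{X}} \left(\sqrt{\mu(x)} - \sqrt{\nu(x)}\right)^2}.$$
The total variation distance and Hellinger distance satisfy the following relation:
Lemma 2.1.
For any two probability distributions $\mu$ and $\nu$, the total variation distance and the Hellinger distance between $\mu$ and $\nu$ satisfy
$$d(\mu, \nu) \le \sqrt{2}\, h(\mu, \nu).$$
With a slight abuse of notation, for two random variables $X \sim \mu$ and $Y \sim \nu$, we write $d(X, Y)$ and $h(X, Y)$ in lieu of $d(\mu, \nu)$ and $h(\mu, \nu)$, respectively.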
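As a numerical illustration of the two distances, the following Python sketch computes them for distributions given as dictionaries. It assumes the standard normalizations, namely total variation distance $\frac{1}{2}\sum_x |\mu(x) - \nu(x)|$ and Hellinger distance $\sqrt{\frac{1}{2}\sum_x (\sqrt{\mu(x)} - \sqrt{\nu(x)})^2}$; the function names are hypothetical.

```python
import math
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def total_variation(mu: Dist, nu: Dist) -> float:
    """Total variation distance: half of the L1 distance (assumed normalization)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def hellinger(mu: Dist, nu: Dist) -> float:
    """Hellinger distance with the 1/2 factor inside the square root (assumed normalization)."""
    support = set(mu) | set(nu)
    total = sum(
        (math.sqrt(mu.get(x, 0.0)) - math.sqrt(nu.get(x, 0.0))) ** 2 for x in support
    )
    return math.sqrt(0.5 * total)


if __name__ == "__main__":
    mu = {0: 0.5, 1: 0.5}
    nu = {0: 0.9, 1: 0.1}
    d, h = total_variation(mu, nu), hellinger(mu, nu)
    # Under these normalizations one expects h^2 <= d <= sqrt(2) * h.
    print(d, h, h * h <= d <= math.sqrt(2) * h)
```

Under these normalizations both distances lie in $[0, 1]$, and the total variation distance is sandwiched between the squared Hellinger distance and $\sqrt{2}$ times the Hellinger distance.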
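We will use the following two well-known inequalities.
Hoeffding's inequality
Let $X$ be the sum of $t \ge 1$ independent and identically distributed random variables that take values in $[0, 1]$. Then, for any $s \ge 0$,
$$\Pr[X - \mathbf{E}[X] \ge s] \le e^{-2s^2/t}.$$
Chebyshev's inequality
Let $X$ be a random variable with variance $\sigma^2 > 0$. Then, for any $s > 0$,
$$\Pr[|X - \mathbf{E}[X]| \ge s] \le \frac{\sigma^2}{s^2}.$$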
2.2 Information theory
For two random variables $X$ and $Y$, let $H(X)$ denote the Shannon entropy of the random variable $X$, and let $H(X \mid Y)$ denote the conditional entropy of $X$ given $Y$. Let $I(X; Y) = H(X) - H(X \mid Y)$ denote the mutual information between $X$ and $Y$, and let $I(X; Y \mid Z)$ denote the conditional mutual information given $Z$. The mutual information between any $X$ and $Y$ is non-negative, i.e. $I(X; Y) \ge 0$, or equivalently, $H(X \mid Y) \le H(X)$.
We will use the following relations from information theory:
Chain rule for mutual information
For any jointly distributed random variables $X$, $Y$ and $Z$,
$$I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X).$$
Data processing inequality
If $X$ and $Z$ are conditionally independent random variables given $Y$, then
$$I(X; Z) \le I(X; Y).$$
Super-additivity of mutual information
If $X_1, \ldots, X_t$ are independent random variables, then
$$I(X_1, \ldots, X_t; Y) \ge \sum_{i=1}^{t} I(X_i; Y).$$
Sub-additivity of mutual information
If $X_1, \ldots, X_t$ are conditionally independent random variables given $Y$, then
$$I(X_1, \ldots, X_t; Y) \le \sum_{i=1}^{t} I(X_i; Y).$$
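The relations above can be checked numerically on small examples. The following Python sketch computes (conditional) mutual information from an explicit joint distribution via entropies, using $I(A; B \mid C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)$; the helper names and the example distribution (two independent fair bits and their AND) are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Joint = Dict[Tuple[int, ...], float]


def marginal(joint: Joint, idx: Sequence[int]) -> Dict[Tuple[int, ...], float]:
    """Marginal distribution of the coordinates listed in idx."""
    out: Dict[Tuple[int, ...], float] = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def entropy(joint: Joint, idx: Sequence[int]) -> float:
    """Shannon entropy (base 2) of the coordinates listed in idx."""
    return -sum(p * math.log2(p) for p in marginal(joint, idx).values() if p > 0)


def mutual_information(joint: Joint, a: Sequence[int], b: Sequence[int],
                       c: Sequence[int] = ()) -> float:
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    a, b, c = list(a), list(b), list(c)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))


if __name__ == "__main__":
    # Coordinates: 0 -> X, 1 -> Y, 2 -> Z = X AND Y, with X and Y independent fair bits.
    joint = {(x, y, x & y): 0.25 for x in (0, 1) for y in (0, 1)}
    lhs = mutual_information(joint, [0, 1], [2])
    rhs = mutual_information(joint, [0], [2]) + mutual_information(joint, [1], [2], [0])
    print(lhs, rhs)  # chain rule: the two values should agree up to rounding
```

In this example $X$ and $Y$ are independent, so super-additivity also applies: the joint information about $Z$ is at least the sum of the individual informations.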
2.3 Communication complexity
In the two party communication complexity model two players, Alice and Bob, are required to jointly compute a function . Alice is given and Bob is given , and they want to jointly compute the value of by exchanging messages according to a randomized protocol .
We use to denote the random transcript (i.e., the concatenation of messages) when Alice and Bob run on the input , and to denote the output of the protocol. When the input is clear from the context, we will simply use to denote the transcript. We say that is a -error protocol if for every input , the probability that is not larger than , where the probability is over the randomness used in . We will refer to this type of error as worst-case error. An alternative and weaker type of error is the distributional error, which is defined analogously for an input distribution, and where the error probability is over both the randomness used in the protocol and the input distribution.
Let denote the length of the transcript in information bits. The communication cost of is
[TABLE]
The -error randomized communication complexity of , denoted by , is the minimal cost of any -error protocol for .
The multi-party communication complexity model is a natural generalization to parties, where each party has a part of the input, and the parties are required to jointly compute a function by exchanging messages according to a randomized protocol.
For more information about communication complexity, we refer the reader to [25].
2.4 Information complexity
The communication complexity quantifies the number of bits that need to be exchanged by two or more players in order to compute some function together, while the information complexity quantifies the amount of information of the inputs that must be revealed by the protocol. The information complexity has been extensively studied in the last decade, e.g., [12, 8, 9, 34, 10]. There are several definitions of information complexity. In this paper, we follow the definition used in [8]. In the two-party case, let be a distribution on , we define the information cost of measured under as
[TABLE]
where and is the public randomness used in . For notational convenience, we will omit the subscript of and simply use to denote the information cost of . It should be clear that is a function of for a fixed protocol . Intuitively, this measures how much information of and is revealed by the transcript . For any function , we define the information complexity of parametrized by and as
[TABLE]
2.5 Information complexity and coordinator model
We can indeed extend the above definition of information complexity to -party coordinator model. That is, let be the input of player with and be the whole transcript, then we could define . However, such a definition does not fully explore the point-to-point communication feature of the coordinator model. Indeed, the lower bound we can prove using such a definition is at most what we can prove under the blackboard model and our problem admits a simple algorithm with communication in the blackboard model. In this paper we give a new definition of information complexity for the coordinator model, which allows us to prove higher lower bounds compared with the simple generalization. Let be the transcript between player and the coordinator, thus . We define the information cost for a function with respect to input distribution and the error parameter in the coordinator model as
[TABLE]
Theorem 2.2.
for any distribution .
Proof.
For any protocol , the expected size of its transcript is (we abuse the notation by using also for the transcript). The theorem then follows because the worst-case communication cost is at least the average-case communication cost. ∎
Lemma 2.3.
If is independent of the random coins used by the protocol , then
[TABLE]
Proof.
The statement directly follows from the data processing inequality because given , is fully determined by the random coins used, and is thus independent of . ∎
3 Lower Bound
The lower bound is established by constructing a hard distribution for the input bipartite graph such that .
We first discuss the special case when the number of sites is equal to , and each site is assigned one unique vertex in together with all its adjacent edges. We later discuss the general case.
A natural approach to approximately compute a maximum matching in a graph is to randomly sample a few edges from each site, and hope that we can find a good matching using these edges. To rule out such strategies, we construct random edges as follows.
We create a large number of noisy edges by randomly picking a small set of nodes of size roughly and connect each node in to each node in independently at random with a constant probability. Note that there are such edges and the size of any matching that can be formed by these edges is at most , which we will show to be asymptotically , where OPT is the size of a maximum matching.
We next create a set of important edges between and such that each node in is adjacent to at most one random node in . These edges are important in the sense that although there are only of them, the size of a maximum matching they can form is large, of the order . Therefore, to compute a matching of size at least , it is necessary to find and include important edges.
We then show that finding an important edge is in some sense equivalent to solving a set-disjointness (DISJ) instance, and thus, we have to solve DISJ instances. The concrete implementation of this intuition is via an embedding argument.
In the general case, we create independent copies of the above random bipartite graph, each with vertices, and assign vertices to each site (one from each copy). We then prove a direct-sum theorem using information complexity.
In the following, we introduce the two-party AND problem and the two-party DISJ problem. These two problems have been widely studied and tight bounds are known (e.g. [8]). For our purpose, we need to prove stronger lower bounds for them. We then give a reduction from DISJ to DMR and prove an information cost lower bound for DMR in Section 3.3.
3.1 The two-party AND problem
In the two-party AND communication problem, Alice and Bob hold bits $x$ and $y$ respectively, and they want to compute the value of the function AND$(x, y) = x \wedge y$.
Next we define input distributions for this problem. Let be random variables corresponding to the inputs of Alice and Bob respectively. Let be a parameter. Let denote the probability distribution of a Bernoulli random variable that takes value $0$ with probability or value $1$ with probability . We define two input probability distributions and for as follows.
1. :
Sample , and then set the value of as follows: if , let and ; otherwise, if , let , and . Thus, we have
[TABLE]
2. :
Sample , and then choose as above (i.e. sample according to ). Then, reset the value of to be $0$ or $1$ with equal probability (i.e. set ).
Here is an auxiliary random variable to break the dependence of and , as we can see and are not independent, but conditionally independent given . Let be the probability that under distribution , which is .
For the special case , by [8], it is shown that, for any private coin protocol with worst-case error probability , the information cost
[TABLE]
where the information cost is measured with respect to and is the random variable corresponding to . Note that the above mutual information is different from the definition of information cost; it is referred to as conditional information cost in [8]. It is smaller than the standard information cost by data processing inequality ( and are conditionally independent given ). For a fixed protocol , the joint probability distribution is determined by the distribution of and so is . Therefore, when we say the (conditional) information cost is measured w.r.t. , it means that the mutual information, , is measured under the joint distribution determined by .
The above lower bound might seem counterintuitive, as the answer to AND is always $0$ under the input distribution and a protocol can just output $0$, which does not reveal any information. However, such a protocol will have worst-case error probability $1$, i.e., it is always wrong when the input is $(1, 1)$, contradicting the assumption. When distributional error is considered, the (distributional) error and information cost can be measured w.r.t. different input distributions. In our case, the error will be measured under and the information cost will be measured under , and we will prove that any protocol having small error under must incur high information cost under .
We next derive an extension that generalizes the result of [8] to any and distributional errors. We will also use the definition of one-sided error.
Definition 3.1.
For a two-party binary function , we say that a protocol has a one-sided error for under a distribution if it is always correct when the correct answer is $0$, and is correct with probability at least conditional on .
Recall that is the probability that when , which is . Recall that , and thus . Note that a distributional error of under is trivial, as a protocol that always outputs $0$ achieves this (but it has one-sided error ). Therefore, for two-sided error, we will consider protocols with error probability slightly better than the trivial protocol, i.e., with error probability for some .
Theorem 3.2.
Suppose that is a public coin protocol for AND, which has distributional error , for , under input distribution ; let denote its public randomness. Then
[TABLE]
where the information is measured with respect to .
If has a one-sided error , then
[TABLE]
If we set , the first part of Theorem 3.2 recovers the result of [8].
Proof of Theorem 3.2.
We will use to denote the transcript when the input is . By definition,
[TABLE]
With a slight abuse of notation, in (3), and are random variables with distributions and , respectively.
For any random variable with distribution , the following two inequalities were established in [8]:
[TABLE]
and
[TABLE]
where is the Hellinger distance between two random variables and .
We can apply these bounds to lower bound the term . However, we cannot apply them to lower bound the term when because then the distribution of is not . To lower bound the term , we will use the following well-known property, whose proof can be found in the book [13] (Theorem 2.7.4).
Lemma 3.3.
Let . The mutual information is a concave function of for fixed .
Hence, the mutual information is a concave function of the distribution of , since the distribution of is fixed given .
Recall that is the probability distribution that takes value $0$ with probability and takes value $1$ with probability . Note that can be expressed as a convex combination of and (always taking value ) as follows: . (Recall that is assumed to be smaller than .) Let and . Then, using Lemma 3.3, we have
[TABLE]
where the last inequality holds by (5) and non-negativity of mutual information.
Thus, we have
[TABLE]
where the last inequality holds because .
We next show that if is a protocol with error probability smaller than or equal to under distribution , then
[TABLE]
which together with other above relations implies the first part of the theorem.
By the triangle inequality,
[TABLE]
where the last equality is from the cut-and-paste lemma in [8] (Lemma 6.3).
Thus, we have
[TABLE]
where the last inequality is by the triangle inequality.
Similarly, it holds that
[TABLE]
From (7), (8) and (9), for any positive real numbers , , and such that , we have
[TABLE]
Let denote the error probability of and denote the error probability of conditioned on that the input is . Recall . We have
[TABLE]
where
[TABLE]
and clearly . Let be the output of when the input is , which is also a random variable. Note that
[TABLE]
where denote the total variation distance between probability distributions of random variables and . Using Lemma 2.1, we have
[TABLE]
By the same arguments, we also have
[TABLE]
and
[TABLE]
Combining (13), (14) and (15) with (11) and the assumption that , we obtain
[TABLE]
By (10), we have
[TABLE]
From the Cauchy-Schwarz inequality, it follows that
[TABLE]
Hence, we have
[TABLE]
which combined with (6) establishes the first part of the theorem.
We now go on to prove the second part of the theorem. Assume has a one-sided error , i.e., it outputs with probability at least if the input is , and always output correctly otherwise. To boost the success probability, we can run parallel instances of the protocol and answer if and only if there exists one instance which outputs . Let be this new protocol, and it is easy to see that it has a one-sided error of . By setting , it is at most , and thus the (two-sided) distributional error of under is smaller than . By the first part of the theorem, we know . We also have
[TABLE]
where the inequality follows from the sub-additivity and the fact that are conditionally independent of each other given and . Thus, we have . ∎
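The amplification step used in the second part of the proof can be illustrated numerically. The following is a minimal sketch, not the paper's construction: the helper names `repetitions_needed` and `boosted_answer` are hypothetical, and the code assumes a protocol that never errs on negative inputs and misses a positive input with probability at most `delta` in each independent run, so that answering positively whenever some run does leaves a miss probability of `delta ** m` after `m` runs.

```python
import random


def repetitions_needed(delta: float, target: float) -> int:
    """Smallest m with delta**m <= target, for 0 < delta < 1 and 0 < target < 1."""
    if not (0.0 < delta < 1.0 and 0.0 < target < 1.0):
        raise ValueError("delta and target must lie strictly between 0 and 1")
    m, miss = 0, 1.0
    while miss > target:
        miss *= delta  # each extra independent run multiplies the miss probability
        m += 1
    return m


def boosted_answer(run_once, m: int) -> bool:
    """Run m independent instances and answer positively iff at least one does."""
    outputs = [run_once() for _ in range(m)]  # all instances are executed
    return any(outputs)


if __name__ == "__main__":
    rng = random.Random(0)
    delta = 0.5
    m = repetitions_needed(delta, 1 / 1024)

    def run_once() -> bool:
        # Simulated one-sided protocol on a positive input (an assumption of
        # this sketch): it misses with probability delta.
        return rng.random() >= delta

    trials = 20000
    misses = sum(not boosted_answer(run_once, m) for _ in range(trials))
    print(m, misses / trials)
```

The boosted protocol stays one-sided because a protocol that is always correct on negative inputs never produces a spurious positive answer in any of the runs.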
3.2 The two-party DISJ communication problem
The two-party DISJ communication problem involves two players, Alice and Bob, who hold strings of bits and , respectively, and want to compute
[TABLE]
By interpreting and as indicator vectors that specify subsets of , DISJ if and only if the two sets represented by and are disjoint. Note that this accommodates the AND problem as a special case when .
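A small sketch of the function being computed may help fix ideas. It assumes the convention suggested by the AND special case, namely that the value is 1 when some coordinate has both bits set (the sets intersect) and 0 when the sets are disjoint; the function names `and_bits` and `disj` are illustrative only.

```python
from typing import Sequence


def and_bits(x: int, y: int) -> int:
    """Two-party AND of single bits."""
    return x & y


def disj(x: Sequence[int], y: Sequence[int]) -> int:
    """Return 1 if some coordinate has x[i] = y[i] = 1, else 0.

    Assumed convention: the value 0 corresponds to disjoint sets.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return int(any(and_bits(a, b) for a, b in zip(x, y)))
```

With strings of length one, `disj([a], [b])` coincides with `and_bits(a, b)`, which is the special case mentioned above.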
Let be Alice's input and be Bob's input. We define two input distributions and for as follows.
1. :
For each , independently sample , and let be the corresponding auxiliary random variable (see the definition of ). Define .
2. :
Let , then pick uniformly at random from , and reset to be [math] or with equal probability. Note that , and the probability that DISJ is equal to .
We define the one-sided error for DISJ similarly: A protocol has a one-sided error for DISJ if it is always correct when DISJ, and is correct with probability at least when DISJ.
Theorem 3.4.
Let be any public coin protocol for DISJ with error probability on input distribution , where , and let denote the public randomness used by the protocol. Then
[TABLE]
where the information is measured w.r.t. .
If has a one-sided error , then
[TABLE]
Proof.
We first consider the two-sided error case. Let be a protocol for DISJ with distributional error under . Consider the following reduction from AND to DISJ. Alice has input , and Bob has input . They want to decide the value of . They first publicly sample , and embed in the -th position, i.e., set and . Then they publicly sample according to for all . Let . Conditional on , they sample such that for each . Note that this step can be done using only private randomness, since, in the definition of , and are independent given . Then they run the protocol on the input and output whatever outputs. Let denote this protocol for AND. Let be the corresponding random variables of respectively. It is easy to see that if , then , and thus the distributional error of is under . The public coins used in include , and the public coins of .
We first analyze the information cost of under . We have
[TABLE]
where (16) is by the super-additivity of mutual information, (17) holds because when the conditional distribution of given is the same as the distribution of , and (18) follows from Theorem 3.2 using the fact that has error under .
We have established that when , it holds
[TABLE]
We now consider the information cost when . Recall that to sample from , we first sample , and then pick uniformly at random from and reset to [math] or with equal probability. Let be the indicator random variable of the event that the last step does not change the value of .
We note that for any jointly distributed random variables , , and ,
[TABLE]
To see this note that by the chain rule for mutual information, we have
[TABLE]
and
[TABLE]
Combining the above two equalities, (20) follows by the facts and .
Let and . We have
[TABLE]
where the first inequality is from (20) and the last equality is by (19).
The proof for the one-sided error case is the same, except that we use the one-sided error lower bound in Theorem 3.2 to bound (18). ∎
3.3 Proof of Theorem 1.1
Here we provide a proof of Theorem 1.1. The proof is based on a reduction of DISJ to DMR. We first define the hard input distribution that we use for DMR.
The input graph is assumed to be a random bipartite graph that consists of disjoint, independent and identically distributed random bipartite graphs , , , . Each bipartite graph has the set of left vertices and the set of right vertices, both of cardinality . The sets of edges , , , are defined by a random variable that takes values in such that whether or not is an edge in is indicated by .
The distribution of is defined as follows. Let , , , be independent and identically distributed random variables with distribution (footnote 1: is the marginal distribution of of the joint distribution ). Then, for each , conditioned on , let , , …, be independent and identically distributed random variables with distribution , where is the conditional distribution of given . Note that for every and , .
We will use the following notation:
[TABLE]
and
[TABLE]
where each , and is the th bit. In addition, we will also use the following notation:
[TABLE]
and
[TABLE]
Note that is the input to DMR, and is not part of the input for DMR, but it is used to construct .
The edge partition of input graph over sites , , , is defined by assigning all edges incident to vertices , ,, to site , or equivalently gets . See Figure 2 for an illustration.
Input Reduction
Let be Alice's input and be Bob's input for DISJ. We will first construct an input of DMR from , which has the above hard distribution. In this reduction, in each bipartite graph , we carefully embed instances of DISJ. The output of a DISJ instance determines whether or not a specific edge in the graph exists. This amounts to a total of DISJ instances embedded in graph . The original input of Alice and Bob is embedded at a random position, and the other instances are sampled by Alice and Bob using public and private random coins. We then argue that if the original DISJ instance is solved, then with a sufficiently large probability, at least of the embedded DISJ instances are solved. Intuitively, if a protocol solves a DISJ instance at a random position with high probability, then it should solve many instances at other positions as well, since the input distribution is completely symmetric. We will see that the original DISJ instance can be solved by using any protocol solving DMR, the correctness of which also relies on the symmetric property.
Alice and Bob construct an input for DMR as follows:
1. Alice and Bob use public coins to sample an index from a uniform distribution on . Alice constructs the input for site , and Bob constructs input for other sites (see Figure 3).
2. Alice and Bob use public coins to sample an index from a uniform distribution on .
3. is sampled as follows: Alice sets , and Bob sets . Bob privately samples
[TABLE]
4. For each , is sampled as follows:
   (a) Alice and Bob use public coins to sample .
   (b) Alice and Bob privately sample and from and , respectively. Bob privately and independently samples
   [TABLE]
   (c) Alice privately draws an independent sample from a uniform distribution on , and resets to [math] or with equal probability. As a result, . For each , Bob privately draws a sample from a uniform distribution on and resets to a sample from .
Note that the input of site is determined by the public coins, Alice's input and her private coins. The inputs are determined by the public coins, Bob's input and his private coins.
Let denote the distribution of when is chosen according to the distribution .
Let be the approximation ratio parameter. We set in the definition of .
Given a protocol for DMR that achieves an -approximation with the error probability at most under , we construct a protocol for DISJ that has a one-sided error probability of at most as follows.
Protocol
1. Given input , Alice and Bob construct an input for DMR as described by the input reduction above. Let be the samples used for the construction of . Let be the two indices sampled by Alice and Bob in the reduction procedure.
2. With Alice simulating site and Bob simulating other sites and the coordinator, they run on the input defined by . Any communication between site and the coordinator will be exchanged between Alice and Bob. For any communication among other sites and the coordinator, Bob just simulates it without any actual communication. At the end, the coordinator, that is Bob, obtains a matching .
3. Bob outputs if, and only if, for some , is an edge in such that , and [math], otherwise.
Correctness
Suppose that DISJ, i.e., or for all . Then, for each , we must either have or , but means that is not an edge in . Thus, will always answer correctly when DISJ, i.e., it has a one-sided error.
Now suppose that for some . Note that there is at most one such according to our construction, which we denote by . The output of is correct if is an edge in . We next bound the probability of this event.
For each , for , we let
[TABLE]
[TABLE]
and
[TABLE]
Intuitively, the edges between vertices and can be regarded as noisy edges because the total number of such edges is large, but the maximum matching they can form is small (Lemma 3.5 below). On the other hand, the edges between vertices and can be regarded as important edges because a maximum matching they can form is large though the total number of such edges is small. Note that there is no edge between vertices and . See Figure 4 for an illustration.
To find a good matching we must choose many edges from the set of important edges. A key property is that all important edges are statistically identical, that is, each important edge is equally likely to be the edge . Thus, will be included in the matching returned by with a large enough probability. Using this, we can answer whether and intersect or not, thus solving the original DISJ problem.
Recall that we set and . Thus, . In the following, we assume for some constant, since otherwise the lower bound will be dominated by the trivial lower bound of (footnote 2: since none of the sites can see messages sent by other sites to the coordinator, unless this is communicated by the coordinator, each site needs to communicate with the coordinator at least once to determine the status of the protocol).
Lemma 3.5.
With probability at least ,
[TABLE]
Proof.
Note that each vertex in is included in independently with probability . Hence, , and by Hoeffding's inequality, we have
[TABLE]
∎
Notice that Lemma 3.5 implies that with probability at least , the size of a maximum matching formed by edges between vertices and is smaller than or equal to .
Lemma 3.6.
With probability at least , the size of a maximum matching in is at least .
Proof.
Consider the size of a matching in for an arbitrary . For each , let be the index such that if such an exists (note that by our construction at most one such index exists), and let be defined as NULL, otherwise.
We use a greedy algorithm to construct a matching between vertices and . For , we connect and if is not NULL and is not connected to some for . The size of the matching constructed in this way is equal to the number of distinct elements in , which we denote by . We next establish the following claim:
[TABLE]
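The greedy construction used in this proof can be sketched in a few lines. The representation is an assumption made for illustration: `targets[i]` holds the index of the unique right vertex that left vertex `i` may be connected to, or `None` where the proof uses NULL, and `greedy_matching` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def greedy_matching(targets: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Scan left vertices in order; match i to targets[i] if that right vertex is free."""
    used_right = set()
    matching = []
    for i, j in enumerate(targets):
        if j is not None and j not in used_right:
            used_right.add(j)  # right vertex j is now taken
            matching.append((i, j))
    return matching
```

Each distinct non-NULL target is matched exactly once, by the first left vertex that points to it, so the size of the returned matching equals the number of distinct non-NULL entries of `targets`.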
By our construction, we have
[TABLE]
By Hoeffding's inequality, with probability ,
[TABLE]
and
[TABLE]
It follows that with probability , is at least , where is defined as follows.
Consider a balls-into-bins process with balls and bins. Throw each ball into a bin sampled uniformly at random from the set of all bins. Let be the number of non-empty bins at the end of this process. Then, it is straightforward to observe that the expected number of non-empty bins is
[TABLE]
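As a sanity check on the balls-into-bins expectation, the sketch below compares the standard closed form, bins * (1 - (1 - 1/bins)^balls), with a simulation. The parameter names `balls` and `bins` and the function names are illustrative and are not tied to the specific values used in the proof.

```python
import random


def expected_nonempty_bins(balls: int, bins: int) -> float:
    """E[number of non-empty bins] = bins * (1 - (1 - 1/bins)**balls)."""
    return bins * (1.0 - (1.0 - 1.0 / bins) ** balls)


def simulate_nonempty_bins(balls: int, bins: int, rng: random.Random) -> int:
    """Throw each ball into a uniformly random bin and count the occupied bins."""
    return len({rng.randrange(bins) for _ in range(balls)})


if __name__ == "__main__":
    rng = random.Random(1)
    balls, bins, trials = 50, 100, 2000
    average = sum(simulate_nonempty_bins(balls, bins, rng) for _ in range(trials)) / trials
    print(expected_nonempty_bins(balls, bins), average)
```

The closed form follows from linearity of expectation: a fixed bin stays empty with probability (1 - 1/bins)^balls.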
By Lemma 1 in [18], for , the variance of the number of non-empty bins satisfies (footnote 3: the constants used here are slightly different from those in [18])
[TABLE]
Let be the number of non-empty bins in the balls-into-bins process with balls and bins. Then, we have
[TABLE]
and
[TABLE]
By Chebyshev's inequality,
[TABLE]
Hence, with probability , , which proves the claim in (21).
It follows that for each , we can find a matching in of size at least with probability . If , then by the union bound, it holds that with probability at least , the size of a maximum matching in is at least . Otherwise, let be the sizes of matchings that are independently computed using the greedy matching algorithm described above for the respective input graphs . Let if , and , otherwise. Since for all and , by Hoeffding's inequality, we have
[TABLE]
Hence, the size of a maximum matching in is at least with probability at least . ∎
If is an -approximation algorithm with error probability at most , then by Lemma 3.5, with probability at least , will output a matching that contains at least important edges, and we denote this event by . We know that there are at most important edges and edge is one of them. We say that is important for , if is an important edge in . Given an input , the algorithm cannot distinguish between any two important edges. We can apply the principle of deferred decisions to decide the value of after the matching has already been computed, which means, conditioned on , the probability that is at least , where . Since happens with probability at least , we have
[TABLE]
To sum up, we have shown that protocol solves DISJ correctly with one-sided error of at most .
Information cost
We analyze the information cost of DMR. Let be the best protocol for DMR with respect to input distribution and the one-sided error probability .
Let , and . Let denote the random variable used to sample from . Recall that in our input reduction are public coins used by Alice and Bob.
We have the following:
[TABLE]
where (22) is by Lemma 2.3, (23) is by the data processing inequality, (24) is by the super-additivity property, and (25) holds because the distribution of is the same as that of , and the conditional distribution of given is the same as the conditional distribution of given , , , . In (26), is the best protocol for DISJ with one-sided error probability at most and is the public randomness used in . Finally, (27) holds by Theorem 3.4, where we recall that we have set .
We have thus shown that . Since by Theorem 2.2, , it follows that
[TABLE]
which proves Theorem 1.1.
4 Upper Bound
In this section we present an -approximation algorithm for the distributed matching problem with an upper bound on the communication complexity that matches the lower bound for any up to poly-logarithmic factors.
In Section 1 we described a simple algorithm that guarantees an -approximation for DMR at a communication cost of bits. This is a greedy algorithm that computes a maximal matching. Its communication cost is bits. If , we simply apply the greedy -approximation algorithm, which has a communication cost of bits. Therefore, we assume that in the rest of this section. We next present an -approximation algorithm that uses the greedy maximal matching algorithm as a subroutine.
Algorithm: The algorithm consists of two steps:
1. The coordinator sends a message to each site asking it to compute a local maximum matching, and each site then reports back to the coordinator the size of its local maximum matching. The coordinator sends a message to a site that holds a local maximum matching of maximum size, and this site responds by sending back to the coordinator at most edges from its local maximum matching. Then, the algorithm proceeds to the second step.
2. The coordinator selects each site independently with probability , where is set to (recall we assume ), and computes a maximal matching by applying the greedy maximal matching algorithm to the selected sites.
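Step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: `sites` is a list of per-site edge lists, `p` stands for the sampling probability (its value is not reproduced here), the greedy subroutine is modeled as a sequential pass in which each site extends the matching with local edges whose endpoints are still free, and communication is not modeled. The function names are hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_maximal_matching(sites: Sequence[Sequence[Edge]]) -> List[Edge]:
    """Visit sites in order; each site adds local edges whose endpoints are both free."""
    matched: Set[Hashable] = set()
    matching: List[Edge] = []
    for edges in sites:
        for u, v in edges:
            if u != v and u not in matched and v not in matched:
                matched.update((u, v))
                matching.append((u, v))
    return matching


def sampled_greedy_matching(
    sites: Sequence[Sequence[Edge]], p: float, rng: random.Random
) -> List[Edge]:
    """Keep each site independently with probability p, then match greedily."""
    selected = [edges for edges in sites if rng.random() < p]
    return greedy_maximal_matching(selected)
```

The result is a maximal matching of the subgraph formed by the edges of the selected sites, which is the object analyzed in the correctness argument.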
It is readily observed that the expected communication cost of Step 1 is at most bits, and that the communication cost of Step 2 is at most bits. We next show correctness of the algorithm.
Correctness of the algorithm.
Let be a random variable that indicates whether or not site is selected in Step 2. Note that and . Let be a maximum matching in and let denote its size. Let be the number of edges in which belong to site . Hence, we have because the edges of are assumed to be partitioned disjointly over the sites. We can assume that for all ; otherwise, the coordinator has already obtained an -approximation from Step 1.
Let be the size of the maximal matching that is the output of Step 2. Recall that the size of any maximal matching is at least half the size of any maximum matching. Thus, we have , where . Note that we have and . Under the constraint for all , we have
[TABLE]
Hence, combining with the assumption , it follows that . By Chebyshev's inequality, we have
[TABLE]
Since , it follows that with probability at least . Combining with , we have that with probability at least .
We have shown the following theorem.
Theorem 4.1.
For every , there exists a randomized algorithm that computes an -approximation of a maximum matching with probability at least at the communication cost of bits.
Note that is a trivial lower bound, simply because the size of the output could be as large as . Obviously, is a lower bound, because the coordinator has to send at least one message to each site. Thus, together with the lower bound in Theorem 1.1, the upper bound above is tight up to a factor.
One can see that the above algorithm needs rounds, as we use a naive algorithm to compute a maximal matching among sites. If is large, say, for some constant , this may not be acceptable. Fortunately, Luby's parallel algorithm [29] can be easily adapted to our model, using only rounds at the cost of increasing the communication by at most a factor. The details are provided in Appendix A.
5 Conclusion
We have established a tight lower bound on the communication complexity for the approximate maximum matching problem in the message-passing model.
An interesting open problem is the complexity of the counting version of the problem, i.e., the communication complexity if we only want to compute an approximation of the size of a maximum matching in a graph. Note that our proof of the lower bound relies on the fact that the algorithm has to return a certificate of the matching. Hence, in order to prove a lower bound for the counting version of the problem, one may need to use new ideas, and it is also possible that a better upper bound exists. In a recent work [20], the counting version of the matching problem was studied in the random-order streaming model. The authors proposed an algorithm that uses one pass and poly-logarithmic space, which computes a poly-logarithmic approximation of the size of a maximum matching in the input graph.
A general interesting direction for future research is to investigate the communication complexity for other combinatorial problems on graphs, for example, connected components, minimum spanning tree, vertex cover and dominating set. The techniques used for approximate maximum matching in the present paper could be of use in addressing these other problems.
Appendix A Luby's algorithm in the coordinator model
Luby's algorithm [29]: Let be the input graph, and be a matching initialized to . Luby's algorithm for maximal matching is as follows.
1. If is empty, return .
2. Randomly assign a unique priority to each .
3. Let be the set of edges in with higher priority than all of their neighboring edges. Delete and all the neighboring edges of from , and add to . Go to step 1.
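The three steps above can be sketched in Python as follows. This is a minimal centralized sketch, assuming the graph is given as a list of vertex pairs; the function name `luby_maximal_matching` is hypothetical, and a random permutation of the remaining edges stands in for the unique random priorities.

```python
import random
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def luby_maximal_matching(edges: List[Edge], rng: random.Random) -> List[Edge]:
    """Repeatedly add every edge that beats all of its neighboring edges."""
    # Drop self-loops and exact duplicates so that priorities are per edge.
    remaining = list(dict.fromkeys(e for e in edges if e[0] != e[1]))
    matching: List[Edge] = []
    while remaining:  # step 1: stop when no edges are left
        # Step 2: a random permutation gives unique random priorities.
        ranks = list(range(len(remaining)))
        rng.shuffle(ranks)
        priority = dict(zip(remaining, ranks))
        # Highest priority seen at each vertex.
        best = {}
        for e in remaining:
            for v in e:
                if v not in best or priority[e] > best[v]:
                    best[v] = priority[e]
        # Step 3: an edge wins if it is the best edge at both endpoints.
        winners = [e for e in remaining if all(best[v] == priority[e] for v in e)]
        matching.extend(winners)
        covered = {v for e in winners for v in e}
        # Delete the winners and all edges sharing an endpoint with a winner.
        remaining = [e for e in remaining if e[0] not in covered and e[1] not in covered]
    return matching
```

The edge with the globally highest priority always wins its round, so every iteration removes at least one edge and the loop terminates.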
It is easy to verify that the output is a maximal matching. The number of iterations before becomes empty is at most in expectation [29]. Next we briefly describe how to implement this algorithm in the coordinator model. Let be the edges held by site .
1. For each , if is empty, halts. Otherwise randomly assigns a unique priority to each .
2. Let be the set of edges in with higher priority than all of their neighboring edges in . Then sends together with their priorities to the coordinator.
3. The coordinator gets . Let be the set of edges in with higher priority than all of their neighboring edges in . The coordinator adds to and then sends to all sites.
4. Each site deletes all neighboring edges of from , and goes to step 1.
5. After all the sites halt, the coordinator outputs .
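The coordinator-model rounds above can be simulated in a single process, as in the sketch below. Several choices are assumptions made for illustration: each edge is held by exactly one site, priorities are tuples `(random value, site index, local index)` so that they are unique without coordination, sites also delete the selected edges themselves along with their neighbors, and messages are not counted. The function names are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def local_winners(edges: List[Edge], priority: Dict[Edge, tuple]) -> List[Edge]:
    """Edges whose priority beats every neighboring edge within `edges`."""
    best = {}
    for e in edges:
        for v in e:
            if v not in best or priority[e] > best[v]:
                best[v] = priority[e]
    return [e for e in edges if all(best[v] == priority[e] for v in e)]


def coordinator_luby(sites: List[List[Edge]], rng: random.Random) -> List[Edge]:
    """Simulate the rounds; returns the matching held by the coordinator."""
    local = [list(dict.fromkeys(e for e in edges if e[0] != e[1])) for edges in sites]
    matching: List[Edge] = []
    while any(local):  # sites with no remaining edges have halted
        # Step 1: every site assigns priorities to its remaining edges.
        priority: Dict[Edge, tuple] = {}
        for i, edges in enumerate(local):
            for j, e in enumerate(edges):
                priority[e] = (rng.random(), i, j)
        # Step 2: every site reports its local winners to the coordinator.
        reported = [e for edges in local for e in local_winners(edges, priority)]
        # Step 3: the coordinator keeps the winners among the reported edges.
        chosen = local_winners(list(dict.fromkeys(reported)), priority)
        matching.extend(chosen)
        # Step 4: every site deletes edges touching a newly matched vertex.
        covered = {v for e in chosen for v in e}
        local = [[e for e in edges if e[0] not in covered and e[1] not in covered]
                 for edges in local]
    return matching
```

In each round every site sends a matching (its local winners) and receives a matching (the coordinator's selection), which mirrors the per-round communication pattern described in the text.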
It is easy to see that the above algorithm simulates the algorithm of Luby. Therefore, the correctness follows from the correctness of Luby's algorithm, and the number of rounds is the same, which is . The communication cost in each round is at most bits because, in each round, each site sends a matching to the coordinator, and the coordinator sends back another matching. Hence, the total communication cost is bits.
References
- [1] Ahn, K.J., Guha, S.: Laminar families and metric embeddings: Non-bipartite maximum matching problem in the semi-streaming model. CoRR abs/1104.4058 (2011). URL http://arxiv.org/abs/1104.4058
- [2] Ahn, K.J., Guha, S.: Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput. 222, 59–79 (2013)
- [3] Ahn, K.J., Guha, S., McGregor, A.: Analyzing graph structure via linear measurements. In: Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pp. 459–467 (2012)
- [4] Ahn, K.J., Guha, S., McGregor, A.: Graph sketches: Sparsification, spanners, and subgraphs. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '12, pp. 5–14 (2012)
- [5] Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)
- [6] Alon, N., Nisan, N., Raz, R., Weinstein, O.: Welfare maximization with limited interaction. In: Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pp. 1499–1512 (2015)
- [7] Assadi, S., Khanna, S., Li, Y., Yaroslavtsev, G.: Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model, chap. 93, pp. 1345–1364 (2016)
- [8] Bar-Yossef, Z., Jayram, T., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. Journal of Computer and System Sciences 68(4), 702–732 (2004). Special issue on FOCS 2002
