The average size of matchings in graphs
Eric O. D. Andriantiana, Valisoa Razanajatovo Misanantenaina, and Stephan Wagner

Abstract
In this paper, we consider the average size of independent edge sets, also called matchings, in a graph. We characterize the extremal graphs for the average size of matchings in general graphs and trees. In addition, we obtain inequalities between the average size of matchings and the number of matchings as well as the matching energy, which is defined as the sum of the absolute values of the zeros of the matching polynomial.
Eric O. D. Andriantiana
Department of Mathematics (Pure and Applied), Rhodes University, PO Box 94, 6140 Grahamstown, South Africa
Valisoa Razanajatovo Misanantenaina
Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
Stephan Wagner
Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
Key words and phrases:
Independent sets, average size, trees, extremal problems
2010 Mathematics Subject Classification:
Primary 05C35; secondary 05C05, 05C07
This work was supported by the National Research Foundation of South Africa (grants 96236 and 96310).
1. Introduction
An independent vertex set in a graph is a set of vertices such that no two vertices are adjacent. An independent edge set, also called a matching, is a set of edges such that no two edges are adjacent. It is not surprising that these two concepts are closely related, an elementary example being the fact that a matching in a graph is an independent set in the corresponding line graph. Two popular graph invariants associated to these parameters are the Merrifield-Simmons index and the Hosoya index, which are the total number of independent sets and the total number of matchings respectively. Extremal problems, where one is looking for the maximum or minimum of an invariant in a specified class of graphs, have been studied quite thoroughly for both the Merrifield-Simmons index and the Hosoya index. It is straightforward that among all $n$-vertex graphs, the complete graph has the maximum Hosoya index and the minimum Merrifield-Simmons index, while on the other hand the empty graph has the minimum Hosoya index and the maximum Merrifield-Simmons index. Among $n$-vertex trees, the path and the star are extremal, and there are numerous other examples of graph classes where the graphs that minimize the Merrifield-Simmons index also maximize the Hosoya index, and vice versa [13].
In a recent paper [1], we were interested in extremal questions for the average size of independent sets of graphs rather than their number. This was partly inspired by the work of Jamison [6, 7] and later authors [12, 5, 14, 10] on the average size of subtrees of trees. In the present paper, which complements our paper [1], we are concerned with the study of the average size of matchings in a graph. In view of the aforementioned relation between independent sets and matchings, we expect to get similar results as for the average size of independent sets. Indeed, we find that the graphs that minimize the average size of independent sets are also those that maximize the average size of matchings and vice versa in all instances that we treat. Specifically, it holds true for arbitrary graphs and trees of a prescribed size.
Finally, we also prove inequalities between the average size of matchings and the number of matchings as well as the matching energy of a graph, an invariant introduced in [4].
2. Preliminaries
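Let $G$ be a graph. A subset $F$ of $E(G)$ is called a matching of $G$ if the edges of $F$ do not share any common vertices. Let $m(G,k)$ be the number of matchings of cardinality $k$ (also called $k$-matchings) in $G$. We use the following notation for the total number of matchings in $G$, the sum of the sizes of all matchings in $G$ and the average size of matchings in $G$:
$$M(G) = \sum_{k \geq 0} m(G,k), \qquad T(G) = \sum_{k \geq 0} k\, m(G,k), \qquad \operatorname{avm}(G) = \frac{T(G)}{M(G)}.$$
The greatest cardinality of a matching in $G$ is called the matching number of $G$ and denoted by $\nu(G)$.
As examples, let us consider the $n$-vertex edgeless graph $E_n$ and the star $S_n$. We have
$$M(E_n) = 1, \qquad T(E_n) = 0, \qquad M(S_n) = n, \qquad T(S_n) = n - 1,$$
and hence
$$\operatorname{avm}(E_n) = 0, \qquad \operatorname{avm}(S_n) = \frac{n-1}{n}.$$
The following standard and well-known proposition gives us a recursion for the total number and size of matchings.
Proposition 2.1.
If $e = uv$ is an edge of $G$, then
$$M(G) = M(G - e) + M(G - \{u, v\})$$
and
$$T(G) = T(G - e) + T(G - \{u, v\}) + M(G - \{u, v\}).$$
Similarly, if $v$ is a vertex of $G$ and $N(v)$ denotes its set of neighbors, then
$$M(G) = M(G - v) + \sum_{w \in N(v)} M(G - \{v, w\})$$
and
$$T(G) = T(G - v) + \sum_{w \in N(v)} \bigl( T(G - \{v, w\}) + M(G - \{v, w\}) \bigr).$$
Proof.
A matching in $G$ either contains the edge $e$ or not. The number of matchings containing $e$ is $M(G - \{u, v\})$, and the number of those not containing $e$ is $M(G - e)$. Hence, the first equation holds. The argument for the second equation is similar, with the last term taking the edge $e$ itself into account.
Using similar reasoning, the last two equations are obtained by distinguishing between matchings that do not contain an edge with $v$ as an endpoint and those that do contain such an edge. ∎
Remark 2.2.
In particular, if $v$ is a leaf of a tree $G$ and $w$ its unique neighbor, we obtain the relations
$$M(G) = M(G - v) + M(G - \{v, w\})$$
and
$$T(G) = T(G - v) + T(G - \{v, w\}) + M(G - \{v, w\}).$$
Moreover, we have the following basic result on disjoint unions:
Proposition 2.3.
Let $G_1, G_2, \ldots, G_r$ be the connected components of a graph $G$. Then we have
$$M(G) = \prod_{i=1}^{r} M(G_i)$$
and
$$T(G) = \sum_{i=1}^{r} T(G_i) \prod_{j \neq i} M(G_j),$$
thus
$$\operatorname{avm}(G) = \sum_{i=1}^{r} \operatorname{avm}(G_i).$$
Proof.
This follows easily from the fact that every matching of $G$ decomposes uniquely into matchings of its connected components. ∎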
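The edge recursion and the additivity over disjoint unions can be tried out on small graphs. The following is a minimal sketch, assuming a graph is given simply as a list of vertex pairs; the function names `matching_stats` and `avm` are illustrative choices and not notation taken from the paper.

```python
from fractions import Fraction

def matching_stats(edges):
    """Return (M, T): the number of matchings and the sum of their sizes.

    `edges` is a list of 2-element vertex pairs. Isolated vertices do not
    affect either quantity, so only the edge set is needed.
    """
    edges = [tuple(e) for e in edges]
    if not edges:
        return 1, 0  # only the empty matching
    (u, v), rest = edges[0], edges[1:]
    # Matchings avoiding e = uv are exactly the matchings of G - e.
    m_without, t_without = matching_stats(rest)
    # Matchings containing e are matchings of G - {u, v}, each enlarged by e.
    reduced = [f for f in rest if u not in f and v not in f]
    m_with, t_with = matching_stats(reduced)
    return m_without + m_with, t_without + t_with + m_with

def avm(edges):
    """Average matching size as an exact fraction."""
    m, t = matching_stats(edges)
    return Fraction(t, m)

star = [(0, i) for i in range(1, 6)]                # star on 6 vertices
path = [(i, i + 1) for i in range(3)]               # path on 4 vertices
shifted_path = [(i, i + 1) for i in range(10, 13)]  # disjoint copy of the path
print(avm(star))                 # 5/6
print(avm(path))                 # 1
print(avm(star + shifted_path))  # 11/6
```

The recursion takes exponential time in the number of edges, which is acceptable for the small examples used here; the last line illustrates that the average over a disjoint union is the sum of the averages of the components.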
3. General graphs
Unlike the total number of matchings $M(G)$, the average size of matchings $\operatorname{avm}(G)$ is not always a monotone function under addition of edges to the graph. For example, consider the tree in Figure 1. We have
[TABLE]
However, we can make use of the following result obtained in [1]:
Theorem 3.1.
Let be a nonempty finite set, and its powerset. For a set , we define
[TABLE]
Let , such that the cardinalities of the elements of are not all the same and for every there exists with . Then there exists such that
[TABLE]
Applying Theorem 3.1, with being the set of matchings of , we immediately obtain the following theorem.
Theorem 3.2.
If $G$ is a nonempty graph, then there exists an edge $e$ in $G$ such that the average matching size satisfies
$$\operatorname{avm}(G - e) < \operatorname{avm}(G).$$
As an immediate consequence, we have the following corollary (which of course is also rather trivial without Theorem 3.1).
Corollary 3.3.
For every $n$-vertex graph $G$ that is not the edgeless graph $E_n$, $\operatorname{avm}(G) > \operatorname{avm}(E_n) = 0$.
One might wonder whether there is an analogous statement for adding edges. If it were possible to add an edge to every non-complete graph in such a way that the average matching size increases, it would follow immediately that complete graphs maximize the invariant $\operatorname{avm}$. While the latter is true (as will be shown in the following), the analogue of Theorem 3.1 fails, as the example of a four-vertex cycle $C_4$ shows: when an edge $e$ is added to the cycle $C_4$, we have
$$\operatorname{avm}(C_4 + e) = \frac{9}{8} < \frac{8}{7} = \operatorname{avm}(C_4).$$
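The four-cycle example is small enough to verify by listing all matchings. The sketch below does this by brute force over edge subsets; the helper name `average_matching_size` and the vertex labelling are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def average_matching_size(edges):
    """Average size of all matchings (including the empty one), by brute force."""
    count, total = 0, 0
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            covered = [x for e in subset for x in e]
            if len(covered) == len(set(covered)):  # no two edges share a vertex
                count += 1
                total += k
    return Fraction(total, count)

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
chord = (0, 2)
print(average_matching_size(cycle))            # 8/7
print(average_matching_size(cycle + [chord]))  # 9/8
```

The cycle has 7 matchings of total size 8. Whichever chord is added, it meets every cycle edge, so it contributes only one new matching of size 1, giving 8 matchings of total size 9.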
Thus we need another approach to show that the complete graph is still extremal. For this purpose, we first introduce some notation.
In analogy to , and , we define the following partial quantities for every nonnegative integer :
[TABLE]
We have the following lemmas.
Lemma 3.4.
For every nonnegative integer and every graph , we have
[TABLE]
If , then
[TABLE]
Proof.
This is straightforward from the definition of . ∎
Lemma 3.5.
For every -vertex graph and every nonnegative integer such that , we have
[TABLE]
Proof.
Let be any -matching of the complete graph . When the vertices that are covered by are removed, a complete graph on vertices remains. Thus there are possible ways to extend to a -matching. Conversely, every -matching can be obtained as an extension of different -matchings. It follows that
[TABLE]
Likewise, if is a -matching of and the set of vertices covered by in , then there are ways to extend to a -matching of . So by the same double-counting argument, we have
[TABLE]
Clearly, for all -matchings (with equality if and only if is complete), thus
[TABLE]
and the desired inequality follows. ∎
Remark 3.6.
Equality in Lemma 3.5 may hold for some (but not all) even if is not complete: for example, for the -cycle , we have
[TABLE]
Lemma 3.5 can easily be extended to the following lemma by induction:
Lemma 3.7.
For every -vertex graph and for every pair of integers with , we have
[TABLE]
and thus
[TABLE]
Theorem 3.8.
For every -vertex graph and every integer with , we have
[TABLE]
with equality if and only if is a complete graph.
Proof.
We only need to consider the case that is not complete. Note first that
[TABLE]
The inequality holds because is an increasing function of on the interval .
Assume that for some positive integer , . Then we have and
[TABLE]
Since , is decreasing as a function of on the interval , so Lemma 3.7 and (5) imply that
[TABLE]
Finally, using the induction hypothesis , we obtain
[TABLE]
∎
Corollary 3.9.
For every -vertex graph we have , with equality only if is a complete graph.
Proof.
Theorem 3.8 and Lemma 3.4 give us
[TABLE]
∎
Remark 3.10.
While there is no simple explicit formula for the average matching size $\operatorname{avm}(K_n)$ of the complete graph, it can be expressed in terms of the number of matchings in complete graphs. Every edge of the complete graph $K_n$ is contained in $M(K_{n-2})$ matchings, where $M(K_{n-2})$ is the total number of matchings of $K_{n-2}$. Thus the total size of all matchings of $K_n$ is $T(K_n) = \binom{n}{2} M(K_{n-2})$ and consequently
$$\operatorname{avm}(K_n) = \binom{n}{2} \frac{M(K_{n-2})}{M(K_n)}.$$
A relatively simple asymptotic formula can be provided as well. There is a straightforward bijection between matchings of $K_n$ and involutions of an $n$-element set (a permutation is called an involution if it is equal to its own inverse, or equivalently if all cycles are of length $1$ or $2$). Thus the number of matchings of $K_n$ is the same as the number of involutions of an $n$-element set, for which there is a well-known asymptotic formula (see [3, Proposition VIII.2]):
[TABLE]
It follows that
[TABLE]
as $n \to \infty$.
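The expression of the average for complete graphs in terms of involution numbers is easy to evaluate exactly. The sketch below uses the standard recurrence for involution numbers; the function names are illustrative, and the comparison value $n/2 - \sqrt{n}/2$ is only a rough leading-order guide assumed here for orientation, not the asymptotic formula of the remark.

```python
from fractions import Fraction
from math import comb, sqrt

def involution_numbers(n_max):
    """inv[n] = number of involutions of an n-set = number of matchings of K_n."""
    inv = [1, 1]
    for n in range(2, n_max + 1):
        # Element n is either a fixed point or paired with one of n - 1 others.
        inv.append(inv[n - 1] + (n - 1) * inv[n - 2])
    return inv[: n_max + 1]

def avm_complete(n, inv):
    """Average matching size of K_n: binom(n, 2) * M(K_{n-2}) / M(K_n)."""
    if n < 2:
        return Fraction(0)
    return Fraction(comb(n, 2) * inv[n - 2], inv[n])

inv = involution_numbers(400)
print(avm_complete(4, inv))  # 6/5
for n in (25, 100, 400):
    # exact value (as a float) next to the rough guide n/2 - sqrt(n)/2
    print(n, float(avm_complete(n, inv)), n / 2 - sqrt(n) / 2)
```

For $K_4$ there are 10 matchings (one empty, six single edges, three perfect matchings) of total size 12, which agrees with the value 6/5.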
4. Trees
In this section, we will be concerned with trees. Our main goal is to determine the maximum and minimum of the average matching size $\operatorname{avm}(T)$ when $T$ is a tree with $n$ vertices. Let us first consider the problem of minimizing the average size of matchings. As it turns out, the minimum for trees is also the minimum for connected graphs in general.
Theorem 4.1.
For every connected $n$-vertex graph $G$, $\operatorname{avm}(G) \geq \frac{n-1}{n}$, with equality only if $G$ is a star.
Proof.
We have shown earlier that the star satisfies $\operatorname{avm}(S_n) = \frac{n-1}{n}$. However, any other connected graph $G$ (except for the complete graph $K_3$, for which $\operatorname{avm}(K_3) = \frac{3}{4}$) on $n$ vertices satisfies $\operatorname{avm}(G) \geq 1$, since it possesses matchings of size greater than $1$, which make up for the empty set.
∎
The maximization problem requires more effort. Note that the line graph of the $n$-vertex path $P_n$ is the $(n-1)$-vertex path $P_{n-1}$. This implies that the matchings of $P_n$ can be identified with the independent sets of $P_{n-1}$. Thus, the average size of matchings of $P_n$ is the same as the average size of the independent sets of $P_{n-1}$. A formula for this average size was determined in [1], where it was also shown that this average is in fact the minimum among trees of the same size.
Lemma 4.2.
The average size of matchings of the $n$-vertex path $P_n$ is
[TABLE]
where is the golden ratio. In particular,
- (a)
- (b)
, with equality only for . For all positive integers , we even have .
Proof.
The formula for is taken from [1] (using the aforementioned correspondence between matchings of and independent sets of ). The limit in (a) is a straightforward consequence. For (b), one only needs to note that the sign of the final term in (8) alternates, and that its absolute value is decreasing in (see also [1]). ∎
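The values for paths can also be generated directly from the leaf recursion, without the closed formula. The following sketch does so with exact arithmetic; the names `path_stats` and `avm_path` are illustrative, and the constant $(5 - \sqrt{5})/10$ used for comparison is the density expected from the golden-ratio growth of the matching counts, stated here as an assumption rather than quoted from the lemma.

```python
from fractions import Fraction
from math import sqrt

def path_stats(n):
    """Return (M, T) for the path on n >= 1 vertices via the leaf recursion."""
    m_prev, t_prev = 1, 0  # path on 1 vertex: only the empty matching
    if n == 1:
        return m_prev, t_prev
    m_cur, t_cur = 2, 1    # path on 2 vertices: empty matching and the edge
    for _ in range(n - 2):
        # Removing an end vertex and, alternatively, the end vertex with its
        # neighbor gives
        #   M(P_k) = M(P_{k-1}) + M(P_{k-2})
        #   T(P_k) = T(P_{k-1}) + T(P_{k-2}) + M(P_{k-2})
        m_prev, t_prev, m_cur, t_cur = (
            m_cur, t_cur, m_cur + m_prev, t_cur + t_prev + m_prev)
    return m_cur, t_cur

def avm_path(n):
    m, t = path_stats(n)
    return Fraction(t, m)

print(avm_path(4))  # 1
density = (5 - sqrt(5)) / 10  # assumed limiting value of avm(P_n) / n
for n in (10, 100, 1000):
    print(n, float(avm_path(n)) / n, density)
```

The matching counts produced this way are Fibonacci numbers, which is consistent with the identification of matchings of $P_n$ with independent sets of $P_{n-1}$.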
For ease of notation, we set and . Table 1 gives values of for small .
Before we prove the main result of this section, we require one more lemma:
Lemma 4.3.
For every tree and every vertex of , we have
[TABLE]
where denotes the degree of .
Proof.
Note first that . Since is a subgraph of , we have , hence , which proves the first inequality. The second inequality simply follows from the fact that is a subgraph of , so matchings of are also matchings of . ∎
Theorem 4.4.
For every tree of order that is not a path, we have the inequality , where . Consequently, the path maximizes the value of among all trees of order .
Proof.
We prove the inequality by induction on . For , there is nothing to prove since the only trees with three or fewer vertices are paths. Thus assume now that , and consider a vertex of the tree whose degree is at least (which must exist if is not a path). Denote the neighbors of by and the components of by (in such a way that is contained in ). Let be the edge between and , and be the tree obtained by removing from . We have
[TABLE]
Set and .
Assume first that . By Lemma 4.2 and the induction hypothesis, we have for all and . It follows that
[TABLE]
If , then we are done immediately. Hence we can assume that . This implies that the expression for in (9) is decreasing regarded as a function of , which means that we will need lower bounds for this quotient. So let us first find a formula for . We observe that
[TABLE]
thus
[TABLE]
Let us also find an expression for :
[TABLE]
We have to consider two different cases:
Case 1: One of the ’s is the two-vertex path . Then we can without loss of generality assume that , so that and . Let us distinguish two subcases depending on the number of other branches that are isomorphic to .
- •
At least one of the ’s is different from . We have
[TABLE]
as it was established earlier. Moreover, by Lemma 4.2 and the induction hypothesis,
[TABLE]
for all , and
[TABLE]
if is different from . Since this is the case for at least one index , it follows that
[TABLE]
Moreover, from the inequality (see Lemma 4.3) and the fact that , using Equation (10), we obtain . Hence, (9) gives us
[TABLE]
- •
All of the ’s are equal to . In this case, we can determine and explicitly (as functions of only) by means of Proposition 2.1:
[TABLE]
and
[TABLE]
thus
[TABLE]
Now one verifies easily that
[TABLE]
holds for all , completing the proof in Case 1.
Case 2: None of the ’s is a -vertex path .
By Lemma 4.3, we have , and plugging this estimate into Equation (10), we obtain
[TABLE]
Let us distinguish different cases depending on the shape of . We may assume that is the smallest branch, i.e. .
- •
If , then . It follows that
[TABLE]
Moreover, since for every by the induction hypothesis and Lemma 4.2 (and the assumption that none of the is a -vertex path), we have
[TABLE]
Since , Equation (12) gives us . Thus,
[TABLE]
- •
If , then and . In the same way as in the previous case, it follows that
[TABLE]
Since , in this case Equation (12) gives us . We obtain
[TABLE]
- •
If , then and . In the same way as before, it follows that
[TABLE]
Moreover, in this case, so using Equation (12) again, we get . Hence
[TABLE]
- •
If , then for all (by the induction hypothesis and Lemma 4.2, we have if is a path, and otherwise) and . So it follows now that
[TABLE]
Since , we have by (12). Thus,
[TABLE]
This completes the proof in the case that , so we are left with the case . We return to the representation
[TABLE]
Plugging (11) into Equation (10), we obtain
[TABLE]
Now we distinguish different cases depending on how many of the branches have one, two, three, four and five or more vertices respectively. This gives us a total of cases corresponding to the solutions of
[TABLE]
Here, stand for the number of ’s with one, two, three, and four vertices respectively, and is the number of ’s with five or more vertices. In each of the cases, we use the following explicit values and bounds. The bounds and explicit values for are obtained by an exhaustive case check, while the bounds for follow from the induction hypothesis and Lemma 4.2.
[TABLE]
[TABLE]
We can assume that the degree of is at most for every , since otherwise we can go back to the case that . Using this assumption, we have
[TABLE]
The first four statements are obtained by checking all possible cases. For the last one, we use the recursion in (11) combined with Lemma 4.3. Note first that has at most two neighbors in , since its degree in is at most . If there is only one neighbor, let be this neighbor, and set . We have
[TABLE]
Applying Lemma 4.3 to and yields (if the degree of was greater than , we could go back to the case again), thus . If there are two neighbors and , let and be the respective components of . Since , we obtain
[TABLE]
in this case, which readily proves the upper bound of in all cases. To improve the lower bound even further, we can note that one of the two trees and has more than one vertex; without loss of generality, let this be . Applying the same argument to as to , we find . Thus
[TABLE]
and we have also established the lower bound.
Next we return to the representation (13). By the induction hypothesis and Lemma 4.2, we have . As before, if , then we are done. So we may assume that
[TABLE]
Hence the expression (13) is linear and decreasing in , its maximum is attained for the smallest possible value of .
By the induction hypothesis, . This inequality is plugged into (13) along with the bounds for and . The identity (14) is used to obtain a lower bound on the quotient . All this gives us an upper bound for in each of the aforementioned cases, which can all be checked easily with a computer. The worst case happens when and , where we have the equality . As another example to illustrate the general procedure, let us consider the case that gives us the second worst estimate: it is obtained for , and . Let and both have two vertices, so that the first branch consists of four vertices. We have
[TABLE]
thus
[TABLE]
Moreover,
[TABLE]
and thus
[TABLE]
Finally, we have
[TABLE]
Putting everything together, we obtain
[TABLE]
The other cases are treated in the same fashion and give upper bounds with smaller constant terms. Thus the induction is complete. In order to complete the proof of the theorem, it only remains to prove an upper bound on . However, we already know from Lemma 4.2 that
[TABLE]
for , and . Thus for every tree with vertices other than . This completes the proof. ∎
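For small orders, the extremal statements for trees can be confirmed by exhaustive search. The sketch below enumerates all labeled trees through Prüfer sequences and compares the smallest and largest average with the star and the path. It is a sanity check under these assumptions, not a substitute for the induction, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence of length n - 2 into the edges of a labeled tree."""
    degree = [1] * n
    for x in seq:
        degree[x] += 1
    edges = []
    for x in seq:
        leaf = min(v for v in range(n) if degree[v] == 1)
        edges.append((leaf, x))
        degree[leaf] -= 1
        degree[x] -= 1
    u, v = [w for w in range(n) if degree[w] == 1]
    edges.append((u, v))
    return edges

def matching_stats(edges):
    """Return (number of matchings, sum of their sizes) by edge recursion."""
    if not edges:
        return 1, 0
    (u, v), rest = edges[0], edges[1:]
    m0, t0 = matching_stats(rest)
    m1, t1 = matching_stats([f for f in rest if u not in f and v not in f])
    return m0 + m1, t0 + t1 + m1

def avm(edges):
    m, t = matching_stats(edges)
    return Fraction(t, m)

def extremes(n):
    """Smallest and largest average matching size over all trees on n vertices."""
    values = [avm(prufer_to_edges(seq, n))
              for seq in product(range(n), repeat=n - 2)]
    return min(values), max(values)

for n in range(3, 7):
    smallest, largest = extremes(n)
    star = avm([(0, i) for i in range(1, n)])
    path = avm([(i, i + 1) for i in range(n - 1)])
    print(n, smallest == star, largest == path)
```

Enumerating labeled rather than unlabeled trees repeats isomorphic trees many times, which is wasteful but keeps the code short; for $n \leq 6$ there are at most 1296 sequences to process.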
5. Relations to other invariants
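In this section, we will prove inequalities between the average matching size and other matching-related quantities associated with a graph. Let $G$ be an $n$-vertex graph, and let $m(G,k)$ denote its number of $k$-matchings. The matching polynomial $\mu(G,x)$ and the matching generating polynomial $M(G,x)$ are defined as follows:
$$\mu(G,x) = \sum_{k \geq 0} (-1)^k m(G,k)\, x^{n-2k}, \qquad M(G,x) = \sum_{k \geq 0} m(G,k)\, x^k.$$
Note that the average size of matchings in $G$ can be expressed as
$$\operatorname{avm}(G) = \frac{M'(G,1)}{M(G,1)},$$
where $M'(G,x)$ is the first derivative of $M(G,x)$ with respect to $x$.
It is easy to see that $\mu(G,x) = x^n M(G,-x^{-2})$. Using this relation, we can write the derivative of $\mu(G,x)$ in terms of $M(G,x)$ and its derivative.
$$\mu'(G,x) = n x^{n-1} M(G,-x^{-2}) + 2 x^{n-3} M'(G,-x^{-2}).$$
This gives us
$$\frac{M'(G,-x^{-2})}{M(G,-x^{-2})} = \frac{x^3}{2} \left( \frac{\mu'(G,x)}{\mu(G,x)} - \frac{n}{x} \right). \tag{15}$$
Let $\theta_1, \theta_2, \ldots, \theta_n$ be the zeros of the matching polynomial $\mu(G,x)$; it is well known that these zeros are real, see for example Section 8.5 in [9]. Now we can express $\mu(G,x)$ and $\mu'(G,x)$ in terms of the zeros as follows:
$$\mu(G,x) = \prod_{j=1}^{n} (x - \theta_j), \qquad \mu'(G,x) = \sum_{j=1}^{n} \prod_{k \neq j} (x - \theta_k).$$
Therefore,
$$\frac{\mu'(G,x)}{\mu(G,x)} = \sum_{j=1}^{n} \frac{1}{x - \theta_j}. \tag{16}$$
Now, we can establish a relation between the average size of matchings of $G$ and the zeros of its matching polynomial.
Lemma 5.1.
Let $G$ be an $n$-vertex graph and $\theta_1, \theta_2, \ldots, \theta_n$ be the zeros of the matching polynomial of $G$. Then
$$\operatorname{avm}(G) = \frac{1}{2} \sum_{j=1}^{n} \frac{\theta_j^2}{1 + \theta_j^2}.$$
Proof.
Using (15) and (16), and plugging in $x = i$ (the imaginary unit, so that $-x^{-2} = 1$), we obtain
$$\frac{i^3}{2} \left( \sum_{j=1}^{n} \frac{1}{i - \theta_j} - \frac{n}{i} \right) = \frac{M'(G,1)}{M(G,1)},$$
and this simplifies to
$$\frac{n}{2} - \frac{i}{2} \sum_{j=1}^{n} \frac{1}{i - \theta_j} = \operatorname{avm}(G). \tag{17}$$
Let us rearrange the left hand side of Equation (17). We have
$$\frac{n}{2} - \frac{i}{2} \sum_{j=1}^{n} \frac{1}{i - \theta_j} = \frac{n}{2} - \frac{1}{2} \sum_{j=1}^{n} \frac{1}{1 + \theta_j^2} + \frac{i}{2} \sum_{j=1}^{n} \frac{\theta_j}{1 + \theta_j^2} = \frac{1}{2} \sum_{j=1}^{n} \frac{\theta_j^2}{1 + \theta_j^2} + \frac{i}{2} \sum_{j=1}^{n} \frac{\theta_j}{1 + \theta_j^2}.$$
Since the imaginary part must be $0$, we get the desired result. ∎
Having established this relation, we can now prove two inequalities. The first relates the average matching size with the total number of matchings. Note that the latter is $M(G) = M(G,1)$, which can be expressed in terms of the zeros as well:
$$M(G) = \prod_{j=1}^{n} \sqrt{1 + \theta_j^2}.$$
It is not difficult to verify that the inequality
$$\frac{x}{1+x} \leq a \log(1+x) + 1 - a + a \log a$$
holds for all positive real numbers $a$ and $x$. Plugging in $\theta_j^2$ for $x$ and summing over all $j$ yields the following result:
Proposition 5.2.
For every positive real number $a$ and every $n$-vertex graph $G$, we have
$$\operatorname{avm}(G) \leq a \log M(G) + \frac{n}{2} \left( 1 - a + a \log a \right).$$
In particular,
$$\operatorname{avm}(G) \leq \log M(G).$$
We can still choose $a$ arbitrarily. Differentiating with respect to $a$, we find that the optimal value for $a$ (that minimizes the upper bound) is $a = M(G)^{-2/n}$. Plugging this back into the inequality, we obtain the following theorem:
Theorem 5.3.
For every $n$-vertex graph $G$, we have
$$\operatorname{avm}(G) \leq \frac{n}{2} \left( 1 - M(G)^{-2/n} \right).$$
An alternative way to prove this theorem is to apply the inequality between the arithmetic and the geometric mean.
We conclude this section with a similar inequality involving the matching energy. This invariant is defined as follows [4]:
$$\operatorname{ME}(G) = \sum_{j=1}^{n} |\theta_j|.$$
Following an analogous approach, we can prove a relation between the average size of matchings in $G$ and the matching energy of $G$.
Theorem 5.4.
For every graph $G$,
$$\operatorname{avm}(G) \leq \frac{\operatorname{ME}(G)}{4}.$$
Proof.
For all nonnegative real $x$, we have $\frac{x^2}{1+x^2} \leq \frac{x}{2}$. Therefore, by Lemma 5.1,
$$\operatorname{avm}(G) = \frac{1}{2} \sum_{j=1}^{n} \frac{\theta_j^2}{1 + \theta_j^2} \leq \frac{1}{4} \sum_{j=1}^{n} |\theta_j| = \frac{\operatorname{ME}(G)}{4}.$$
∎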
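A numerical check on paths illustrates how the zeros of the matching polynomial enter. For a tree the matching polynomial coincides with the characteristic polynomial, and for the path on $n$ vertices its zeros are $2\cos(j\pi/(n+1))$ for $j = 1, \ldots, n$. The sketch assumes the relations in the following form: the average equals $\frac{1}{2}\sum_j \theta_j^2/(1+\theta_j^2)$, and it is bounded by $\frac{n}{2}(1 - M^{-2/n})$ and by one quarter of the matching energy, where $M$ is the number of matchings. All function names are illustrative.

```python
from math import cos, pi

def path_stats(n):
    """Return (M, T) for the path on n >= 1 vertices via the leaf recursion."""
    m_prev, t_prev = 1, 0
    if n == 1:
        return m_prev, t_prev
    m_cur, t_cur = 2, 1
    for _ in range(n - 2):
        m_prev, t_prev, m_cur, t_cur = (
            m_cur, t_cur, m_cur + m_prev, t_cur + t_prev + m_prev)
    return m_cur, t_cur

def path_zeros(n):
    """Zeros of the matching polynomial of the path (its adjacency eigenvalues)."""
    return [2 * cos(j * pi / (n + 1)) for j in range(1, n + 1)]

def check_path(n):
    """Return (avm, value from zeros, count bound, energy bound) for P_n."""
    m, t = path_stats(n)
    zeros = path_zeros(n)
    from_zeros = 0.5 * sum(z * z / (1 + z * z) for z in zeros)
    count_bound = n / 2 * (1 - m ** (-2 / n))
    energy_bound = sum(abs(z) for z in zeros) / 4
    return t / m, from_zeros, count_bound, energy_bound

for n in (2, 5, 12):
    print(n, check_path(n))
```

For $n = 2$ the zeros are $\pm 1$ and both bounds hold with equality, since the average, the count bound and the energy bound are all equal to $1/2$.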
Remark 5.5.
Note that in the case of trees, the matching polynomial coincides with the characteristic polynomial. So we have a correspondence between the average size of matchings of a tree and the classical energy of a tree, which is the sum of the absolute values of the eigenvalues, see [8].
6. The weighted average size of matchings in a graph
In the context of the monomer-dimer model from statistical physics, one often considers a probability distribution on the set of matchings where the probability of a $k$-matching is proportional to $\lambda^k$ for some constant $\lambda > 0$, see for example [2]. This provides the motivation to study the weighted average size of matchings. We consider a random matching according to the aforementioned probability distribution, where $\lambda$ is a fixed positive number. Writing $m(G,k)$ for the number of $k$-matchings of $G$, we define the weighted total number of matchings in $G$, the weighted total size of matchings in $G$ and the weighted average size of matchings in $G$ as follows:
$$M_\lambda(G) = \sum_{k \geq 0} m(G,k)\, \lambda^k, \qquad T_\lambda(G) = \sum_{k \geq 0} k\, m(G,k)\, \lambda^k, \qquad \operatorname{avm}_\lambda(G) = \frac{T_\lambda(G)}{M_\lambda(G)}.$$
Following a similar reasoning as in the special case where $\lambda = 1$, it is still possible to prove the following inequalities.
Theorem 6.1.
For every fixed positive real number $\lambda$ and every $n$-vertex graph $G$, we have
[TABLE]
Moreover, for every real number and every $n$-vertex tree , we have
[TABLE]
We refer to [11] for more details on the proof. Note that the final inequality () is not generally true for all values of $\lambda$. One can also express the weighted average matching size in terms of the zeros of the matching polynomial:
Lemma 6.2.
Let $G$ be an $n$-vertex graph and $\theta_1, \theta_2, \ldots, \theta_n$ be the zeros of the matching polynomial of $G$. Then
$$\operatorname{avm}_\lambda(G) = \frac{1}{2} \sum_{j=1}^{n} \frac{\lambda \theta_j^2}{1 + \lambda \theta_j^2}.$$
Finally, it is also possible again to prove inequalities that relate $\operatorname{avm}_\lambda(G)$ to other invariants. Specifically, we have the following straightforward generalization of Theorem 5.4:
Theorem 6.3.
For every graph $G$ with matching energy $\operatorname{ME}(G)$ and every positive real number $\lambda$,
$$\operatorname{avm}_\lambda(G) \leq \frac{\sqrt{\lambda}}{4} \operatorname{ME}(G).$$
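The weighted average is straightforward to compute once the numbers of $k$-matchings are known. The sketch below obtains these counts by edge recursion and evaluates the weighted average for a weight called `lam`; it then compares the result on a path with the expression $\frac{1}{2}\sum_j \lambda\theta_j^2/(1+\lambda\theta_j^2)$ and with the bound $\frac{\sqrt{\lambda}}{4}\sum_j |\theta_j|$. Both of these forms, as well as all function names, are assumptions made for this illustration.

```python
from fractions import Fraction
from math import cos, pi, sqrt

def matching_counts(edges):
    """Return [m_0, m_1, ...], where m_k is the number of k-matchings."""
    if not edges:
        return [1]
    (u, v), rest = edges[0], edges[1:]
    without = matching_counts(rest)
    with_e = matching_counts([f for f in rest if u not in f and v not in f])
    counts = without + [0] * max(0, len(with_e) + 1 - len(without))
    for k, c in enumerate(with_e):
        counts[k + 1] += c  # a k-matching of G - {u, v} plus the edge uv
    return counts

def weighted_avm(edges, lam):
    """Average matching size when a k-matching has weight lam ** k."""
    counts = matching_counts(edges)
    total = sum(c * lam ** k for k, c in enumerate(counts))
    size = sum(k * c * lam ** k for k, c in enumerate(counts))
    return size / total

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(matching_counts(cycle))            # [1, 4, 2]
print(weighted_avm(cycle, Fraction(1)))  # 8/7

n, lam = 6, 2.5
path = [(i, i + 1) for i in range(n - 1)]
zeros = [2 * cos(j * pi / (n + 1)) for j in range(1, n + 1)]
from_zeros = 0.5 * sum(lam * z * z / (1 + lam * z * z) for z in zeros)
energy_bound = sqrt(lam) / 4 * sum(abs(z) for z in zeros)
print(weighted_avm(path, lam), from_zeros, energy_bound)
```

Passing a `Fraction` as the weight keeps the computation exact, while a float weight is convenient for comparison with the zeros. Setting the weight to 1 recovers the unweighted average.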
References
[1] E. O. D. Andriantiana, V. Razanajatovo Misanantenaina, and S. Wagner. The average size of independent sets in a graph. European Journal of Mathematics, to appear, 2019.
[2] E. Davies, M. Jenssen, W. Perkins, and B. Roberts. Independent sets, matchings, and occupancy fractions. J. Lond. Math. Soc., 96(1):47–66, 2017.
[3] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009.
[4] I. Gutman and S. Wagner. The matching energy of a graph. Discrete Appl. Math., 160(15):2177–2187, 2012.
[5] J. Haslegrave. Extremal results on average subtree density of series-reduced trees. J. Combin. Theory, Series B, 107:26–41, 2014.
[6] R. E. Jamison. On the average number of nodes in a subtree of a tree. J. Combin. Theory, Series B, 35(3):207–223, 1983.
[7] R. E. Jamison. Monotonicity of the mean order of subtrees. J. Combin. Theory, Series B, 37(1):70–78, 1984.
[8] X. Li, Y. Shi, and I. Gutman. Graph energy. Springer Science & Business Media, 2012.
