
TL;DR
This paper analyzes the Zagreb index, a graph-based topological measure, across various random network models, providing mean, variance, and distributional properties for each class.
Contribution
It introduces new calculations of mean, variance, and asymptotic distributions of the Zagreb index for several classes of random networks.
Findings
Mean and variance of Zagreb index computed for each network class.
Asymptotic normality established for one class.
Right-skewed distribution shown for another class.
Abstract
In this article, we investigate the Zagreb index, a kind of graph-based topological index, of several random networks, including a class of networks extended from random recursive trees, plain-oriented recursive trees, and random caterpillars growing in a preferential attachment manner. We calculate the mean and variance of the Zagreb index for each class. In addition, we prove that the asymptotic distribution of the Zagreb index for the first class is normal, and that the asymptotic distribution of the Zagreb index for the second class is skewed to the right.
The Zagreb index of several random models
Panpan Zhang
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, U.S.A.
AMS subject classifications.
Primary: 90B15
Secondary: 60B10; 60F05
Key words. Combinatorial probability; Martingale; Moments; Random network; Recurrence methods; Zagreb index
1. Introduction
A topological index, in chemical graph theory, is a numerical descriptor of the structure of the molecular graph of a chemical compound. The Zagreb index [7] is a topological index that has found many applications in mathematical chemistry and chemoinformatics. It is best known for modeling quantitative structure-property relationships (QSPR) and quantitative structure-activity relationships (QSAR) of molecules [10]. The Zagreb index of a graph $G = (V, E)$ is the sum of the squared degrees of all the nodes in $V$. Mathematically, it is given by
$$\mathrm{Zagreb}(G) = \sum_{v \in V} \bigl(\deg(v)\bigr)^2,$$
where $\deg(v)$ is the degree of node $v$.
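As a small illustration of the definition, the following sketch computes the sum of squared degrees from an edge list. The function name `zagreb_index` and the two toy graphs are illustrative assumptions, not taken from the paper; isolated nodes contribute zero, so an edge list suffices.

```python
from collections import Counter


def zagreb_index(edges):
    """Sum of squared node degrees of a simple undirected graph.

    `edges` is an iterable of (u, v) pairs. Nodes without edges have
    degree 0 and do not change the sum, so they can be omitted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(d * d for d in degree.values())


# Path on four nodes: degrees 1, 2, 2, 1.
path = [(1, 2), (2, 3), (3, 4)]
# Star with three leaves: degrees 3, 1, 1, 1.
star = [(0, 1), (0, 2), (0, 3)]

print(zagreb_index(path), zagreb_index(star))  # 10 12
```

The star has the larger index even though both graphs have three edges, which reflects how the index rewards concentrated degree.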
Recently, the Zagreb indices of several random trees have been investigated, such as random recursive trees (RRTs) [5] and $b$-ary recursive trees [6]. In this article, we study the Zagreb index of three random structures: a class of networks extended from RRTs, plain-oriented recursive trees (PORTs), and a class of caterpillars growing in a preferential attachment manner.
2. Zagreb index of extended RRTs
Trees are popular structures for data storage and sorting in computer science. A rooted tree is a tree with one designated node called the root, which is usually thought of as the originator of the tree. A random recursive tree (RRT) is a non-planar rooted tree in which, at each growth step, a node is chosen uniformly at random from all the nodes in the existing tree as the parent of a new child. The children of any parent in an RRT are not ordered.
The Zagreb index of RRTs was investigated in [5], where the exact mean and variance of the index were calculated. Both increase linearly with respect to time $n$. The asymptotic distribution of the Zagreb index (scaled by $\sqrt{n}$) was proven to follow a Gaussian law. In this section, we look into the Zagreb index of a class of networks extended from RRTs. This class of networks evolves as follows. At time $0$, there is a total of $m_0$ nodes that are mutually connected by $\binom{m_0}{2}$ edges. If there is a single node ($m_0 = 1$) at the initial point, it exists as an isolated node, as no self-loop is considered. At each subsequent time point, we randomly choose $m$ (distinct) nodes from the existing network and connect them with a newcomer by $m$ edges. Our goal is to study the Zagreb index of the network at time $n$, denoted by $G_n$. An RRT appears as a special case of this network by setting $m_0 = m = 1$.
We enumerate all the nodes in the network $G_n$ in the following way. We label the initial nodes with distinct numbers in $\{1, 2, \ldots, m_0\}$. Before recruiting any child, these nodes are structurally equivalent, so the order of labeling is arbitrary. The child that joins the network at time $n$ is labeled with $m_0 + n$. Thus, there is a total of $m_0 + n$ nodes in $G_n$. For each $1 \le i \le m_0 + n$, let $D_{i,n}$ be the degree of the node labeled with $i$ in $G_n$. In addition, let
$$Z_n = \sum_{i=1}^{m_0+n} D_{i,n}^2$$
be the Zagreb index of $G_n$. Note that we will repeatedly use $D$ and $Z$ with proper subscripts as node degrees and the Zagreb index for all kinds of random graphs investigated throughout this manuscript. In the next proposition, we calculate the expectation of $Z_n$, and develop a weak law as well.
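The growth rule described above can be simulated directly. The sketch below is a minimal illustration under stated assumptions: time starts at 0 with a complete graph on `m0` nodes, each step picks `m` distinct existing nodes uniformly at random, and `1 <= m <= m0` so that enough parents exist at the first step. The function names are hypothetical, and only the degree sequence is tracked because the index depends on nothing else.

```python
import random


def simulate_extended_rrt(n, m, m0, rng):
    """Grow the network for n steps and return the list of node degrees."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0 so that m distinct parents exist")
    # Complete graph on the m0 initial nodes: every degree is m0 - 1.
    degrees = [m0 - 1] * m0
    for _ in range(n):
        # Choose m distinct parents uniformly from the existing nodes.
        parents = rng.sample(range(len(degrees)), m)
        for p in parents:
            degrees[p] += 1
        # The newcomer is joined to its m parents.
        degrees.append(m)
    return degrees


def zagreb(degrees):
    """Zagreb index from a degree sequence."""
    return sum(d * d for d in degrees)


rng = random.Random(2024)
degs = simulate_extended_rrt(n=1000, m=2, m0=3, rng=rng)
# Node count is m0 + n and total degree is m0*(m0-1) + 2*m*n.
print(len(degs), sum(degs), zagreb(degs))
```

Averaging `zagreb(degs) / n` over many independent runs gives an empirical view of the linear growth of the mean discussed in this section.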
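Proposition 1.
For $n \ge 0$, the mean of the Zagreb index of $G_n$ is
$$\mathbb{E}[Z_n] = m(5m+1)\,n + 2mm_0(m_0-2m-1)\bigl(H_{n+m_0-1} - H_{m_0-1}\bigr) + m_0(m_0-1)^2,$$
where $H_k = \sum_{j=1}^{k} 1/j$ is the $k$-th harmonic number (with $H_0 = 0$).
As $n \to \infty$, we have
$$\frac{Z_n}{n} \xrightarrow{L_1} m(5m+1).$$
This convergence takes place in probability as well.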
Proof.
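Let $\mathbb{F}_{n-1}$ denote the $\sigma$-field generated by the history of the first $n-1$ stages of the network, and let $\mathbb{1}(n, S)$ denote the event that the nodes labeled with the indices in the set $S$ are chosen as parents for the new child at time $n$. Upon the insertion of node $m_0 + n$, an almost-sure relation between $Z_n$ and $Z_{n-1}$, conditional on $\mathbb{F}_{n-1}$ and $\mathbb{1}(n, S)$, is given by
$$Z_n = Z_{n-1} + \sum_{i \in S}\bigl((D_{i,n-1}+1)^2 - D_{i,n-1}^2\bigr) + m^2,$$
where $S$ is an $m$-long subset of $\{1, 2, \ldots, m_0+n-1\}$. We simplify the almost-sure relation to get
$$Z_n = Z_{n-1} + 2\sum_{i \in S} D_{i,n-1} + m(m+1). \tag{1}$$
Taking the average over all possible $S$'s, each node being chosen with probability $m/(m_0+n-1)$, we obtain
$$\mathbb{E}[Z_n \mid \mathbb{F}_{n-1}] = Z_{n-1} + \frac{2m}{m_0+n-1}\sum_{i=1}^{m_0+n-1} D_{i,n-1} + m(m+1), \tag{2}$$
where the sum is not random; it is equal to $m_0(m_0-1)+2m(n-1)$. We can thus take another expectation with respect to $\mathbb{F}_{n-1}$ to get a recurrence for $\mathbb{E}[Z_n]$, which is given by
$$\mathbb{E}[Z_n] = \mathbb{E}[Z_{n-1}] + \frac{2m\bigl(m_0(m_0-1)+2m(n-1)\bigr)}{m_0+n-1} + m(m+1).$$
We solve this recurrence with the initial condition $\mathbb{E}[Z_0] = m_0(m_0-1)^2$, and get the result stated in the proposition.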
In what follows, we have
[TABLE]
Dividing both sides by $n$ and letting $n$ go to infinity, we obtain the $L_1$ convergence of $Z_n/n$, as well as the in-probability convergence required for the weak law. ∎
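The recurrence behind this proof can be checked with exact rational arithmetic. The sketch below assumes that time starts at 0, that the initial complete graph gives a starting index of $m_0(m_0-1)^2$, and that each existing node is picked with probability $m$ over the current node count; the total degree before the $n$-th insertion is $m_0(m_0-1)+2m(n-1)$, as stated in the proof. The harmonic-sum expression in `mean_closed_form` is our own summation of that recurrence, not a formula quoted from the paper, and both function names are hypothetical.

```python
from fractions import Fraction


def mean_by_recurrence(n, m, m0):
    """Iterate E[Z_j] = E[Z_{j-1}] + m(m+1) + 2m * (total degree) / (node count)."""
    mean = Fraction(m0 * (m0 - 1) ** 2)  # index of the initial complete graph
    for j in range(1, n + 1):
        total_degree = m0 * (m0 - 1) + 2 * m * (j - 1)  # before step j
        nodes = m0 + j - 1                               # before step j
        mean += m * (m + 1) + Fraction(2 * m * total_degree, nodes)
    return mean


def mean_closed_form(n, m, m0):
    """Sum of the recurrence written with a partial harmonic sum."""
    # sum_{k=m0}^{n+m0-1} 1/k, i.e. H_{n+m0-1} - H_{m0-1}
    harmonic_part = sum(Fraction(1, k) for k in range(m0, n + m0))
    return (m * (5 * m + 1) * n
            + 2 * m * m0 * (m0 - 2 * m - 1) * harmonic_part
            + m0 * (m0 - 1) ** 2)


for n, m, m0 in [(5, 1, 1), (10, 2, 3), (50, 3, 7)]:
    assert mean_by_recurrence(n, m, m0) == mean_closed_form(n, m, m0)

print(mean_by_recurrence(5, 1, 1))                 # 313/15
print(float(mean_by_recurrence(10**4, 2, 3)) / 10**4)  # close to m(5m+1) = 22
```

For $m = m_0 = 1$ the leading coefficient is 6, so the mean grows like $6n$ with a logarithmic correction.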
The computation of the second moment of $Z_n$ is based on squaring the almost-sure relation for $Z_n$ presented in Equation (1). That is,
$$Z_n^2 = Z_{n-1}^2 + 4\Bigl(\sum_{i \in S} D_{i,n-1}\Bigr)^2 + m^2(m+1)^2 + 4Z_{n-1}\sum_{i \in S} D_{i,n-1} + 2m(m+1)Z_{n-1} + 4m(m+1)\sum_{i \in S} D_{i,n-1}. \tag{3}$$
As in the proof of Proposition 1, we intend to take the expectation with respect to $\mathbb{1}(n, S)$, then take another expectation with respect to $\mathbb{F}_{n-1}$, and ultimately obtain a recurrence for the second moment of $Z_n$. Before implementing this strategy, we take the most complex term in Equation (3) out and simplify it separately as follows:
[TABLE]
The first part is simple. It is
[TABLE]
The second part is
[TABLE]
We are now ready to derive the second moment of , the result of which is presented in the next proposition.
Proposition 2.
For , the second moment of the Zagreb index of is
[TABLE]
Proof.
Recall the squared almost-sure relation in Equation (3). Take the expectation with respect to to get
[TABLE]
where $C_1$ and $C_2$ are two deterministic functions (free of $Z_{n-1}$) depending only on $n$, $m$, and $m_0$. We have the exact expressions of $C_1$ and $C_2$, but they are too lengthy to report in the manuscript. Taking another expectation with respect to $\mathbb{F}_{n-1}$ and plugging in $\mathbb{E}[Z_{n-1}]$ derived in Proposition 1, we obtain a recurrence for the second moment of $Z_n$. Solving the recurrence with the initial condition $\mathbb{E}[Z_0^2] = m_0^2(m_0-1)^4$, we get the result stated in the proposition. ∎
Although we only present the first two leading terms of $\mathbb{E}[Z_n^2]$ in Proposition 2, we obtain the exact expressions of a few more terms in the calculation. These terms are needed to determine the order of the leading term of the variance of $Z_n$, and are available upon request. In the next corollary, we give the variance of $Z_n$, computed as the difference between the second moment and the squared first moment of $Z_n$.
Corollary 1.
For , the variance of the Zagreb index of is
[TABLE]
In [5], the authors proved that the variance of the Zagreb index of an RRT is asymptotically equal to $8n$, which is the special case of Corollary 1 ($m = 1$). According to Corollary 1, the variance of the Zagreb index of $G_n$ is linear in $n$, and its asymptotic value does not depend on $m_0$. In the next corollary, we show that $Z_n/n$ converges to $m(5m+1)$ in $L_2$-space (stronger than the $L_1$ convergence and the in-probability convergence presented in Proposition 1).
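Corollary 2.
As $n \to \infty$, we have
$$\frac{Z_n}{n} \xrightarrow{L_2} m(5m+1).$$
Proof.
According to the asymptotic mean and variance of $Z_n$, we have
$$\mathbb{E}\Bigl[\Bigl(\frac{Z_n}{n} - m(5m+1)\Bigr)^2\Bigr] = \frac{\mathrm{Var}(Z_n)}{n^2} + \Bigl(\frac{\mathbb{E}[Z_n]}{n} - m(5m+1)\Bigr)^2 \to 0,$$
which completes the proof. ∎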
As both the mean and the variance of $Z_n$ are linear in $n$, we suspect that the limiting distribution of $Z_n$ scaled by $\sqrt{n}$ is normal for general $m$, not just for the class of RRTs [5]. To prove the conjecture, our strategy is to apply a Martingale Central Limit Theorem (MCLT). According to Equation (2), $Z_n$ is not a martingale. We consider the following transformation such that the transformed array is a martingale.
Lemma 1.
For , the sequence
[TABLE]
is a martingale.
Proof.
Given a sequence , consider such that is a martingale. We retrieve based off the fundamental martingale property, i.e.,
[TABLE]
We thus obtain a recurrence for . We solve the recurrence with an arbitrary choice of the initial value of , e.g., , to get the result stated in the lemma. ∎
There are different forms of MCLTs listed in [8], based on different sets of conditions. We choose an MCLT that requires a conditional Lindeberg condition and a conditional variance condition for our proof.
Lemma 2.
The conditional Lindeberg’s condition is given by
[TABLE]
Proof.
By the construction of the martingale, we have
[TABLE]
The bound for the maximum degree of a node is obtained by analogy with the strong law developed in [2]. Therefore, is uniformly bounded for all . In other words, for any $\varepsilon > 0$, there exists $n_0(\varepsilon)$ such that the sets are empty for all $n > n_0(\varepsilon)$. It follows that the conditional Lindeberg sum converges to $0$ almost surely, which is stronger than the in-probability convergence required for the condition. ∎
Lemma 3.
The conditional variance condition is given by
[TABLE]
where is a random variable that is either finite or converges almost surely. Particularly for our case, is equal to .
Proof.
We rewrite as follows:
[TABLE]
We calculate the three expectations in the summand one after another. The first part is
[TABLE]
where $C_3(j,m,m_0) = m\bigl((5m+1)j + 2m_0^2 + (m-1)m_0 - 2(5m+1)\bigr)/(j+m_0-2)$. The second part is
[TABLE]
The third part is
[TABLE]
Plugging in the asymptotic values of and , we get
[TABLE]
which is stronger than the in-probability convergence required for the conditional variance condition. ∎
Theorem 1.
As , we have
[TABLE]
Proof.
Upon the verifications of the two conditions in Lemmata 2 and 3, we have
[TABLE]
by the MCLT. This is equivalent to the stated result in the theorem. ∎
3. Zagreb index of PORTs
In contrast to an RRT, a plain-oriented recursive tree (PORT) accounts for orders in the growth process. One simple way to interpret its evolution is that the probability that a node is chosen as a parent for a new child is proportional to its degree in the current tree. Mathematically, it is given by
$$\mathbb{P}\bigl(\mathbb{1}(v)\bigr) = \frac{\deg(v)}{\sum_{u \in V} \deg(u)},$$
where $\mathbb{1}(v)$ indicates the event that node $v$ is chosen as a parent for the newcomer, and $V$ is the set of all nodes in the current tree. Therefore, PORTs are a class of nonuniform trees. As their evolutionary process coincides with an attractive network characteristic, preferential attachment [1], PORTs are of substantial interest in the community.
The Zagreb index of PORTs was investigated in a recent article [11], where the exact mean and variance were determined. The authors claimed, on the basis of a numerical experiment, that the Zagreb index of PORTs does not follow a Gaussian law as time goes to infinity. In this paper, we provide a rigorous proof in support of that conjecture.
Let be a PORT at time . As one node joins the tree at each step, there is a total of nodes in the . We label these nodes with according to the time point of their appearance in the tree. Let be the degree of node at time . The Zagreb index of is given by
[TABLE]
where , again, is the degree of the node labeled with in .
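The degree-proportional rule can be sampled efficiently by keeping a list that contains each node once per incident edge, so a uniform draw from that list selects a node with probability equal to its degree over the total degree. The sketch below is illustrative only. It assumes the tree starts from a single root, that the first newcomer attaches to the root with probability one (the root has degree 0 at that point), and that `n` counts insertions; the paper's own time indexing may differ by one. The function name is hypothetical.

```python
import random


def simulate_port_degrees(n, rng):
    """Degrees of a tree grown by n >= 1 degree-proportional attachment steps."""
    if n < 1:
        raise ValueError("need at least one insertion")
    # After the first (forced) insertion: root and child joined by one edge.
    degrees = [1, 1]
    endpoints = [0, 1]  # each edge contributes both of its endpoints
    for new in range(2, n + 1):
        # Uniform draw from endpoints: P(parent = i) = deg(i) / total degree.
        parent = rng.choice(endpoints)
        degrees[parent] += 1
        degrees.append(1)
        endpoints.extend((parent, new))
    return degrees


def zagreb(degrees):
    """Zagreb index from a degree sequence."""
    return sum(d * d for d in degrees)


rng = random.Random(11)
sample = [zagreb(simulate_port_degrees(200, rng)) for _ in range(5)]
print(sample)
```

Collecting many such values for a fixed size gives an empirical distribution whose shape can be compared with the skewness result proved in this section.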
Proposition 3 ([11]).
For , we have
[TABLE]
where $\psi(\cdot)$ is the digamma function, $\gamma$ is Euler's constant, and $Y_n$ is a topological index summing the cubic degrees of all the nodes.
Based on the simulation result in [11], we suspect that the asymptotic distribution of $Z_n$ is skewed to the right, violating the symmetry of the normal distribution. Therefore, it suffices to show that the skewness of $Z_n$ does not vanish asymptotically; in fact, its limit is positive.
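In probability theory, the skewness of a random variable is defined as its standardized third central moment, i.e.,
$$\mathrm{Skew}(Z_n) = \mathbb{E}\Bigl[\Bigl(\frac{Z_n - \mathbb{E}[Z_n]}{\sqrt{\mathrm{Var}(Z_n)}}\Bigr)^3\Bigr] = \frac{\mathbb{E}[Z_n^3] - 3\,\mathbb{E}[Z_n]\,\mathrm{Var}(Z_n) - (\mathbb{E}[Z_n])^3}{(\mathrm{Var}(Z_n))^{3/2}},$$
in which most of the elements have already been determined, except for the third moment of $Z_n$. We resort to a recurrence method to calculate $\mathbb{E}[Z_n^3]$ exactly. To construct a recurrence for $\mathbb{E}[Z_n^3]$, we need the results presented in the next two lemmata. The first lemma is based on a new topological index, $W_n$, the sum of the fourth powers of the degrees of all the nodes in the tree, i.e., $W_n = \sum_{i} D_{i,n}^4$.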
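The skewness can be evaluated either from central moments or from raw moments, and the two routes agree through the identity E[(X - mu)^3] = E[X^3] - 3 mu E[X^2] + 2 mu^3. The sketch below checks this on a toy sample with a long right tail; the function names and the sample are illustrative assumptions and are unrelated to the exact moments derived in this section.

```python
import math


def skewness_central(xs):
    """Standardized third central moment of a sample (population form)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    third = sum((x - mu) ** 3 for x in xs) / n
    return third / var ** 1.5


def skewness_raw(xs):
    """Same quantity computed from the first three raw moments."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x ** 2 for x in xs) / n
    m3 = sum(x ** 3 for x in xs) / n
    var = m2 - m1 ** 2
    return (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5


right_tailed = [1, 1, 1, 2, 10]   # one large value pulls the tail to the right
symmetric = [-1, 0, 1]

print(skewness_central(right_tailed) > 0)                       # True
print(math.isclose(skewness_central(right_tailed),
                   skewness_raw(right_tailed), rel_tol=1e-9))   # True
print(skewness_central(symmetric))                              # 0.0
```

A positive value signals a longer right tail, which is the property the argument in this section establishes for the limiting distribution.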
Lemma 4.
For , we have
[TABLE]
As , we have
[TABLE]
The convergence takes place in probability as well.
Proof.
Upon time (node is not yet inserted), there is a total of nodes in the current tree. In addition, the total of node degrees is . By the definition of , we have the following almost-sure relations from (right before the insertion of node ) to (right after the insertion of node ), conditional on and , the event indicating that node is chosen as a parent for node :
[TABLE]
We average Equation (4) out over to get
[TABLE]
Taking another expectation and plugging in the results of and , we obtain a recurrence for . Solving the recurrence with initial condition , we obtain the result stated in the lemma.
As $n \to \infty$, the digamma function in the first term is of order $\log n$. Meanwhile, the second fraction is of order according to Stirling's approximation. Thus, we have converges to in -space, which also suggests a weak law for . ∎
In the second lemma, we derive the mixed moment of $Z_n$ and $Y_n$; namely, $\mathbb{E}[Z_n Y_n]$. Clearly, the variables $Z_n$ and $Y_n$ are not independent. Our strategy is to establish a recurrence for the expectation of the product $Z_n Y_n$.
Lemma 5.
For , we have
[TABLE]
As , we have
[TABLE]
Proof.
By the definition of (), we have the following almost-sure relations from () to (), conditional on and :
[TABLE]
Taking the product of Equations (5) and (6), we get
[TABLE]
Averaging it out over , we then have
[TABLE]
The recurrence for is then obtained by taking another expectation with respect to and plugging in the results of , , , and . Solving the recurrence with initial condition , we obtain the solution of . The convergence follows by applying Stirling's approximation to . ∎
Note that the exact solution for the mixed moment is derived. However, it cannot be written in a closed form. Instead, it is the sum of four fraction terms involving gamma functions, digamma functions, and first-order polygamma functions. We thus only present several leading terms of the solution in Lemma 5 for better readability. The complete solution is available upon request.
We are now ready to derive the third moment of . We use the results from Lemmata 4 and 5 as well as those from [11].
Proposition 4.
For , we have
[TABLE]
As , we have
[TABLE]
Proof.
Recall the almost-sure relation for . Raising Equation (5) to the third power on both sides, we have
[TABLE]
Average it out over to get
[TABLE]
Taking the expectation on both sides, and plugging in all the results of lower moments, we obtain a recurrence for the third moment of the Zagreb index. We solve the recurrence with its initial condition to get the result stated in the proposition; the convergence of the properly scaled third moment follows immediately. ∎
As with the mixed moment in Lemma 5, we get the exact solution for the third moment. The solution is even more complicated than that for the mixed moment, involving digamma functions, Meijer G functions and nested sums which cannot be simplified to closed forms. Nevertheless, the leading terms that we have developed are sufficient to characterize the asymptotic behavior of the skewness of the Zagreb index.
Theorem 2.
As $n \to \infty$, the distribution of $Z_n$ is skewed to the right. Hence, it is not normal.
Proof.
Recall the definition formula for :
[TABLE]
Plugging in , and , we find that the top three leading terms (of order , and ) in the numerator are exactly canceled out, left with the highest nonzero term of order . This is the same as the order of the leading term in the denominator. We thus come up with
[TABLE]
as $n \to \infty$, and conclude that $Z_n$ does not converge to a normal distribution asymptotically. ∎
4. Zagreb index of caterpillars
In mathematical chemistry, the caterpillar is a popular model for representing the structure of benzenoid hydrocarbon molecules [3, 4]. In this section, we look into a class of random caterpillars that grow in a preferential attachment manner, as described in Section 3. More precisely, at time $0$, there is a spine consisting of $m$ (fixed) nodes, which are labeled with distinct numbers in $\{1, 2, \ldots, m\}$ from one end to the other. At each subsequent time point, a leaf is linked to one of the spine nodes with an edge, the probability being equal to its degree over the total degree of all spine nodes.
At time $n$, we denote the structure of a random caterpillar by $C_n$. We first give some graph invariants of $C_n$. The total number of nodes in $C_n$ is $m + n$, and the total degree (of all nodes) is $2(m+n-1)$. Let $X_{i,n}$ be the number of leaves attached to spine node $i$, and let $D_{i,n}$ be the degree of spine node $i$. There is an immediate relation between $X_{i,n}$ and $D_{i,n}$: $D_{i,n} = X_{i,n} + 1$ for $i \in \{1, m\}$, and $D_{i,n} = X_{i,n} + 2$ for $2 \le i \le m-1$. According to the evolution of $C_n$, the probability that spine node $i$ is selected for recruiting a leaf at time $n$ is
$$\frac{D_{i,n-1}}{\sum_{k=1}^{m} D_{k,n-1}} = \frac{D_{i,n-1}}{n+2m-3}.$$
The Zagreb index of $C_n$ is given by
$$Z_n = \sum_{i=1}^{m} D_{i,n}^2 + n.$$
Note that we only account for caterpillars with $m \ge 2$ in this section, as the Zagreb index of a caterpillar with $m = 1$ (a star) is deterministic; it is $n^2 + n$. In the next proposition, we derive the expectation of $Z_n$, and develop a weak law as well.
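The caterpillar growth rule can be sketched in a few lines. The code below assumes a spine of at least two nodes whose end nodes start with degree 1 and whose interior nodes start with degree 2, and it attaches each new leaf to a spine node chosen with probability proportional to its current degree. The function names and the choice of time 0 for the bare spine are assumptions made for illustration.

```python
import random


def simulate_caterpillar(n, spine_length, rng):
    """Spine degrees after n leaves are attached to a path-shaped spine."""
    if spine_length < 2:
        raise ValueError("a single-node spine gives a star, which is deterministic")
    # Bare spine: a path, so the two ends have degree 1 and the rest degree 2.
    spine = [1] + [2] * (spine_length - 2) + [1]
    for _ in range(n):
        # Degree-proportional choice among spine nodes only.
        i = rng.choices(range(spine_length), weights=spine)[0]
        spine[i] += 1
    return spine


def caterpillar_zagreb(spine, n):
    """Squared spine degrees plus one unit for each of the n leaves."""
    return sum(d * d for d in spine) + n


rng = random.Random(5)
spine = simulate_caterpillar(n=300, spine_length=4, rng=rng)
# Spine degrees sum to 2*(spine_length - 1) + n.
print(spine, sum(spine), caterpillar_zagreb(spine, 300))
```

Because leaves never recruit further nodes, only the spine degrees evolve, which keeps the state small even for long runs.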
Proposition 5.
For , the mean of the Zagreb index of caterpillars is
[TABLE]
As , we have
[TABLE]
This convergence takes place in probability as well.
Proof.
We start with an almost-sure relation between and , conditional on and ; that is,
[TABLE]
This is identical to the almost-sure relation in Equation (5). Taking the expectation with respect to , we get
[TABLE]
Taking another expectation with respect to , we obtain a recurrence for :
[TABLE]
We solve the recurrence with its initial condition to get the result stated in the proposition. Both the $L_1$ convergence and the in-probability convergence of the scaled index follow immediately. ∎
Towards the second moment of $Z_n$, we need the mean of a new topological index, the total of cubic degrees of all nodes in the caterpillar. Let us denote this index by $Y_n$, and we have
$$Y_n = \sum_{i=1}^{m} D_{i,n}^3 + n.$$
The mean of $Y_n$ is given in the next lemma.
Lemma 6.
For , the mean of is given by
[TABLE]
Proof.
We consider an almost-sure relation between and analogous to Equation (6):
[TABLE]
Taking the expectation with respect to , we get
[TABLE]
The recurrence for is obtained by taking another expectation with respect to , and by plugging the result of . We solve the recurrence for with initial condition , and obtain the stated result. ∎
We are now ready to calculate the second moment of in the next proposition.
Proposition 6.
For , the second moment of the Zagreb index of caterpillars is given by
[TABLE]
Proof.
We begin with squaring the almost-sure relation (between and ).
[TABLE]
We take the average over to get
[TABLE]
We take another expectation with respect to to get the recurrence for . Solving the recurrence with the initial condition , we obtain the result stated in the proposition. ∎
The variance of the Zagreb index is then obtained immediately by taking the difference between the second moment and the squared first moment.
Corollary 3.
For , the variance of the Zagreb index of caterpillars is
[TABLE]
5. Conclusion
In this article, we investigate the Zagreb index of three random networks: a class of random graphs extended from RRTs, PORTs, and preferential attachment caterpillars. For the first class, we show that the Zagreb index scaled by $\sqrt{n}$ is asymptotically normal. For the second class, we prove that the asymptotic distribution of the Zagreb index is not normal by showing that the distribution is skewed to the right. For the third class, we find that the second moment and the variance of the Zagreb index have the same order. We thus conjecture that its asymptotic distribution is not normal either.
One possible direction for future work is to study the Zagreb index of more general preferential attachment networks [1], a class of networks extended from PORTs. The recurrence method does not seem amenable to this setting, owing to the non-uniformity of the sampling distribution. One alternative approach is to exploit the degree profile of preferential attachment networks developed in [9]. We will conduct the investigation in this direction and report the results elsewhere.
References
- [1] Barabási, A. and Albert, R.: Emergence of scaling in random networks. Science, 286, (1999), 509–512. MR 2091634
- [2] Devroye, L. and Lu, J.: The strong convergence of maximal degrees in uniform random recursive trees and dags. Random Structures Algorithms, 7, (1995), 1–14. MR 1346281
- [3] El-Basil, S.: Applications of caterpillar trees in chemistry and physics. J. Math. Chem., 1, (1987), 153–174. MR 0906155
- [4] El-Basil, S.: Caterpillar (Gutman) trees in chemical graph theory. In: Advances in the Theory of Benzenoid Hydrocarbons. Eds.: Gutman, I. and Cyvin, S. Topics in Current Chemistry, 153, 273–289, Springer, Berlin, Heidelberg, 1990.
- [5] Feng, Q. and Hu, Z.: On the Zagreb index of random recursive trees. J. Appl. Probab., 48, (2011), 1189–1196. MR 2896676
- [6] Feng, Q. and Hu, Z.: Asymptotic normality of the Zagreb index of random $b$-ary recursive trees. Dal’nevost. Mat. Zh., 15, (2015), 91–101. MR 3582623
- [7] Gutman, I. and Trinajstić, N.: Graph theory and molecular orbitals. Total $\psi$-electron energy of alternant hydrocarbons. Chem. Phys. Lett., 17, (1972), 535–538.
- [8] Hall, P. and Heyde, C.: Martingale limit theory and its application. Academic Press, Inc., New York-London, 1980. xii+308 pp. MR 0624435
