The Proportion of Trees that are Linear
Tanay Wakhare, Eric Wityk, Charles R. Johnson

TL;DR
This paper investigates enumeration problems related to linear trees, providing generating functions, asymptotic growth rates, and distributional properties, including a central limit theorem for the number of k-linear trees.
Contribution
It introduces new generating functions, characterizes the asymptotic growth, and proves a central limit theorem for the distribution of k-linear trees.
Findings
Derived generating functions for linear trees
Established asymptotic growth rates of nonisomorphic linear trees
Proved a central limit theorem for the distribution of k-linear trees
Abstract
We study several enumeration problems connected to linear trees, a broad class which includes stars, paths, generalized stars, and caterpillars. We provide generating functions for counting the number of linear trees on $n$ vertices, characterize the asymptotic growth rate of the number of nonisomorphic linear trees, and show that the distribution of $k$-linear trees on $n$ vertices follows a central limit theorem.
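
Table 1. The number of $k$-linear trees on $n$ vertices for $10 \le n \le 25$. The Total column also counts the path ($k = 0$).

| $n$ | $k=1$ | $k=2$ | $k=3$ | $k=4$ | $k=5$ | $k=6$ | $k=7$ | $k=8$ | $k=9$ | $k=10$ | $k=11$ | Total |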
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 25 | 56 | 22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 105 |
| 11 | 36 | 114 | 74 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 231 |
| 12 | 50 | 224 | 219 | 37 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 532 |
| 13 | 70 | 441 | 576 | 158 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1,254 |
| 14 | 94 | 733 | 1,394 | 591 | 58 | 1 | 0 | 0 | 0 | 0 | 0 | 2,872 |
| 15 | 127 | 1,252 | 3,150 | 1,896 | 304 | 9 | 0 | 0 | 0 | 0 | 0 | 6,739 |
| 16 | 168 | 2,091 | 6,733 | 5,537 | 1,342 | 82 | 1 | 0 | 0 | 0 | 0 | 15,955 |
| 17 | 222 | 3,393 | 13,744 | 14,812 | 5,085 | 508 | 11 | 0 | 0 | 0 | 0 | 37,776 |
| 18 | 288 | 5,408 | 26,969 | 37,133 | 17,232 | 2,635 | 112 | 1 | 0 | 0 | 0 | 89,779 |
| 19 | 375 | 8,440 | 51,185 | 87,841 | 53,200 | 11,523 | 804 | 12 | 0 | 0 | 0 | 213,381 |
| 20 | 480 | 12,982 | 94,323 | 198,267 | 152,316 | 44,704 | 4,730 | 145 | 1 | 0 | 0 | 507,949 |
| 21 | 616 | 19,650 | 169,453 | 429,199 | 409,105 | 156,513 | 23,451 | 1,182 | 14 | 0 | 0 | 1,209,184 |
| 22 | 781 | 29,388 | 297,533 | 896,731 | 1,040,846 | 504,869 | 102,186 | 7,862 | 184 | 1 | 0 | 2,880,382 |
| 23 | 990 | 43,394 | 512,006 | 1,814,978 | 2,526,691 | 1,517,918 | 400,074 | 43,602 | 1,682 | 15 | 0 | 6,861,351 |
| 24 | 1,243 | 63,430 | 865,050 | 3,572,810 | 5,887,488 | 4,300,385 | 1,434,484 | 211,388 | 12,381 | 226 | 1 | 16,348,887 |
| 25 | 1,562 | 91,754 | 1,437,739 | 6,858,774 | 13,231,478 | 11,567,238 | 4,773,006 | 915,546 | 75,951 | 2,288 | 17 | 38,955,354 |

Table 2. The numbers of nonlinear and linear trees on $n$ vertices.

| $n$ | Nonlinear trees | Linear trees | % Nonlinear | Total |
|---|---|---|---|---|
| 10 | 1 | 105 | 0.9 | 106 |
| 11 | 4 | 231 | 1.7 | 235 |
| 12 | 19 | 532 | 3.4 | 551 |
| 13 | 47 | 1,254 | 3.6 | 1,301 |
| 14 | 287 | 2,872 | 9.1 | 3,159 |
| 15 | 1,002 | 6,739 | 12.9 | 7,741 |
| 16 | 3,365 | 15,955 | 17.4 | 19,320 |
| 17 | 10,853 | 37,776 | 22.3 | 48,629 |
| 18 | 34,088 | 89,779 | 27.5 | 123,867 |
| 19 | 104,574 | 213,381 | 32.9 | 317,955 |
| 20 | 315,116 | 507,949 | 38.3 | 823,065 |
| 21 | 935,321 | 1,209,184 | 43.6 | 2,144,505 |
| 22 | 2,743,374 | 2,880,382 | 48.8 | 5,623,756 |
| 23 | 7,966,723 | 6,861,351 | 53.7 | 14,828,074 |
| 24 | 22,951,010 | 16,348,887 | 58.4 | 39,299,897 |
| 25 | 65,681,536 | 38,955,354 | 62.8 | 104,636,890 |
Tanay Wakhare∗, Eric Wityk†, and Charles R. Johnson§
∗ University of Maryland, College Park, MD 20742, USA
† Georgia Institute of Technology, Atlanta, GA 30332, USA
§ College of William and Mary, Williamsburg, VA 23185, USA
MSC(2010): Primary: 05C30; Secondary: 05C05.
Keywords: Linear trees; Caterpillars.
§ Corresponding author
1. Introduction
A high degree vertex (HDV) in a simple undirected graph is one of degree at least $3$. A tree is called linear if all of its HDV’s lie on a single induced path, and $k$-linear if there are $k$ HDV’s. The linear trees include the familiar classes of paths, stars, generalized stars (g-stars, with exactly one HDV), double g-stars [4], and caterpillars [3]. They have become important because all multiplicity lists of eigenvalues occurring among Hermitian matrices whose graph is a given linear tree may be constructed via a linear superposition principle (LSP) that respects the precise structure of the linear tree [4, 5]. For other, nonlinear trees, multiplicity lists require different methodology. For a tree to be nonlinear, there must be at least $4$ HDV’s (and at least $10$ vertices altogether). An example of a nonlinear tree and a linear tree on the same number of vertices is given in Figure 1.1.
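The definition can be made concrete with a small test for linearity. The following is a minimal sketch, assuming the tree is given as a list of edges; the function name `is_linear_tree` and the two example trees are hypothetical illustrations, not taken from the paper. It prunes leaves that are not HDV’s until only the smallest subtree containing all HDV’s remains, and then checks that this subtree is a path.

```python
from collections import defaultdict

def is_linear_tree(edges, hdv_degree=3):
    """Return True if all high degree vertices of the tree lie on one path."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    hdvs = {v for v, nbrs in adj.items() if len(nbrs) >= hdv_degree}
    if len(hdvs) <= 2:
        return True  # at most two HDVs always lie on a single path
    # Repeatedly delete leaves that are not HDVs. What remains is the
    # smallest subtree that contains every HDV.
    live = {v: set(nbrs) for v, nbrs in adj.items()}
    stack = [v for v, nbrs in live.items() if len(nbrs) == 1 and v not in hdvs]
    while stack:
        leaf = stack.pop()
        if leaf not in live:
            continue
        (parent,) = live.pop(leaf)
        live[parent].discard(leaf)
        if len(live[parent]) == 1 and parent not in hdvs:
            stack.append(parent)
    # The remaining subtree is a path exactly when no vertex has degree > 2.
    return all(len(nbrs) <= 2 for nbrs in live.values())

# A central vertex joined to three vertices of degree 3 (10 vertices, 4 HDVs).
SMALLEST_NONLINEAR = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5),
                      (2, 6), (2, 7), (3, 8), (3, 9)]
# A caterpillar whose three HDVs 1, 2, 3 sit on the spine.
CATERPILLAR = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (2, 6), (3, 7)]

print(is_linear_tree(SMALLEST_NONLINEAR))  # False
print(is_linear_tree(CATERPILLAR))         # True
```

The first example has $4$ HDV’s on $10$ vertices, which is the smallest size at which a nonlinear tree can occur.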
Linear trees are a substantial generalization of caterpillars, and the problem of counting the number of non-isomorphic linear trees is significantly harder than for caterpillars. We define a bivariate generating function for the number of $k$-linear trees on $n$ vertices, which enables the fast computation of these numbers. Additionally, we are able to obtain asymptotic growth rates which show that the probability that a randomly chosen tree on $n$ vertices will be linear approaches $0$ as $n \to \infty$. This shows that while the LSP is a useful characterization, it has limited applicability to studying the spectra of general trees. As $n$ increases, the LSP characterizes the spectra of an asymptotically vanishing proportion of all trees. However, the proportion of linear trees vanishes slowly, so that the LSP is very important, especially for small numbers of vertices. We conclude with an investigation of the distribution of $k$-linear trees on $n$ vertices, and show that this satisfies a central limit theorem.
2. Generating Functions
There are strong links between nonisomorphic linear trees and partitions, which are famously difficult to enumerate. In constructing a generating function for $k$-linear trees on $n$ vertices, we will rely on the generating function for integer partitions. Let
$$P(x) = \sum_{n \ge 0} p(n)\,x^n = \prod_{j \ge 1} \frac{1}{1 - x^j}$$
denote the generating function for $p(n)$, the number of unrestricted partitions of $n$. Let $R(n,k)$ be the number of reflections of linear trees on $n$ vertices with $k$ HDV’s (which counts linearly symmetric trees once and linearly asymmetric trees twice), and let $S(n,k)$ denote the number of linearly symmetric trees on $n$ vertices with $k$ HDV’s. Letting $L(n,k)$ denote the number of non-isomorphic $k$-linear trees on $n$ vertices, we deduce
$$L(n,k) = \frac{R(n,k) + S(n,k)}{2}.$$
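The partition numbers are easy to tabulate from the product form of their generating function. The sketch below is illustrative only (the function names are hypothetical): it multiplies in one factor $1/(1-x^j)$ at a time, and then counts generalized stars with exactly one HDV as partitions of $n-1$ into at least three parts, using the facts that a positive integer $m$ has one partition into a single part and $\lfloor m/2 \rfloor$ partitions into two parts.

```python
def partition_numbers(n_max):
    """Return [p(0), ..., p(n_max)], the unrestricted partition numbers."""
    p = [1] + [0] * n_max
    # Multiplying by 1 / (1 - x^part) allows any number of parts of that size.
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p

def one_hdv_star_count(n):
    """Number of g-stars on n vertices whose centre has degree at least 3."""
    m = n - 1  # vertices left over once the central vertex is set aside
    p = partition_numbers(m)
    return p[m] - 1 - m // 2  # drop partitions of m with one or two parts

print(partition_numbers(10))  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
print([one_hdv_star_count(n) for n in (10, 11, 12)])  # [25, 36, 50]
```

The values 25, 36, and 50 agree with the entries for one HDV in the table of $k$-linear tree counts for $n = 10, 11, 12$.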
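The following generating function allows us to compute recurrences for the coefficients, which in turn allow for fast computation of $L(n,k)$.
Theorem 1.
The generating function for $k$-linear trees on $n$ vertices is
$$\sum_{n \ge 1}\sum_{k \ge 2} L(n,k)\,x^n y^k = \frac{x^2y^2}{2}\left[\frac{\left(P(x)-\frac{1}{1-x}\right)^2}{1-x-xy\,(P(x)-1)} + \frac{(1+x)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)} + \frac{xy\,(P(x)-1)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)}\right].$$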
Proof.
First, we enumerate the nonisomorphic generalized stars on $n$ vertices. Since two g-stars are non-isomorphic if and only if the multisets of their arm lengths differ, we notice a one-to-one correspondence between nonisomorphic generalized stars and partitions. In particular, the number of nonisomorphic g-stars on $n$ vertices is $p(n-1)$ (the $-1$ accounting for the designated central vertex), with each partition corresponding to a distinct set of possible arm lengths. Linear trees are formed from generalized stars on at least $2$ vertices, with intermediate paths of arbitrary length. Therefore, we will use the generating function for the number of non-isomorphic generalized stars on $n \ge 2$ vertices, which is
$$\sum_{n \ge 2} p(n-1)\,x^n = x\,(P(x) - 1).$$
Let an exterior star be a generalized star at the end of the linear tree. Such stars must have a central vertex of degree at least $2$, not counting the concatenating path. Therefore, there is a bijection between partitions of $n-1$ with at least $2$ parts and non-isomorphic exterior stars on $n$ vertices. Since there is only a single partition of $n$ with one part, $n$ itself, the generating function for exterior stars is
$$x\left(P(x) - \frac{1}{1-x}\right),$$
where $P(x)$ is the partition generating function.
Additionally, up to isomorphism, there is a unique path of length $n$, so that the generating function for the number of paths on $n$ vertices has the form $\frac{1}{1-x}$ (the constant term accounts for the empty path between two adjacent HDV’s).
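Therefore, the number of linear trees generated by concatenating an exterior star, $k-2$ interior stars, and a trailing exterior star, by $k-1$ paths of arbitrary length, is counted by
$$\sum_{n \ge 1}\sum_{k \ge 2} R(n,k)\,x^n y^k = \sum_{k \ge 2} y^k\,x^2\left(P(x)-\frac{1}{1-x}\right)^2 \frac{\left(x\,(P(x)-1)\right)^{k-2}}{(1-x)^{k-1}} = \frac{x^2y^2\left(P(x)-\frac{1}{1-x}\right)^2}{1-x-xy\,(P(x)-1)}.$$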
We now enumerate $S(n,k)$, the number of reflectionally symmetric $k$-linear trees on $n$ vertices. These have a freely chosen central component, after which one half of the tree completely determines the other half. The component is a path when $k$ is even, and a generalized star when $k$ is odd.
If the central component is a path, it is free to have an arbitrary number of vertices, while every other component on $m$ vertices determines $2m$ vertices due to reflectional symmetry. Therefore, we count the number of $2j$-linear trees which can be generated by concatenating an exterior star, $j-1$ interior stars, a freely chosen central path, and their reflections:
$$\sum_{j \ge 1} y^{2j}\,x^2\left(P(x^2)-\frac{1}{1-x^2}\right)\left(\frac{x^2\,(P(x^2)-1)}{1-x^2}\right)^{j-1}\frac{1}{1-x} = \frac{x^2y^2\,(1+x)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)}.$$
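We can conduct a similar analysis for a $(2j+1)$-linear tree, where the central component is instead a generalized star. We obtain the generating function
$$\sum_{j \ge 1} y^{2j+1}\,x^2\left(P(x^2)-\frac{1}{1-x^2}\right)\frac{1}{1-x^2}\left(\frac{x^2\,(P(x^2)-1)}{1-x^2}\right)^{j-1} x\,(P(x)-1) = \frac{x^3y^3\,(P(x)-1)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)}.$$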
Noting that $L(n,k) = \frac{1}{2}\left(R(n,k) + S(n,k)\right)$ and summing all three of these generating functions completes the proof. ∎
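The construction in the proof translates directly into coefficient extraction. The following is a minimal sketch, not the authors' implementation: it stores truncated power series as Python lists, assumes the building blocks described in the proof (exterior stars counted by partitions of $n-1$ with at least two parts, interior stars by partitions of $n-1$ with at least one part, and paths with possibly zero vertices), and all function names are hypothetical.

```python
def partition_numbers(n_max):
    """Return [p(0), ..., p(n_max)], the unrestricted partition numbers."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p

def mul(a, b, n_max):
    """Multiply two power series, truncated after x^n_max."""
    out = [0] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[: n_max + 1 - i]):
                out[i + j] += ai * bj
    return out

def dilate(a, n_max):
    """Coefficients of a(x^2), truncated after x^n_max."""
    out = [0] * (n_max + 1)
    for i, ai in enumerate(a):
        if 2 * i <= n_max:
            out[2 * i] = ai
    return out

def linear_tree_counts(n_max):
    """Return {k: [L(0,k), ..., L(n_max,k)]} for every feasible k >= 2."""
    p = partition_numbers(n_max)
    ext = [0] + [p[n - 1] - 1 for n in range(1, n_max + 1)]  # exterior stars
    inn = [0, 0] + [p[n - 1] for n in range(2, n_max + 1)]   # interior stars
    path = [1] * (n_max + 1)                                 # paths, may be empty
    ext2, inn2, path2 = (dilate(s, n_max) for s in (ext, inn, path))
    link = mul(inn, path, n_max)     # one more interior star and one more path
    link2 = mul(inn2, path2, n_max)  # the same, mirrored on both sides
    refl = mul(mul(ext, ext, n_max), path, n_max)        # R(., 2)
    sym_even = mul(ext2, path, n_max)                    # S(., 2)
    sym_odd = mul(mul(ext2, path2, n_max), inn, n_max)   # S(., 3)
    counts = {}
    for k in range(2, (n_max - 2) // 2 + 1):
        sym = sym_even if k % 2 == 0 else sym_odd
        counts[k] = [(r + s) // 2 for r, s in zip(refl, sym)]
        refl = mul(refl, link, n_max)
        if k % 2 == 0:
            sym_even = mul(sym_even, link2, n_max)
        else:
            sym_odd = mul(sym_odd, link2, n_max)
    return counts

counts = linear_tree_counts(14)
print([counts[k][10] for k in (2, 3, 4)])  # [56, 22, 1]
```

For $n = 10$ the sketch gives 56, 22, and 1 trees with 2, 3, and 4 HDV’s, in agreement with the tabulated counts. For $n = 13$ and $k = 2$ it returns 411 where the table lists 441, so that entry may be worth checking against the original data.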
From this generating function, we can extract the number of $k$-linear trees on $n$ vertices for small values of $n$. Table 1 displays this information for $10 \le n \le 25$. Note that for fixed $n$, the distribution of $k$-linear trees appears to have a dominant contribution at a value of $k$ that grows with $n$. As a corollary of the central limit theorem of Theorem 4, we will characterize this peak exactly, as lying at $k \approx \mu n$, with $\mu$ the constant defined there.
3. Asymptotics
We wish to show that linear trees form an asymptotically small subset of all trees. Wityk [7] showed that the ratio of the number of $k$-linear trees on $n$ vertices to the number of trees on $n$ vertices with $k$ high degree vertices approaches $0$ as the number of vertices tends to infinity. However, this was only for a fixed $k$, and only partial results were shown for the natural extension to account for all linear trees. Heuristically, we expect the proportion of trees that are linear to decrease as the number of vertices increases. Given a large tree, we can color all the high degree vertices. The probability that these HDV’s all lie on a single induced path intuitively decreases as the number of vertices increases. The next theorem proves this result asymptotically, and Table 2 shows this phenomenon for small values of $n$.
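The trend can be read off directly from the tabulated counts. As a small illustration, the snippet below recomputes the percentage of nonlinear trees from four rows of the table of nonlinear and linear tree counts; the dictionary and function names are hypothetical, and the numbers are copied from the rows for $n = 20, 21, 24, 25$.

```python
# n: (nonlinear trees, linear trees), copied from the table above.
ROWS = {
    20: (315_116, 507_949),
    21: (935_321, 1_209_184),
    24: (22_951_010, 16_348_887),
    25: (65_681_536, 38_955_354),
}

def percent_nonlinear(nonlinear, linear):
    """Percentage of all trees on n vertices that are nonlinear."""
    return 100.0 * nonlinear / (nonlinear + linear)

percentages = {n: round(percent_nonlinear(*row), 1) for n, row in ROWS.items()}
print(percentages)  # {20: 38.3, 21: 43.6, 24: 58.4, 25: 62.8}
```

The share of nonlinear trees grows steadily over this range, which is the behavior the heuristic argument predicts.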
We can use standard techniques from analytic combinatorics to extract information about the number of nonisomorphic linear trees on vertices. In particular, we describe the asymptotic growth rate of the number of nonisomorphic linear trees, and show that the path length satisfies a central limit theorem. The methods in this section are all pulled from Flajolet and Sedgewick’s monumental treatise [2].
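Theorem 2.
The number of nonisomorphic linear trees on $n$ vertices, $L(n)$, is asymptotically given by
$$L(n) \sim \frac{(1-2\lambda)^2}{2\,(1-\lambda)^2\left(1+\lambda^2 P'(\lambda)\right)}\,\lambda^{-n},$$
where $\lambda$ is the unique real solution, $0 < \lambda < 1$, of
$$\lambda\,P(\lambda) = 1.$$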
Proof.
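The proof is based on meromorphic singularity analysis. We first set $y = 1$ in the bivariate generating function of Theorem 1, giving
$$\sum_{n \ge 1}\sum_{k \ge 2} L(n,k)\,x^n = \frac{x^2}{2}\left[\frac{\left(P(x)-\frac{1}{1-x}\right)^2}{1-x\,P(x)} + \frac{(1+x)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2P(x^2)} + \frac{x\,(P(x)-1)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2P(x^2)}\right].$$
We know that $P(x)$ is analytic for $|x| < 1$. Therefore, the only poles inside the unit disc arise from the denominator terms $1 - x\,P(x)$ and $1 - x^2P(x^2)$. Since $x\,P(x)$ is strictly increasing on the real interval $[0,1)$, vanishes at $x = 0$, and is unbounded as $x \to 1^-$, there is a unique real root $\lambda$ to the equation $x\,P(x) = 1$ satisfying $0 < \lambda < 1$. The denominator term $1 - x^2P(x^2)$ contributes a singularity of modulus $\sqrt{\lambda} > \lambda$. Also, the pole at $\lambda$ is the only pole on the circle $|x| = \lambda$, since $|x\,P(x)| < \lambda\,P(\lambda) = 1$ if $|x| = \lambda$ and $x \neq \lambda$. Therefore, $\lambda$ is the dominant singularity, in that it is the pole with smallest absolute value.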
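Therefore, appealing to the methods of [2, Chapter IV], we immediately have that
$$L(n) \sim \frac{1}{2}\cdot\frac{\lambda^2\left(P(\lambda)-\frac{1}{1-\lambda}\right)^2}{\lambda\left(P(\lambda)+\lambda P'(\lambda)\right)}\,\lambda^{-n} = \frac{(1-2\lambda)^2}{2\,(1-\lambda)^2\left(1+\lambda^2P'(\lambda)\right)}\,\lambda^{-n},$$
where the second form uses $\lambda\,P(\lambda) = 1$. In particular, note that in the language of [2, Chapter V.2, p. 294], we are dealing with a supercritical sequence with $G(x) = \frac{x\,(P(x)-1)}{1-x}$ and $G(\lambda) = 1$. Hence the result follows directly from [2, Theorem V.1]. ∎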
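The dominant singularity is easy to locate numerically. The following is a rough sketch under one stated assumption: that the dominant pole is the root in $(0,1)$ of $x\,P(x) = 1$, which is what the denominator $1 - x - x(P(x)-1)$ of the reflection term reduces to at $y = 1$. The function names are hypothetical, and $P(x)$ is approximated by a truncated product.

```python
def partition_gf(x, terms=400):
    """Approximate P(x) = prod_{j >= 1} 1 / (1 - x^j) for 0 <= x < 1."""
    prod = 1.0
    for j in range(1, terms + 1):
        prod *= 1.0 - x ** j
    return 1.0 / prod

def dominant_singularity(lo=0.05, hi=0.95, tol=1e-13):
    """Bisection for the root of x * P(x) = 1, which is increasing on (0, 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * partition_gf(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = dominant_singularity()
growth_rate = 1.0 / lam
# Ratio of the tabulated numbers of linear trees on 25 and on 24 vertices.
observed_ratio = 38_955_354 / 16_348_887
print(lam, growth_rate, observed_ratio)
```

The root comes out at roughly $0.42$, so the number of linear trees grows by a factor of about $2.38$ per added vertex, close to the ratio of consecutive totals in the table.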
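We can then obtain statistics about the number of HDV’s in a random linear tree, by conducting another singularity analysis of the generating function of Theorem 1. We will apply the moving pole analysis of Flajolet and Sedgewick. In what follows, for any function $f(u)$ analytic at $u = 1$ and satisfying $f(1) \neq 0$, we set
$$\mathfrak{m}(f) = \frac{f'(1)}{f(1)}, \qquad \mathfrak{v}(f) = \frac{f''(1)}{f(1)} + \frac{f'(1)}{f(1)} - \left(\frac{f'(1)}{f(1)}\right)^2. \tag{3.1}$$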
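When $f$ is a polynomial with nonnegative coefficients, $\mathfrak{m}(f)$ and $\mathfrak{v}(f)$ are simply the mean and variance of the distribution proportional to those coefficients. As an illustration, the sketch below applies the two formulas to the row of counts for $n = 25$ from the table of $k$-linear trees, with the single path added as $k = 0$; the names `ROW_N25` and `mean_and_variance` are hypothetical.

```python
# Number of k-linear trees on 25 vertices, keyed by k (k = 0 is the path).
ROW_N25 = {
    0: 1, 1: 1_562, 2: 91_754, 3: 1_437_739, 4: 6_858_774,
    5: 13_231_478, 6: 11_567_238, 7: 4_773_006, 8: 915_546,
    9: 75_951, 10: 2_288, 11: 17,
}

def mean_and_variance(coeffs):
    """Evaluate m(f) and v(f) for f(u) = sum_k coeffs[k] * u^k."""
    f1 = sum(coeffs.values())                                # f(1)
    d1 = sum(k * c for k, c in coeffs.items())               # f'(1)
    d2 = sum(k * (k - 1) * c for k, c in coeffs.items())     # f''(1)
    m = d1 / f1
    v = d2 / f1 + m - m * m
    return m, v

mean_k, var_k = mean_and_variance(ROW_N25)
print(mean_k, var_k)
```

For this row the mean number of HDV’s is a little under $5.4$ and the variance is roughly $1.2$, consistent with the peak visible in the table at $k = 5$ and $k = 6$.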
We will appeal to the following theorem to prove our main result.
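Theorem 3.
[2, Thm IX.12 (Algebraic singularity schema)] Let $F(z,u)$ be a function that is bivariate analytic at $(z,u) = (0,0)$ and has non-negative coefficients. Assume also the following conditions:
- (1) Analytic perturbation: there exist three functions $A, B, C$ analytic in a domain $\mathcal{D} = \{|z| \le r\} \times \{|u-1| < \varepsilon\}$, such that, for some $r_0$ with $0 < r_0 \le r$ and $\varepsilon > 0$, the following representation holds, with $\alpha \notin \mathbb{Z}_{\le 0}$,
$$F(z,u) = A(z,u) + B(z,u)\,C(z,u)^{-\alpha};$$
furthermore, assume that in $|z| \le r$, there exists a unique root $\rho$ of the equation $C(z,1) = 0$, that this root is simple, and that $B(\rho,1) \neq 0$.
- (2) Non-degeneracy: one has $\partial_z C(\rho,1)\cdot\partial_u C(\rho,1) \neq 0$, ensuring the existence of a non-constant $\rho(u)$ analytic at $u = 1$, such that $C(\rho(u),u) = 0$ and $\rho(1) = \rho$.
- (3) Variability: one has
$$\mathfrak{v}\!\left(\frac{\rho(1)}{\rho(u)}\right) \neq 0.$$
Then the random variable with probability generating function $p_n(u) = \frac{[z^n]F(z,u)}{[z^n]F(z,1)}$ converges in distribution to a Gaussian variable with a speed of convergence that is $O(n^{-1/2})$. The mean $\mu_n$ and variance [corrected] $\sigma_n^2$ are asymptotically linear in $n$.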
Theorem 4.
Define the mean and variance
[TABLE]
For large $n$, the number $X_n$ of HDV’s in a random linear tree on $n$ vertices converges in distribution to a Gaussian distribution with mean $\mu n$ and variance $\sigma^2 n$, with speed of convergence $O(n^{-1/2})$, i.e. the normalized random variable
$$\frac{X_n - \mu n}{\sigma\sqrt{n}}$$
converges in distribution to a standard normal distribution.
Proof.
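We again refer to Theorem 1, that
$$\sum_{n \ge 1}\sum_{k \ge 2} L(n,k)\,x^n y^k = \frac{x^2y^2}{2}\left[\frac{\left(P(x)-\frac{1}{1-x}\right)^2}{1-x-xy\,(P(x)-1)} + \frac{(1+x)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)} + \frac{xy\,(P(x)-1)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)}\right]. \tag{3.2}$$
At $y = 1$, Equation (3.2) reduces to
$$\frac{x^2}{2}\left[\frac{\left(P(x)-\frac{1}{1-x}\right)^2}{1-x\,P(x)} + \frac{(1+x)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2P(x^2)} + \frac{x\,(P(x)-1)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2P(x^2)}\right].$$
By the same singularity analysis as that of Theorem 2, we see that the dominant pole occurs at $x = \lambda$ again, where $\lambda\,P(\lambda) = 1$. Thus at $y = 1$ we take $\rho = \lambda$ as before. By inspecting Equation (3.2), we see that for $y$ sufficiently close to $1$, the dominant singularity will arise from the term with denominator $1 - x - xy\,(P(x)-1)$, and the other two terms will be analytic in a sufficiently small neighborhood of $(x,y) = (\lambda, 1)$. We can thus appeal to Theorem 3 with $z = x$, $u = y$, and $\alpha = 1$, where we take
$$A(x,y) = \frac{x^2y^2}{2}\cdot\frac{\left(1+x+xy\,(P(x)-1)\right)\left(P(x^2)-\frac{1}{1-x^2}\right)}{1-x^2-x^2y^2\,(P(x^2)-1)}, \qquad B(x,y) = \frac{x^2y^2}{2}\left(P(x)-\frac{1}{1-x}\right)^2, \qquad C(x,y) = 1-x-xy\,(P(x)-1).$$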
Setting
[TABLE]
we find
[TABLE]
The non-degeneracy condition then simplifies to . Furthermore, we can solve for the local expansion of the functional equation around , by using standard series reversion techniques [2, Equation (38), p. 672] to find
[TABLE]
and thus
[TABLE]
Then, we can expand the inverse to second order, which gives
[TABLE]
Referring back to definition (3.1), we further deduce
[TABLE]
which expand to the values of $\mu$ and $\sigma^2$ given in the statement of the theorem. Numerically, the value of $\mu$ agrees with what is expected from the numerical data, and the variability condition is also verified. Finally, we appeal to a general remark [2, p. 678] that the asymptotic mean and the variance of our distribution are given exactly by Equation (3.1). ∎
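The moving pole can also be followed numerically. The sketch below is only illustrative and rests on an assumption: that the relevant denominator is $C(x,y) = 1 - x - xy\,(P(x)-1)$, the form obtained from the concatenation argument in the proof of Theorem 1. It solves $C(\rho(y), y) = 0$ by bisection for $y$ near $1$ and estimates $\mathfrak{m}(\rho(1)/\rho(u)) = -\rho'(1)/\rho(1)$ with a central difference. All function names are hypothetical.

```python
def partition_gf(x, terms=400):
    """Approximate P(x) = prod_{j >= 1} 1 / (1 - x^j) for 0 <= x < 1."""
    prod = 1.0
    for j in range(1, terms + 1):
        prod *= 1.0 - x ** j
    return 1.0 / prod

def denominator(x, y):
    """Assumed moving denominator C(x, y) = 1 - x - x*y*(P(x) - 1)."""
    return 1.0 - x - x * y * (partition_gf(x) - 1.0)

def moving_pole(y, lo=0.05, hi=0.95, tol=1e-14):
    """Root rho(y) of C(x, y) = 0; C decreases in x on (0, 1) for y > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if denominator(mid, y) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_constant(h=1e-4):
    """Estimate -rho'(1) / rho(1) by a central finite difference."""
    rho = moving_pole(1.0)
    slope = (moving_pole(1.0 + h) - moving_pole(1.0 - h)) / (2.0 * h)
    return -slope / rho

mu = mean_constant()
print(mu, 25 * mu)
```

Under this assumption the mean constant comes out at roughly $0.22$, so for $n = 25$ the predicted mean number of HDV’s is about $5.5$, close to the peak of the tabulated counts at $k = 5$ and $k = 6$.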
Acknowledgements
Part of this work was carried out by the second author as part of his honors thesis at the College of William and Mary. T.W. would also like to thank Roberto Costas-Santos for being an excellent advisor during the College of William and Mary Matrix REU, where part of this work was completed. We would also like to thank the anonymous referees, who suggested the proof of the central limit theorem, and Larry Washington, who finally found a very pernicious error in our proof of the central limit theorem.
- [1] G. Andrews, The theory of partitions, Addison-Wesley Publishing Co., Reading, Mass., 1976.
- [2] P. Flajolet and R. Sedgewick, Analytic combinatorics, Cambridge University Press, Cambridge, 2009.
- [3] F. Harary and A. J. Schwenk, The number of caterpillars, Discrete Math., 6, 359–365, 1973.
- [4] C. R. Johnson and C. M. Saiago, Eigenvalues, multiplicities and graphs, Cambridge University Press, Cambridge, 2018.
- [5] C. R. Johnson, A. A. Li, and A. J. Walker, Ordered multiplicity lists for eigenvalues of symmetric matrices whose graph is a linear tree, Discrete Math., 333, 39–55, 2014.
- [6] R. Otter, The number of trees, Ann. of Math. (2), 49, 583–599, 1948.
- [7] E. Wityk, Linear and Nonlinear Trees: Multiplicity Lists of Symmetric Matrices, College of William and Mary, Undergraduate Honors Theses, Paper 113, 1–55, 2014.
