Dependence between Path-length and Size in Random Digital Trees
Michael Fuchs, Hsien-Kuei Hwang

TL;DR
This paper investigates the dependence between size and path length in random digital trees, revealing asymptotic independence in asymmetric cases and strong dependence with fluctuations in symmetric cases, contrasting prior results.
Contribution
It uncovers novel dependence behaviors in digital trees, showing independence or dependence based on symmetry, and extends findings to other digital tree classes.
Findings
Asymptotic independence in asymmetric digital tries.
Strong dependence with periodic fluctuations in symmetric tries.
Different correlation behaviors in various digital tree classes.
Dependence between path-length and size in
random digital trees
Michael Fuchs
Department of Applied Mathematics
National Chiao Tung University
Hsinchu 300
Taiwan
Hsien-Kuei Hwang
Institute of Statistical Science
Academia Sinica
Taipei 115
Taiwan
Abstract
We study the size and the external path length of random tries and show that they are asymptotically independent in the asymmetric case but strongly dependent with small periodic fluctuations in the symmetric case. Such an unexpected behavior is in sharp contrast to the previously known results on random tries that the size is totally positively correlated to the internal path length and that both tend to the same normal limit law. These two dependence examples provide concrete instances of bivariate normal distributions (as limit laws) whose correlation is $0$, $1$, and periodically oscillating. Moreover, the same type of behavior is also clarified for other classes of digital trees such as bucket digital trees and Patricia tries.
AMS 2010 Subject Classifications. 60C05 60F05 68P05 05C05 68W40
Keywords. Random tries, covariance, total path length, Pearson’s correlation coefficient, asymptotic normality, Poissonization/de-Poissonization, integral transform, contraction method.
1 Introduction
Tries are one of the most fundamental tree-type data structures in computer algorithms; see Knuth [18] and Mahmoud [19] for a general introduction. Their general efficiency depends on several shape parameters, the principal ones including the depth, the height, the size, the internal path-length (IPL), and the external path-length (EPL); see below for a more precise description of those studied in this paper. While most of these measures have been extensively investigated in the literature, we are concerned here with the question: how does the EPL depend on the size in a random trie? Surprisingly, while the pair (size, IPL) is known to have an asymptotic correlation coefficient tending to one and to have the same normal limit law after each is properly normalized (see [10, 12]), this paper shows that the pair (size, EPL) exhibits a completely different behavior depending on whether the underlying random bits are biased or unbiased. This is a companion paper to [2] where we clarified the dependence structure of another class of search trees in computer algorithms.
Given a sequence of $n$ binary strings (or keys), one can construct a binary trie (very similar to constructing a dictionary of binary words) as follows. If $n = 1$, then the trie consists of a single root-node holding the sole string; if $n \ge 2$, the root is used to direct the strings into the corresponding subtree: if the first bit of the input string is $0$ (or $1$), then the string goes to the left (or right) subtree; strings directed to the same subtree are then processed recursively in the same manner, but the second bit of each string is used for splitting instead of the first. In this way, a binary dictionary-type tree with two types of nodes is constructed: external nodes for storing strings and internal nodes for splitting the strings; see Figure 1 for a trie of seven strings.
The random trie model we consider here assumes that each of the $n$ binary keys is an infinite sequence of independent Bernoulli bits, each with success probability $p$. Then the trie constructed from this sequence is a random trie.
We define three shape parameters in a random trie of $n$ strings:
- Size $S_n$: the total number of internal nodes used (the circle nodes in Figure 1);
- IPL (or node path-length, NPL) $N_n$: the sum of the distances between the root and each internal node;
- EPL (or key path-length, KPL) $K_n$: the sum of the distances between the root and each external node.
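The three definitions above can be made concrete with a minimal Python sketch that computes them directly from a list of keys. The function name `trie_parameters` and the representation of keys as strings of `"0"` and `"1"` are assumptions made for illustration; the keys are assumed distinct and of equal, sufficient length (a finite stand-in for the infinite strings of the model).

```python
def trie_parameters(keys, depth=0):
    """Return (size, NPL, KPL) of the binary trie built from distinct bit strings.

    `depth` is the distance from the root of the node currently being built.
    """
    if len(keys) <= 1:
        # Zero keys: empty subtree. One key: an external node at distance `depth`.
        return 0, 0, depth * len(keys)
    # Two or more keys: this is an internal node that splits on bit number `depth`.
    left = [k for k in keys if k[depth] == "0"]
    right = [k for k in keys if k[depth] == "1"]
    left_size, left_npl, left_kpl = trie_parameters(left, depth + 1)
    right_size, right_npl, right_kpl = trie_parameters(right, depth + 1)
    size = 1 + left_size + right_size        # count this internal node
    npl = depth + left_npl + right_npl       # its distance from the root
    kpl = left_kpl + right_kpl               # external nodes live below it
    return size, npl, kpl


if __name__ == "__main__":
    print(trie_parameters(["0010", "0011", "0100", "1000"]))
```

Note that an internal node whose keys all go to the same side (one-way branching) is still counted in the size, which is what distinguishes tries from the PATRICIA tries discussed in Section 4.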
We will mostly use NPL in place of IPL, and KPL in place of EPL, the reason being an easier comparison with the corresponding results derived for random $m$-ary search trees in the companion paper [2]; see below for more details.
By the recursive definition and our model assumption, we have the following recurrence relations
[TABLE]
with the initial conditions for , where denotes a binomial distribution with parameters and . Also , and are independent copies of and , respectively. While many stochastic properties of these random variables are known (see Clément et al. [3], Devroye [5] and [10] and many references cited there), much less attention has been paid to their correlation and dependence structure.
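As a rough illustration of the binomial splitting behind these recurrences, the following Python sketch samples the size and the KPL without building any strings. It relies on an assumption derived from the definitions in Section 1: every node holding $m \ge 2$ keys contributes $1$ to the size and $m$ to the KPL, and sends a Binomial($m$, $p$) number of keys to one subtree. The function names are hypothetical, and the sample correlation is only a Monte Carlo estimate at finite $n$, not the asymptotic quantity of Theorem A.

```python
import random
from math import sqrt


def sample_size_kpl(n, p, rng):
    """Sample (size, KPL) of a random trie of n keys via binomial splitting."""
    size = kpl = 0
    pending = [n]                      # key counts of subtrees still to be split
    while pending:
        m = pending.pop()
        if m <= 1:
            continue                   # empty subtree or a single external node
        size += 1                      # one internal node
        kpl += m                       # each of the m keys moves one level down
        left = sum(1 for _ in range(m) if rng.random() < p)   # Binomial(m, p)
        pending.extend((left, m - left))
    return size, kpl


def sample_correlation(n, p, runs, seed=0):
    """Pearson correlation of size and KPL estimated from `runs` samples."""
    rng = random.Random(seed)
    pairs = [sample_size_kpl(n, p, rng) for _ in range(runs)]
    mean_s = sum(s for s, _ in pairs) / runs
    mean_k = sum(k for _, k in pairs) / runs
    cov = sum((s - mean_s) * (k - mean_k) for s, k in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_k = sum((k - mean_k) ** 2 for _, k in pairs)
    return cov / sqrt(var_s * var_k)


if __name__ == "__main__":
    for p in (0.5, 0.3):
        print(p, sample_correlation(200, p, runs=500))
```

For $n = 2$ the KPL is exactly twice the size, so the sample correlation is $1$ there; any comparison with the asymptotic statements requires much larger $n$, and convergence may be slow.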
The asymptotic behaviors of the moments of random variables defined on tries typically depend on the ratio $\frac{\log p}{\log q}$ being rational or irrational, where $q = 1 - p$. So we introduce, similar to [10], the notation
[TABLE]
where represents a sequence of (Fourier) coefficients and when with and coprime. In simpler words, is a periodic function in the rational case, and a constant in the irrational case. We also use as a generic symbol if the exact form of the underlying sequence matters less, and in this case each occurrence may not represent the same function.
With this notation, the asymptotics of the mean and the variance are summarized in the following table; see [10, 15, 19] and the references therein for more information.
Note specially that the leading constant
[TABLE]
in the asymptotic approximation to $\mathbb{V}(K_n)$ equals zero when $p = 1/2$, implying that $\mathbb{V}(K_n)$ is not of order $n\log n$ but of linear order in the symmetric case. This change of order can be regarded as the source property distinguishing between the dependence and independence of $K_n$ on $S_n$.
On the other hand, we have the relation between the external path length and the depth , which is defined to be the distance between the root and a randomly chosen external node (each with the same probability). Furthermore, we also have the asymptotic equivalent when (or ), and a central limit theorem for ; see Devroye [4].
From Table 1, we see roughly that each internal node contributes to , namely, that . Indeed, it was proved in [10] that the correlation coefficient of and satisfies
[TABLE]
Such a linear correlation was further strengthened in [12], where it was proved that both random variables tend to the same normal limit law (with zero mean and unit variance)
[TABLE]
where denotes convergence in distribution. In terms of the bivariate normal law (see Tong [27]), we can write
[TABLE]
where is a singular matrix and denotes the transpose of matrix .
We show that the correlation and dependence of $K_n$ on $S_n$ are drastically different. We start with their correlation coefficient.
Theorem A.
The covariance of the number of internal nodes and KPL in a random trie of $n$ strings satisfies
[TABLE]
where is given in Proposition A below, and their correlation coefficient satisfies
[TABLE]
Here is a periodic function with average value .
The result (4) is to be compared with (3) (which holds for all ): the surprising difference here comes not only from the (common) distinction between and but also from the (less expected) intrinsic asymptotic nature.
Furthermore, we show that this different behavior cannot be ascribed to the weak measurability of nonlinear dependence of Pearson’s correlation coefficient because the limiting distribution also exhibits a similar dependence pattern. (For the univariate central limit theorems implied by the result below, see Jacquet and Régnier [14] where such results were first established.)
Theorem B.
(i) For $p \neq 1/2$, we have
[TABLE]
where denotes the identity matrix.
(ii) For $p = 1/2$, we have
[TABLE]
where denotes the (asymptotic) covariance matrix of and :
[TABLE]
Alternatively, we may define
[TABLE]
Then both cases can be stated in one as
[TABLE]
On the other hand, since for bivariate normal distribution, zero correlation implies independence (see [27]), it is more transparent to split the statement into two cases. See Figure 3 for (Monte Carlo) 3D-plots of the joint distributions of when .
These results are to be compared with the corresponding ones for random $m$-ary search trees [2], and the differences for correlation coefficients are summarized in Table 2.
Furthermore, the joint distribution for $m$-ary search trees undergoes a phase change at $m = 26$: if the branching factor satisfies $3 \le m \le 26$, then the space requirement is asymptotically independent of the KPL and NPL, while for $m \ge 27$, their limiting joint distributions contain periodic fluctuations and are dependent; see [2] for more information.
Dependence phenomena such as those discovered in this paper are not limited to random tries and indeed have a wider range of connections. They also appear in different forms in other structures and algorithms with an underlying binomial splitting process; see Flajolet [6] and [10, 13] for references on data structures, algorithms, conflict resolution protocols and stochastic models. A typical example is the dependence between the number of coin-tossings (or bits generated, or bits inspected) and the number of partitioning rounds in (i) the CTM tree algorithm (see Rom and Sidi [25]), (ii) bucket sort (see [18] and Mahmoud et al. [20]), (iii) the RS algorithm for generating random permutations (see Bacher et al. [1]), and (iv) the initialization of radio networks (see Myoupo et al. [21]). We will also present the results without proof for three other classes of digital trees in the last section.
Our approach is mostly analytic and it is unknown if our results can be characterized by probabilistic arguments. Indeed, we believe that the less expected results we discovered are of special interest to probabilists as more structural interpretation or characterization remains to be clarified.
An extended abstract of this paper appeared in the online proceedings of the 27th International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (Kraków, Poland; July 4–8, 2016); see [9]. More details of the proofs, as well as a section on extensions are added in this version (some of them in an appendix). Also, we corrected the plots of Figure 3 in [9]. The extended abstract [9] was peer-reviewed and we incorporated the comments and suggestions of the referees into this version.
2 Covariance and Correlation Coefficient
In this section, we prove Theorem A on the asymptotics of the covariance and correlation coefficient of $S_n$ and $K_n$, where we content ourselves with a detailed sketch of the method because similar proofs have been given in [10]. In fact, we will also need the variances of $S_n$ and $K_n$, whose derivations will be recalled below and which have been known for some time; see Jacquet and Régnier [14], Kirschenhofer and Prodinger [16], Kirschenhofer et al. [17], Régnier and Jacquet [24], and [10]. See also Table 1 for a brief summary of these results.
Our method of proof is based on the by-now standard two-stage approach relying on the theory of analytic de-Poissonization and Mellin transform whose origin can be traced back to Jacquet and Régnier [14]. See Flajolet et al. [7] for a survey on Mellin transform, and Jacquet and Szpankowski [15] for a survey on analytic de-Poissonization. For the computation of the covariance, the manipulation can be largely simplified by the additional notions of Poissonized variance and admissible functions further developed in our previous papers [10, 13].
The starting point of our analysis is the recurrence satisfied by $S_n$ and $K_n$ in (1). A standard tool for computing the moments of $S_n$ and $K_n$ is the Poisson generating function, which corresponds to the moments of $S_n$ and $K_n$ with $n$ replaced by a Poisson random variable with parameter $z$ (this step is called Poissonization).
More precisely, define the Poisson generating function of and that of :
[TABLE]
Then the recurrences (1) lead to the functional equations
[TABLE]
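A numerical sanity check of this step is sketched below for the mean size only. It assumes the standard form of the functional equation for a trie recurrence with toll $1$ for $n \ge 2$ and zero initial conditions, namely $\tilde f(z) = \tilde f(pz) + \tilde f(qz) + 1 - (1+z)e^{-z}$ with $\tilde f(z) = e^{-z}\sum_{n} \mathbb{E}(S_n) z^n/n!$; the notation $\tilde f$ and all function names are assumptions for illustration and need not match the displayed equations.

```python
from math import comb, exp


def mean_sizes(nmax, p):
    """Exact means E(S_0), ..., E(S_nmax) from the binomial-splitting recurrence."""
    q = 1.0 - p
    mu = [0.0] * (nmax + 1)
    for n in range(2, nmax + 1):
        total = 1.0                               # toll: one internal node
        for k in range(1, n):                     # proper splits only
            total += comb(n, k) * p**k * q**(n - k) * (mu[k] + mu[n - k])
        # The splits k = 0 and k = n reproduce mu[n]; solve for it.
        mu[n] = total / (1.0 - p**n - q**n)
    return mu


def poisson_gf(mu, z):
    """Truncated Poisson generating function e^{-z} * sum_n mu[n] z^n / n!."""
    weight = exp(-z)                              # Poisson(z) probability of n = 0
    total = 0.0
    for n, value in enumerate(mu):
        total += value * weight
        weight *= z / (n + 1)                     # probability of n + 1
    return total


def residual(p, z, nmax=120):
    """Left side minus right side of the assumed functional equation."""
    mu = mean_sizes(nmax, p)
    q = 1.0 - p
    rhs = poisson_gf(mu, p * z) + poisson_gf(mu, q * z) + 1.0 - (1.0 + z) * exp(-z)
    return poisson_gf(mu, z) - rhs


if __name__ == "__main__":
    print(residual(0.3, 5.0), residual(0.5, 8.0))
```

The truncation at `nmax` is harmless only for moderate $z$, where the Poisson weights beyond `nmax` are negligible; the residual should then be at the level of rounding errors.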
From these equations, we obtain, by Mellin transform techniques [7],
[TABLE]
for large in the half-plane , where denotes the entropy of Bernoulli(). Then, by Cauchy’s integral representation and analytic de-Poissonization techniques [15], we obtain precise asymptotic approximations to and to ; see [10] for more details.
Similarly, for the variances $\mathbb{V}(S_n)$ and $\mathbb{V}(K_n)$, we introduce the Poisson generating functions of the second moments:
[TABLE]
which then satisfy, by (1), the same type of functional equations as in (5) but with different non-homogeneous parts. Instead of directly computing asymptotic approximations to the second moments, it proves computationally more advantageous to consider the Poissonized variances
[TABLE]
and then to follow the same Mellin-de-Poissonization approach (as for the means) to derive the first and the third asymptotic estimates in the second column of Table 1; again see [10] for details.
It remains to derive the claimed estimate for the covariance. For that purpose, we introduce the Poisson generating function
[TABLE]
which satisfies, again by (1),
[TABLE]
To compute the covariance, it is beneficial to introduce now the Poissonized covariance (see (7) or [10] for similar details)
[TABLE]
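For orientation, the Poissonized variance used in [10] and its covariance analogue are commonly written in the following form. The symbols $\tilde f_S$, $\tilde f_K$ (Poisson generating functions of the means), $\tilde f_{S,2}$ (second moment of the size) and $\tilde f_{SK}$ (mixed moment) are names assumed here for illustration and may differ from those in the displayed equations.

```latex
\begin{align*}
  \tilde V_S(z) &:= \tilde f_{S,2}(z) - \tilde f_S(z)^2 - z\,\tilde f_S'(z)^2, \\
  \tilde C(z)   &:= \tilde f_{SK}(z) - \tilde f_S(z)\,\tilde f_K(z)
                   - z\,\tilde f_S'(z)\,\tilde f_K'(z).
\end{align*}
```

The correction terms involving the derivatives are what make $\tilde V_S(n)$ and $\tilde C(n)$ first-order approximations to the variance and the covariance at fixed $n$, rather than merely to their Poissonized counterparts.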
which satisfies
[TABLE]
where
[TABLE]
and
[TABLE]
Note that is zero when . Furthermore, from (6) (which can be differentiated since they hold in a sector with in the complex plane), we obtain that and is exponentially small for large in . Also as . Thus the Mellin transform of exists in the strip , and we have then the inverse Mellin integral representation
[TABLE]
where denotes the Mellin transform of ; see [7].
Next, again from (6) we see that can be analytically continued to the vertical line and has no singularities there. Thus, by shifting the line of integration in (10) and computing residues, we obtain
[TABLE]
uniformly for in a sector.
What is left is the computation of the Fourier coefficients of the periodic function (see Proposition A below). This is in fact the most technical part of the proof because contains the product of the two terms and , and thus is a Mellin convolution integral. In [10], a general procedure was given for the simplification of such integrals (see [10, p. 24 et seq.]). This simplification procedure (see Appendix A for details) and a direct application of the theory of admissible functions of analytic de-Poissonization now yield the following estimate for the covariance of and .
Proposition A.
The covariance of $S_n$ and $K_n$ is asymptotically linear
[TABLE]
Here
[TABLE]
where denotes Euler’s constant, is the digamma function and is defined in (2).
Remark 1.
If , then only is needed and the second term (the sum over ) on the right-hand side of (11) has to be dropped. Also the first term here is taken to be its limit as when .
The asymptotic estimate for the correlation coefficient in Theorem A now follows from this and the results for the variances of and (see Table 1), where expressions for and can be found, e.g., in [10]. For convenience, we give below the expressions in the unbiased case. Note that both and are strictly positive; see Schachinger [26] for details.
In the symmetric case, an alternative expression to (11) (avoiding the convolution of two Fourier series) is
[TABLE]
see the discussion of the size of tries in [10], where a similar alternative expression was given for , which reads
[TABLE]
Moreover, also in [10], the following expression for can be found
[TABLE]
Note that and , and the reason for retaining in the denominator is to give a uniform expression for all (notably ). These provide an explicit expression for the periodic function in Theorem A. Also, since all the periodic functions have very small amplitude, the average value of the periodic function can be well-approximated by
[TABLE]
3 Limit Law
In this section, we prove Theorem B, part (i); the proof of part (ii) is similar and only sketched. The key tool of the proof is the multivariate version of the contraction method; see Neininger and Rüschendorf [23]. More precisely, we will use Theorem 3.1 in [23].
We first recall the expression for the square-root of a positive-definite matrix
[TABLE]
It is well-known that such a matrix has exactly one positive-definite square root which is given by
[TABLE]
with the inverse
[TABLE]
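One standard closed form for the square root of a symmetric positive-definite $2\times 2$ matrix $M$ is $(M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$, which follows from the Cayley-Hamilton theorem. The Python sketch below implements it and its inverse; it is an illustration under that assumption and may be arranged differently from (12) and (13). The function names and the triple representation of a symmetric matrix are hypothetical.

```python
from math import sqrt


def sqrt_2x2(a, b, c):
    """Positive-definite square root of [[a, b], [b, c]], returned as (r11, r12, r22)."""
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("matrix is not positive-definite")
    s = sqrt(det)
    t = sqrt(a + c + 2.0 * s)          # sqrt(trace + 2 sqrt(det))
    return (a + s) / t, b / t, (c + s) / t


def inverse_sqrt_2x2(a, b, c):
    """Inverse of the square root of [[a, b], [b, c]], as (i11, i12, i22)."""
    r11, r12, r22 = sqrt_2x2(a, b, c)
    d = r11 * r22 - r12 * r12          # determinant of the root, equal to sqrt(det)
    return r22 / d, -r12 / d, r11 / d


if __name__ == "__main__":
    print(sqrt_2x2(2.0, 1.0, 3.0))
    print(inverse_sqrt_2x2(2.0, 1.0, 3.0))
```

Squaring the returned matrix recovers the input up to rounding, which is an easy way to check the formula for any particular covariance matrix.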
Now we give the proof of Theorem B, part (i).
Proof of Theorem B, Part (i).
Note first that
[TABLE]
where the notation is as in Section 1. The contraction method was specially developed for obtaining limiting distribution results for such recurrences; see [23].
We need some notation. First, define
[TABLE]
This matrix is clearly positive-definite for all sufficiently large. Next define
[TABLE]
and
[TABLE]
where and .
Now to apply the contraction method in [23], it suffices to show that the following conditions hold
[TABLE]
for and , where denotes convergence in the -norm, is the operator norm, denotes the characteristic function of set , and
[TABLE]
Then the contraction method in [23] guarantees that $(S_n, K_n)$ (centralized and normalized) converges in distribution to the unique fixed-point with mean $0$, covariance matrix the identity matrix and finite -norm of
[TABLE]
where is an independent copy of . Obviously, the bivariate normal distribution is the solution. All this is summarized as follows.
Proposition B.
The following convergence in distribution holds:
[TABLE]
Proof.
We only check (15) because the second condition of (16) follows along similar lines and the first condition of (16) follows from (15) in view of
[TABLE]
We start with proving (15) for for which we use the notations
[TABLE]
and
[TABLE]
Also define
[TABLE]
Then, by (12), we see that
[TABLE]
and a similar expression for holds. From the normality of both and (proved for via the contraction method in [11] and a similar method of proof also applies to ), we have
[TABLE]
and
[TABLE]
Moreover, we have
[TABLE]
and
[TABLE]
where and are as above. Thus, both sequences are bounded and, consequently, we obtain the claimed result with -convergence above. Similarly, one proves (15) for .
Next, we consider . Here, we only show the claim for the entry of (denoted by ) all other cases being treated similarly. First, observe that by definition and matrix square-root, we have
[TABLE]
Now, from the strong law of large numbers for the binomial distribution
[TABLE]
and from Taylor series expansion (note that all periodic functions are infinitely differentiable), we have
[TABLE]
and
[TABLE]
Thus, from which the claim follows by the dominated convergence theorem.
Next, set
[TABLE]
Then, we have the following simple lemma.
Lemma 1.
We have, as ,
[TABLE]
Proof. This follows by a straightforward computation using the expressions for the matrix square-root (12) and its inverse (13). For example, the entry of (where we use the notations from the proof of the previous proposition) satisfies
[TABLE]
which tends to [math] as claimed. The other entries are treated similarly.
Theorem B, part (i) now follows from this lemma and Proposition B.
Next, we sketch the (similar) proof of Theorem B, part (ii).
Proof of Theorem B, Part (ii).
The proof runs along similar lines as in Part (i). The only difference is that now it is not entirely obvious that is positive definite. Note, however, that from the discussion in the introduction, this matrix is positive-definite if and only if (defined in Theorem B) is positive definite. This is ensured by the following lemma.
Lemma 2.
The matrix defined in Theorem B is positive-definite for all $n$ large enough.
Proof. It suffices to show that for all large enough. Indeed, we have
[TABLE]
from which the result follows.
Note that this in addition shows the stronger result for all large enough where . (A proof avoiding numerical computations can be performed using the same approach as in Proposition 3 of [12].)
The rest of the proof is similar to that in the asymmetric case and is omitted.
4 Extensions
In this section, we show that the dependence phenomena we discovered here on random binary tries (Theorem A and Theorem B) also find their appearance in other trees and structures whose subtree-sizes and sub-structure-sizes are dictated by a binomial or a multinomial distribution.
For simplicity, we consider in this section only three varieties of random digital trees: random $m$-ary tries, random PATRICIA tries and random bucket digital search trees; see [10] for more potential examples with the same splitting principles.
$m$-ary Tries.
It is straightforward to extend our tries constructed from binary input strings to inputs from an $m$-ary alphabet, $m \ge 2$. In this case, the resulting trie becomes an $m$-ary tree (since each node now has $m$ subtrees, one belonging to each letter). As a random model, we assume that letters are generated independently at random with the $j$-th letter occurring with probability $p_j$, where $p_1 + \cdots + p_m = 1$ and $0 < p_j < 1$ for $1 \le j \le m$.
The size and the key path length (which we again denote by $S_n$ and $K_n$) in such random $m$-ary tries satisfy the recurrences
[TABLE]
with the initial conditions for , where and are independent copies of and , respectively, for , and
[TABLE]
for all with .
The pair $(S_n, K_n)$ satisfies the same type of properties as those described in Theorem A and Theorem B for binary tries, where the symmetric case here corresponds to $p_1 = \cdots = p_m = 1/m$ and all other cases are asymmetric. Only the expressions for and are different but they can be computed via the same analytic tools as those used in [10]. For the sake of simplicity, we only give the expressions in the symmetric case ($p_1 = \cdots = p_m = 1/m$) as follows:
[TABLE]
Note that the variance of the size was considered in [12], but no explicit expression was given for the Fourier coefficients of the periodic function.
With the help of these expressions, we obtain the following numerical approximations to the average value of the periodic function of the correlation coefficient between and in the symmetric case. We see that they differ little.
PATRICIA Tries.
A simple idea to increase the efficiency of tries is to remove all internal nodes with one-way branching. The resulting tree is called a PATRICIA trie; here PATRICIA is an acronym of “Practical Algorithm To Retrieve Information Coded In Alphanumeric”.
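The removal of one-way branching can be illustrated with a minimal Python sketch for the binary case. As before, keys are assumed to be distinct strings of `"0"` and `"1"` of equal length, and the function name `patricia_parameters` is hypothetical. A node is created only when both sides of a split are non-empty; otherwise the bit is skipped and the tree does not get deeper.

```python
def patricia_parameters(keys, bit=0, depth=0):
    """Return (size, KPL) of the binary PATRICIA trie built from distinct bit strings.

    `bit` is the index of the bit inspected next; `depth` is the distance
    from the root in the compressed tree.
    """
    if len(keys) <= 1:
        return 0, depth * len(keys)
    left = [k for k in keys if k[bit] == "0"]
    right = [k for k in keys if k[bit] == "1"]
    if not left or not right:
        # One-way branching: skip this bit without creating an internal node.
        return patricia_parameters(keys, bit + 1, depth)
    left_size, left_kpl = patricia_parameters(left, bit + 1, depth + 1)
    right_size, right_kpl = patricia_parameters(right, bit + 1, depth + 1)
    return 1 + left_size + right_size, left_kpl + right_kpl


if __name__ == "__main__":
    print(patricia_parameters(["0010", "0011", "0100", "1000"]))
```

In the binary case every internal node has exactly two non-empty subtrees, so $n$ keys always give $n - 1$ internal nodes; the size only becomes random for larger alphabets.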
We use the same random model as we used above for $m$-ary tries and consider the size and key-path length of PATRICIA tries (which we again denote by $S_n$ and $K_n$). Then they satisfy the recurrences
[TABLE]
with the initial conditions for , where is defined as in (17) above, and are independent copies of and , respectively, and
[TABLE]
Note that for $m = 2$, the size is deterministic. We thus assume $m \ge 3$ to avoid trivialities. Then the dependence of $(S_n, K_n)$ satisfies mutatis mutandis Theorem A and Theorem B. In particular, the required changes for in the symmetric case are given as follows ($p_1 = \cdots = p_m = 1/m$):
[TABLE]
These expressions are also valid for , where and can be shown to be identically zero. Note that the result for the variance of the key-path length was already derived in [10] (for ) and that for the size was established in [12] but without a precise expression for the Fourier coefficients.
Again, we can use the above expressions to obtain the average value of the periodic function of the correlation coefficient between $S_n$ and $K_n$ in the symmetric case. Note that unlike tries, these values increase with $m$.
Bucket Digital Search Trees.
Digital search trees (DST) represent yet another class of digital tree structures; see [18, 19] for more information. In contrast to tries and PATRICIA tries, they only have one type of node, where data are stored. More precisely, given a set of $n$ data consisting of infinite $0$-$1$ strings, a DST is constructed as follows: if $n = 1$, then the DST consists of only one node holding the sole string; otherwise, the first string is stored in the root and all others are directed to the subtrees according to their first bit being $0$ or $1$; then, the subtrees are built recursively but by using consecutive bits to split the data.
Clearly, the size of such a DST is deterministic and equals the input cardinality. We consider instead a bucket version with an additional capacity $b \ge 2$, allowing each node to hold up to $b$ strings, with nodes having subtrees only when they are filled up.
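The construction just described can be sketched in Python by inserting the keys one at a time. The sketch assumes that the size counts the nodes (buckets) and that the key-path length is the sum of the depths of the nodes in which the keys are stored; nodes are identified by the bit prefix leading to them, and the function name `bucket_dst_parameters` is hypothetical.

```python
def bucket_dst_parameters(keys, b):
    """Return (size, KPL) of the bucket digital search tree with node capacity b.

    Keys are inserted in the given order; each is a string of "0" and "1".
    """
    if b < 1:
        raise ValueError("capacity must be at least 1")
    load = {}                              # bit prefix of a node -> keys stored in it
    kpl = 0
    for key in keys:
        depth = 0
        # Walk down along the bits of the key until a node with free room is found.
        while load.get(key[:depth], 0) >= b:
            depth += 1
            if depth > len(key):
                raise ValueError("key too short to be placed")
        load[key[:depth]] = load.get(key[:depth], 0) + 1
        kpl += depth
    return len(load), kpl


if __name__ == "__main__":
    keys = ["0010", "0011", "0100", "1000"]
    for b in (1, 2, 4):
        print(b, bucket_dst_parameters(keys, b))
```

With `b = 1` this reduces to the ordinary DST, whose size equals the number of keys; for larger `b` the number of nodes depends on the keys and hence becomes random under the Bernoulli model.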
We adopt the same Bernoulli random model as for random tries and consider the size and key-path length in random bucket digital search trees (again denoted by and ), which then satisfy
[TABLE]
with the initial conditions and , where and are independent copies of and , respectively.
The same dependence phenomena as those described in Theorem A and Theorem B also hold for the pair $(S_n, K_n)$. The computation of the sequences is nevertheless more intricate. In the asymmetric case, one can again use analytic de-Poissonization and Mellin transform techniques; however, the resulting expressions are less explicit. On the other hand, in the symmetric case, explicit expressions for are available via the Poisson-Laplace-Mellin method from [13]. As the expressions are long, we omit them here. Note that the results for the variances of $S_n$ and $K_n$ have already been obtained in [13].
Other Shape Parameters.
Theorem A and Theorem B also extend to pairs of random variables where the size is replaced by the number of various patterns (such as the number of internal-external nodes discussed, e.g., by Flajolet and Sedgewick in [8]) and the key-path length is replaced by other notions of the path length (such as the total path length of internal-external nodes).
Acknowledgments
The first author acknowledges partial support from MOST under the grants MOST-104-2923-M-009-006-MY3 and MOST-105-2115-M-009-010-MY2. We also thank the referees of the extended abstract [9] of this paper for their helpful comments.
Appendix A A Sketch of Proof of (11)
We sketch here some details of how the expression (11) for in Proposition A is obtained. The method we use is based on that introduced in [10]; see p. 25 et seq.
First, by moving the line of integration in (10) to the right and using the residue theorem, we have
[TABLE]
where . Note that in the above expression and in what follows, if is irrational, then only the term with is retained. Thus, our problem boils down to the computation of .
We first consider the Mellin transform of which is easier to handle. By the expression (9) for from Section 2, the Mellin transform is given by
[TABLE]
where
[TABLE]
Observe that by applying the Mellin transform and its inverse to (5), we obtain
[TABLE]
and
[TABLE]
Substituting these into the integral representation of and interchanging the integrals, we see that
[TABLE]
where the last line follows from moving the vertical line of integration to minus infinity and summing over all the residues of the poles encountered.
For , we use the expression for in Section 2, (18) and (19), and Mellin convolution, giving
[TABLE]
where the integration path is the imaginary axis with a small indentation to the right at the zeros of . Now by the decomposition
[TABLE]
the above integral is rewritten as
[TABLE]
We break now this integral into two parts according to the two terms in the bracket. For the first part, we use the substitution and standard residue calculus, and obtain
[TABLE]
where the second line follows by moving the line of integration over the imaginary axis and denotes the derivative of . Next, note that
[TABLE]
The first integral on the right-hand side is a Mellin convolution integral and can be evaluated explicitly as
[TABLE]
For the second integral, we move the line of integration to infinity and use the residue theorem, yielding
[TABLE]
In a similar way, the second part of (20) has the series representation
[TABLE]
Since
[TABLE]
we then deduce (11) by collecting all expressions.
- [1] A. Bacher, O. Bodini, H.-K. Hwang and T.-H. Tsai (2016). Generating random permutations by coin-tossing: classical algorithms, new analysis and modern implementation, ACM Trans. Algorithms, accepted for publication.
- [2] H.-H. Chern, M. Fuchs, H.-K. Hwang and R. Neininger (2016). Dependencies and phase changes in random $m$-ary search trees, Random Struct. Algor., accepted for publication.
- [3] J. Clément, P. Flajolet and B. Vallée (1998). Dynamical sources in information theory: a general analysis of trie structures, Algorithmica, 29, 307–369.
- [4] L. Devroye (1999). Universal limit laws for depths in random trees, SIAM J. Comput., 28, 409–432.
- [5] L. Devroye (2005). Universal asymptotics for random tries and PATRICIA trees, Algorithmica, 42, 11–29.
- [6] P. Flajolet (2006). The ubiquitous digital tree. In Lecture Notes in Comput. Sci. (STACS 2006), 3884, pp. 1–22, Springer, Berlin.
- [7] P. Flajolet, X. Gourdon and P. Dumas (1995). Mellin transforms and asymptotics: harmonic sums, Theoret. Comput. Sci., 144, 3–58.
- [8] P. Flajolet and R. Sedgewick (1986). Digital search trees revisited, SIAM J. Comput., 15, 748–767.
