Log-minor distributions and an application to estimating mean subsystem entropy
Alice C. Schwarze, Philip S. Chodrow, and Mason A. Porter

TL;DR
This paper analyzes the distribution of log-determinants of principal submatrices of covariance matrices with bounded condition number, providing bounds that enable efficient estimation of subsystem entropy regardless of system size.
Contribution
It introduces bounds on the distribution of minors and their variance, enabling accurate entropy estimation with sample sizes independent of system size.
Findings
Sample size for entropy estimation is asymptotically independent of system size n.
Number of samples needed scales linearly with subsystem size k.
Derived bounds improve efficiency of entropy estimation in large systems.
| k | Example | Mean of log-minors | Variance of log-minors | Variance bound from Theorem 1 | Variance bound from Theorem 2 | Variance bound from Theorem 3 |
|---|---|---|---|---|---|---|
| 1 | E1 | | | | | |
| 1 | E2 | | | | | |
| 1 | E3 | | | | | N/A |
| 1 | E4 | | | | | N/A |
| 5 | E1 | | | | | |
| 5 | E2 | | | | | |
| 5 | E3 | | | | | N/A |
| 5 | E4 | | | | | N/A |
| 10 | E1 | | | | | |
| 10 | E2 | | | | | |
| 10 | E3 | | | | | N/A |
| 10 | E4 | | | | | N/A |
| 19 | E1 | | | | | |
| 19 | E2 | | | | | |
| 19 | E3 | | | | | N/A |
| 19 | E4 | | | | | N/A |
Log-minor distributions and an application to estimating mean subsystem entropy
Alice C. Schwarze
Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
Philip S. Chodrow
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139
Mason A. Porter
Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, California 90095, USA
AMS 2010 Subject classification: 15B99, 15A15, 60E15, 93A10
Keywords: empirical distributions, determinants, sampling error, positive-definite matrices, random matrices
Abstract
A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$. The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix.
Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$, the number of samples that are needed to bound the sampling error is asymptotically independent of $n$. Instead, it is sufficient to increase the number of samples in linear proportion to the subsystem size $k$ to achieve a desired sampling accuracy.
1 Introduction
In many fields of study, researchers use matrices to represent systems of interest. Statisticians and data scientists represent large tabular data sets as matrices [1]. In network science, it is common to use adjacency matrices to represent the structure of a network [2]. In dynamical systems, researchers use Jacobian matrices in the study of the linearized dynamics of a system of coupled variables [3]. For networks, dynamical systems, statistical analysis of large data sets, and other applications, it can be insightful (and even necessary) to examine their components (as subnetworks, subsystems, reduced data sets, and so on). Several researchers have used subsystem properties to characterize robustness and other salient properties of dynamical systems [4, 5, 6, 7, 8, 9]. Network scientists count and analyze motifs and other subgraphs in networks to characterize a network’s structure [10, 11]. Several prominent tools in data science are based on linear sketching, an approach to data dimensionality reduction whereby one obtains a reduced data set via matrix multiplication [12, 13] or as a linear combination of submatrices [14, 15, 16]. An example of such a tool for dimensionality reduction is principal component analysis [17].
The various applications of submatrices motivate the mathematical study of their properties. In this paper, we study the distribution of log-determinants of principal submatrices of a positive-definite matrix and show that our results lead to controllable sampling guarantees for computing the mean differential entropy of subsystems for a dynamical system. Researchers have studied the differential entropy of subsystems in areas such as physics [18, 19], biology [8, 9], neuroscience [4, 5, 6], computer science [7], and coding theory [20]. For example, Tononi et al. (1999) computed a measure of network redundancy from the mean differential entropy of its subsystems of fixed size [5]. Teschendorff et al. (2014) [21] used differential entropy to define a measure of network robustness for protein-interaction networks.
For several symmetric multivariate distributions, estimates of differential entropy are affine functions of the log-determinant of a system’s covariance matrix. Examples include the multivariate normal distribution [22], the multivariate $t$ distribution [23, 24], and the multivariate Cauchy distribution [24]. For the $n$-variate normal distribution with covariance matrix $M$, for example, the differential entropy is [22]
$$h = \frac{1}{2}\log\det(M) + \frac{n}{2}\log(2\pi e), \tag{1}$$
where the base $b$ of the logarithm can be any finite positive number other than 1. (The base of the logarithm determines the units of entropy. If one chooses $b = 2$, one measures entropy values in bits. If one chooses $b = e$, one measures entropy values in nats.) The log-determinant of the covariance matrix is thus sufficient to approximate the differential entropy of several multivariate distributions.
The principal submatrices of the covariance matrix $M$ are covariance matrices of subsystems that correspond to subsets of coupled variables. One can compute the differential entropy of a subsystem by evaluating Eq. 1 for a principal submatrix of $M$. A system of $n$ coupled variables possesses $\binom{n}{k}$ subsystems of $k$ variables; each of these subsystems corresponds to one of the $k \times k$ principal submatrices of $M$. The exact computation of the distribution of differential subsystem entropy or its moments thus requires one to compute $\binom{n}{k}$ distinct determinants, an infeasible task for large $n$ and $k$. This task can be computationally prohibitive even for modestly sized systems. To our knowledge, the largest system for which researchers have exactly computed the differential entropy of subsystems is a synthetic network with variables and subsystems with variables [5].
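The computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes NumPy is available, works in nats unless another base is requested, and the names `gaussian_entropy`, `subsystem_entropy`, and `number_of_subsystems` are hypothetical helpers introduced here.

```python
import math

import numpy as np


def gaussian_entropy(cov, base=math.e):
    """Differential entropy of a multivariate normal with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    # slogdet returns (sign, log|det|) and avoids overflow for large matrices.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    nats = 0.5 * logdet + 0.5 * n * math.log(2.0 * math.pi * math.e)
    return nats / math.log(base)  # convert from nats to the requested base


def subsystem_entropy(cov, indices, base=math.e):
    """Entropy of the subsystem whose variables are listed in `indices`."""
    idx = np.asarray(indices, dtype=int)
    cov = np.asarray(cov, dtype=float)
    # np.ix_ selects the principal submatrix (same rows and columns).
    return gaussian_entropy(cov[np.ix_(idx, idx)], base=base)


def number_of_subsystems(n, k):
    """Number of k-variable subsystems of an n-variable system."""
    return math.comb(n, k)
```

For scale, `number_of_subsystems(40, 20)` is already about $1.4 \times 10^{11}$, which is why exhaustive enumeration quickly becomes impractical.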
To address this problem, we study the distribution of log-determinants of $k \times k$ principal submatrices. We refer to these log-determinants as log-minors of size $k$. As we noted above, these log-determinants are sufficient to determine the subsystem entropy for many important multivariate distributions. Knowledge of the properties of this distribution thus enables the derivation of bounds on the sampling error when estimating subsystem entropy in many applications. We show that, given a bound on the condition number of , the standard error of a sample mean of differential entropy is independent of and sublinear in , implying that one needs a sublinear number of samples in to ensure a desired accuracy.
Our paper proceeds as follows. In Section 2, we introduce some notation that we use throughout this paper. In Section 3, we give several upper bounds on the tail and variance of the distribution of log-minors of a positive-definite matrix with bounded condition number. We present proofs for these bounds in Section 4 and show numerical examples in Section 5. In Section 6, we apply our theorems to provide probabilistic guarantees on the sample mean and relative error, and we discuss implications for the design of practical schemes for estimating mean subsystem entropy. We conclude and discuss possible extensions in Section 7.
2 Notation
Let be a positive-definite matrix. Let be the eigenvalues of . Because is positive definite, it is also nonsingular; its condition number is . For a given index set , the matrix is the corresponding principal submatrix of . For any fixed , let denote the set of all such submatrices of , and let denote a uniformly-random element of this set. We define a random variable and denote its empirical distribution by . For convenience, we define .
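To make the objects in this section concrete, the following sketch computes the condition number of a symmetric positive-definite matrix, enumerates every log-minor of size $k$ (the empirical distribution, feasible only for small matrices), and draws one log-minor uniformly at random. The function names are illustrative assumptions, and natural logarithms are used.

```python
import itertools

import numpy as np


def condition_number(matrix):
    """Ratio of largest to smallest eigenvalue of a symmetric positive-definite matrix."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # returned in ascending order
    return float(eigenvalues[-1] / eigenvalues[0])


def log_minor(matrix, subset):
    """Log-determinant of the principal submatrix indexed by `subset`."""
    idx = np.asarray(subset, dtype=int)
    sign, logdet = np.linalg.slogdet(matrix[np.ix_(idx, idx)])
    return float(logdet)


def all_log_minors(matrix, k):
    """All log-minors of size k; the list has binomial(n, k) entries."""
    n = matrix.shape[0]
    return np.array([log_minor(matrix, s) for s in itertools.combinations(range(n), k)])


def random_log_minor(matrix, k, rng):
    """Log-minor of a uniformly random k-subset of indices."""
    subset = rng.choice(matrix.shape[0], size=k, replace=False)
    return log_minor(matrix, subset)
```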
3 Bounds on the distribution of log-minors
In this section, we state bounds on the distribution for a positive-definite matrix with bounded condition number. We give upper bounds for the distribution’s support, variance, and right tail. We also show that we can improve these bounds if is diagonal.
Theorem 1 (Tail and variance bound for log-minors of a positive-definite matrix).
Let be a positive-definite matrix with condition number . For every , we have
[TABLE]
Furthermore, the variance of satisfies
[TABLE]
Remark 1.
The tail bound in Eq. 2 does not guarantee that the distribution of log-minors concentrates. (Ledoux defined concentration of measure in Ref. [25] (on page 3) as follows. Let $(\mathcal{X}, d)$ be a metric space with probability measure $\mu$ on the Borel sets of $(\mathcal{X}, d)$. The concentration function is defined as $\alpha(r) = \sup\{1 - \mu(A_r) : A \subset \mathcal{X},\ \mu(A) \geq 1/2\}$, where $r > 0$ and $A_r$ is the open $r$-neighborhood of $A$. The measure $\mu$ has normal concentration on $(\mathcal{X}, d)$ if there are constants $C$ and $c$ such that $\alpha(r) \leq C e^{-c r^2}$ for every $r > 0$.) This is because the bound in Eq. 2 is increasing with respect to $k$ and asymptotically constant with respect to $n$. Indeed, for the bound to approach 0 for a sequence of matrices, it is both necessary and sufficient that . Because the condition number cannot be smaller than 1, this condition requires the condition number to approach 1. This condition severely constrains the sequence . In that limit, all eigenvalues of are equal to each other and all log-minors are equal to .
Theorem 2 (Support and variance bound for log-minors of a positive-definite matrix).
Let be a positive-definite matrix with condition number . For any , the random variable and its distribution satisfy the following properties:
1. the distribution has bounded support that is contained in an interval whose length is no greater than ; and
2. the variance of satisfies
[TABLE]
Remark 2.
The variance bound in Eq. 4 is much sharper than the one in Eq. 3. Both variance bounds are asymptotically constant with respect to . For fixed , the two variance bounds differ by a factor of in the large- limit.
Remark 3.
For even and , the bound on the variance in Eq. 4 is sharp when is a (where ) diagonal matrix with entries and .
When is diagonal, we can derive a variance bound that is sharper than the bounds in Eqs. 3 and 4.
Theorem 3 (Variance bound for log-minors of a positive-definite diagonal matrix).
Let be a positive-definite diagonal matrix with condition number . The variance of satisfies
[TABLE]
Remark 4.
The two variance bounds in Eqs. 4 and 5 are asymptotically constant with respect to and converge to the same limiting value of .
Remark 5.
The variance bound for diagonal positive-definite matrices in Eq. 5 is sharper than the variance bound for general positive-definite matrices in Eq. 4. The former differs from the latter by a factor of .
Remark 6.
For even and any , the bound on the variance in Eq. 5 is sharp when is a (where ) diagonal matrix with entries and . The sharpness of the bound for diagonal matrices indicates a limit to possible improvements for the variance bound for general positive-definite matrices. Specifically, one cannot hope to improve the variance bound in Eq. 4 by more than a factor of .
The variance bound in Theorem 2 is sharp for a diagonal matrix. This observation and several examples in Section 5 motivate the following conjecture.
Conjecture 1 (Diagonal matrices maximize log-minor variance).
Let be the set of positive-definite matrices with condition number . For all , , and , there exists a diagonal matrix such that
[TABLE]
The variance bounds (see Eqs. 3, 4, and 5) have important implications for the accuracy of sample means of log-minors. We discuss these implications in Section 6.
4 Proofs of bounds on the distribution of log-minors
4.1 Proof of Theorem 1
To prove Theorem 1, we use Cauchy’s interlacing theorem and results on Markov chains on countable sets. Chatterjee and Ledoux (2009) previously used this approach to prove a concentration result for empirical cumulative eigenvalue spectra of Hermitian matrices [26].
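Proposition 1 (Cauchy’s interlacing theorem [27]).
Let $A$ be an $n \times n$ Hermitian matrix, and let $B$ be an $m \times m$ principal submatrix of $A$. If $A$ has eigenvalues $\alpha_1 \geq \dots \geq \alpha_n$ and $B$ has eigenvalues $\beta_1 \geq \dots \geq \beta_m$, then
$$\alpha_j \geq \beta_j \geq \alpha_{j+n-m} \quad \text{for all } j \in \{1, \dots, m\}.$$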
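A quick numerical check can make the interlacing property tangible. The sketch below assumes the standard statement with eigenvalues sorted in descending order: the $j$-th eigenvalue of an $m \times m$ principal submatrix lies between the $j$-th and the $(j + n - m)$-th eigenvalue of the full $n \times n$ matrix. The helper name `interlaces` and the tolerance are illustrative choices.

```python
import numpy as np


def interlaces(matrix, subset, tol=1e-10):
    """Check Cauchy interlacing for the principal submatrix indexed by `subset`."""
    idx = np.asarray(subset, dtype=int)
    n, m = matrix.shape[0], len(idx)
    # eigvalsh returns ascending eigenvalues; reverse to get descending order.
    alpha = np.linalg.eigvalsh(matrix)[::-1]
    beta = np.linalg.eigvalsh(matrix[np.ix_(idx, idx)])[::-1]
    upper = alpha[:m]       # alpha_j for j = 1, ..., m
    lower = alpha[n - m:]   # alpha_{j + n - m} for j = 1, ..., m
    return bool(np.all(beta <= upper + tol) and np.all(beta >= lower - tol))
```

In particular, every eigenvalue of every principal submatrix lies between the smallest and largest eigenvalues of the full matrix, which is the property that the proofs in this section use.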
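Proposition 2 (Large-deviation inequality for functions on countable sets [25] (page 50)).
Let $(\Pi, \mu)$ be a reversible Markov chain on a finite or countable set $\mathcal{X}$. Let $(\Pi, \mu)$ have a spectral gap of $\gamma > 0$. (The spectral gap (also called the “Poincaré constant”) of a Markov chain $(\Pi, \mu)$ on a space $\mathcal{X}$ is the largest constant $\gamma$ such that, for all functions $f : \mathcal{X} \to \mathbb{R}$, we have $\gamma \, \mathrm{Var}_\mu(f) \leq \mathcal{E}(f, f)$, where $\mathcal{E}(f, f) = \frac{1}{2} \sum_{x, y \in \mathcal{X}} [f(x) - f(y)]^2 \, \Pi(x, y) \, \mu(\{x\})$ is the Dirichlet form. See, for example, Ref. [25] (page 50).) It follows that, whenever $F : \mathcal{X} \to \mathbb{R}$ is a function such that
$$|||F|||_\infty^2 := \frac{1}{2} \sup_{x \in \mathcal{X}} \sum_{y \in \mathcal{X}} |F(x) - F(y)|^2 \, \Pi(x, y) \leq 1,$$
it is also true that $F$ is integrable with respect to $\mu$ and that, for every $r \geq 0$, the probability measure $\mu$ satisfies
$$\mu\left(\left\{F \geq \int F \, d\mu + r\right\}\right) \leq 3 \exp\left(-\frac{r \sqrt{\gamma}}{2}\right).$$
Remark 7.
The expected squared distance in $F$ between $x$ and its adjacent states in the Markov chain is $\sum_{y \in \mathcal{X}} |F(x) - F(y)|^2 \, \Pi(x, y)$. One can thus think of $|||F|||_\infty^2$ as a measure of the expected squared distance between the greatest “outlier” and adjacent states in $\mathcal{X}$. We thus refer to $|||F|||_\infty^2$ as the squared outlier deviation of $F$ on $(\Pi, \mu)$.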
Proposition 3 (Spectral gap of random-transposition walk [28]).
Let be the set of permutations of elements, and let . Let the “random-transposition walk” be a reversible Markov chain with kernel
[TABLE]
The random-transposition walk has a spectral gap of .
- Proof of Theorem 1.
Every principal submatrix of is the top-left principal submatrix of after a permutation of its rows and columns. We denote the permuted matrix by and its top-left principal submatrix by , where is a permutation of elements.
For the top-left principal submatrix, only the first elements of are relevant. There are permutations that are identical in their first elements, so there is a -to- correspondence between and . Because of the correspondence between and , we obtain the same distribution for a function , where we choose uniformly at random from , and for , where we choose uniformly at random from .
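The correspondence between permutations and principal submatrices can be checked directly for a small matrix. In the sketch below (helper names are illustrative), the same permutation is applied to rows and columns, and the top-left $k \times k$ block of the permuted matrix is the principal submatrix indexed by the first $k$ entries of the permutation. Counting permutations by the set of their first $k$ entries shows that every $k$-subset is hit equally often ($k!\,(n-k)!$ times when the order within the first $k$ entries is ignored), so a uniformly random permutation induces a uniformly random principal submatrix.

```python
import itertools
from collections import Counter

import numpy as np


def top_left_log_minor(matrix, permutation, k):
    """Log-determinant of the top-left k x k block after permuting rows and columns."""
    p = np.asarray(permutation, dtype=int)
    permuted = matrix[np.ix_(p, p)]  # same permutation on rows and columns
    return float(np.linalg.slogdet(permuted[:k, :k])[1])


def count_by_subset(n, k):
    """Count the permutations of range(n) that share each set of first k entries."""
    return Counter(frozenset(p[:k]) for p in itertools.permutations(range(n)))
```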
Let be such that
[TABLE]
for some . To find an upper bound on the squared outlier deviation for on the random-transposition walk, we make two observations:
1. Consider two permutations, and , that are adjacent in the random-transposition walk; that is, for some transposition . The determinant is invariant under basis transformation, so the value of can differ from only if is a transposition that swaps one of the first $k$ elements in with one of the last $n - k$ elements in . There are $\binom{n}{2}$ possible transpositions for a sequence of $n$ elements; and $k(n-k)$ of these transpositions swap one of the first $k$ elements of the sequence with one of the last $n - k$ elements of the sequence. Consequently, the fraction of transpositions that change the value of has an upper bound of
[TABLE]
2. Using Cauchy’s interlacing theorem (see Proposition 1), one can find an upper bound for . For any and any pair , there exists a matrix such that and are principal submatrices of . Cauchy’s interlacing theorem implies that
   (a) is an upper bound on the largest eigenvalue of ;
   (b) is a lower bound on the smallest eigenvalue of ;
   (c) is an upper bound on and ; and
   (d) is a lower bound on and .
Therefore,
[TABLE]
This upper bound for holds for arbitrary . We can thus set the upper bound to be .
We obtain an upper bound for the squared outlier deviation of of
[TABLE]
Let . The function on has a squared outlier deviation of . We can thus use the tail bound for functions on countable sets (see Proposition 2) for . Therefore,
[TABLE]
We can substitute in Eq. 7 by , because of the correspondence between and . Applying Proposition 3, we obtain
[TABLE]
This proves the first statement of Theorem 1 (see Eq. 2).
We derive a bound on the variance of from Eq. 2 by a direct calculation. First, we write
[TABLE]
Using the tail bound in Eq. 2, it follows that
[TABLE]
∎
4.2 Proof of Theorem 2
We prove Theorem 2 using Cauchy’s interlacing theorem and Popoviciu’s inequality.
Proposition 4 (Popoviciu’s inequality [29, 30]).
Let $Y$ be a real-valued random variable supported on the interval $[a, b]$. It then follows that $Y$ has variance
$$\mathrm{Var}[Y] \leq \frac{(b - a)^2}{4}.$$
For a proof of this version of Popoviciu’s inequality, see Ref. [31].
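As a small numerical illustration of Popoviciu’s inequality, the sketch below compares the population variance of a finite set of values with one quarter of the squared range. The helper names are illustrative. Equality holds when half of the probability mass sits at each endpoint of the interval.

```python
import numpy as np


def popoviciu_bound(a, b):
    """Upper bound on the variance of any random variable supported on [a, b]."""
    return 0.25 * (b - a) ** 2


def satisfies_popoviciu(values, tol=1e-12):
    """Check the inequality for the empirical distribution of `values`."""
    values = np.asarray(values, dtype=float)
    return bool(np.var(values) <= popoviciu_bound(values.min(), values.max()) + tol)
```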
- Proof of Theorem 2.
For any finite and , the set of principal submatrices of an matrix has finite cardinality . It follows that the distribution of any function of has finite support. We define an interval with and , such that the support of is a finite subset of .
We can obtain any principal submatrix of by removing row–column pairs from . Successive applications of Cauchy’s interlacing theorem show that and . It follows that
[TABLE]
Therefore,
[TABLE]
If $k > n/2$, any two $k \times k$ principal submatrices share at least $2k - n$ rows and columns. They can thus differ in at most $n - k$ rows and columns. It follows that one can refine the lower and upper bounds on the support of so that . We have thus proven the first part of Theorem 2. Applying Popoviciu’s inequality to with yields the variance bound in Theorem 2. ∎
4.3 Proof of Theorem 3
For our proof of Theorem 3, we maximize the variance of the log-minors with respect to the eigenvalues of the matrix.
- Proof of Theorem 3.
Let be a positive-definite diagonal matrix with entries . Define for each ; and let . It then follows that
[TABLE]
We now consider the function . From Eq. 8, we see that every value of is a sum of a subset of the variables . Therefore, the function is convex (i.e., concave up) in the variables . Furthermore, the variance is translation-invariant. We may therefore, without loss of generality, suppose that (corresponding to ) and (corresponding to ). Consequently, the maximization of the variance amounts to the maximization of over the volume associated with an -dimensional hypercube with edge length . The solutions lie at the vertices of this hypercube. Therefore,
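$$x_i \in \{0, \log\kappa\} \ \text{ for all } i, \qquad \left|\{i : x_i = \log\kappa\}\right| = m$$
for some $m \in \{1, \dots, n-1\}$, where $\kappa$ is the condition number. We may now view the log-minor $X$ as a hypergeometric random variable on a population of size $n$ for which $m$ elements have the value $\log\kappa$ and $n - m$ elements have the value $0$; each log-minor is the sum of $k$ elements that we draw without replacement. The variance of this hypergeometric random variable is
$$\mathrm{Var}[X] = k \, \frac{m}{n} \left(1 - \frac{m}{n}\right) \frac{n - k}{n - 1} \, (\log\kappa)^2,$$
which is maximal at
$$m = \begin{cases} n/2, & n \text{ even}, \\ (n \pm 1)/2, & n \text{ odd}. \end{cases}$$
The maximal value of $\mathrm{Var}[X]$ for even $n$ leads to the variance bound
$$\mathrm{Var}[X] \leq \frac{k (n - k)}{4 (n - 1)} \, (\log\kappa)^2. \tag{9}$$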
Comparing the maximal values of for even and for odd shows that Eq. 9 is a variance bound for all . ∎
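The hypergeometric argument can be verified by brute force for a small diagonal matrix. The sketch below assumes that the log-minor is the unnormalized sum of the logarithms of the selected diagonal entries and that the diagonal takes only the two values $1$ and $\kappa$; the function names are illustrative. It enumerates all $k$-subsets and compares the exact variance with the hypergeometric formula $k \frac{m}{n}(1 - \frac{m}{n})\frac{n-k}{n-1}(\ln\kappa)^2$.

```python
import itertools
import math

import numpy as np


def diagonal_log_minor_variance(diagonal, k):
    """Exact variance of the log-minors of size k of a diagonal matrix."""
    logs = np.log(np.asarray(diagonal, dtype=float))
    values = [logs[list(s)].sum() for s in itertools.combinations(range(len(logs)), k)]
    return float(np.var(values))  # population variance over all k-subsets


def hypergeometric_variance(n, k, m, kappa):
    """Variance when m of the n diagonal entries equal kappa and the rest equal 1."""
    return k * (m / n) * (1 - m / n) * (n - k) / (n - 1) * math.log(kappa) ** 2


def max_diagonal_variance(n, k, kappa):
    """Value of the hypergeometric variance at m = n / 2 (even n)."""
    return k * (n - k) / (4.0 * (n - 1)) * math.log(kappa) ** 2
```

For even $n$, placing half of the diagonal entries at $\kappa$ and half at $1$ attains `max_diagonal_variance` for every $k$.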
5 Examples
In this section, we compare the tail of the distribution for several example matrices to the bounds in Theorems 1 and 3.
We consider four examples of positive-definite matrices with and fixed condition number .
Example E1.
Consider the diagonal matrix that maximizes the variance of . (See the proof of Theorem 3.) For even , this matrix has eigenvalues , where and .
Example E2.
Consider a diagonal matrix with eigenvalues . We set and . We draw from a uniform distribution on .
Example E3.
We obtain a non-diagonal positive-definite matrix with condition number via an orthogonal transformation of . That is,
[TABLE]
where is an orthogonal matrix that we choose from the Haar measure over the group of orthogonal matrices. We use Stewart’s algorithm [32] to generate .
Example E4.
We again generate a random orthogonal matrix using Stewart’s algorithm. We obtain another non-diagonal positive-definite matrix via an orthogonal transformation of .
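The four example matrices can be constructed along the following lines. This is a sketch under explicit assumptions rather than the authors' setup: the smallest eigenvalue is set to $1$ and the largest to $\kappa$, the interior eigenvalues of the second spectrum are drawn uniformly between them, and the Haar-distributed orthogonal matrix is generated with a sign-corrected QR decomposition of a Gaussian matrix instead of Stewart’s algorithm. All function names are hypothetical.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via sign-corrected QR (not Stewart's algorithm)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Multiplying column j by sign(r_jj) removes the sign ambiguity of QR.
    return q * np.sign(np.diag(r))


def rotate(spectrum, q):
    """Return Q diag(spectrum) Q^T, which has the same eigenvalues as diag(spectrum)."""
    return (q * spectrum) @ q.T


def build_examples(n, kappa, rng):
    """Diagonal examples E1, E2 and rotated examples E3, E4 with condition number kappa."""
    # E1: half of the eigenvalues at kappa, half at 1 (assumed normalization).
    e1 = np.concatenate([np.full(n // 2, kappa), np.ones(n - n // 2)])
    # E2: extremes fixed at kappa and 1, interior eigenvalues uniform in between.
    e2 = np.concatenate([[kappa], rng.uniform(1.0, kappa, size=n - 2), [1.0]])
    return {
        "E1": np.diag(e1),
        "E2": np.diag(e2),
        "E3": rotate(e1, haar_orthogonal(n, rng)),
        "E4": rotate(e2, haar_orthogonal(n, rng)),
    }
```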
In Figure 1, we show the empirical probability densities of for Examples E1, E2, E3, and E4 using four different values of $k$. For all four examples, we observe that the interval on which is supported shifts to the right for progressively larger $k$. The length of the supported interval increases with . For and (the cases in which is larger than ), the distributions are almost symmetric about for all four examples. For Example E1, the distribution is symmetric about its mean for all examined values of $k$. Its density is nonzero at equidistant points.
In Table 1, we show and for the distributions in Figure 1. We first consider the expectation of . For all four examples, we observe that increases with . For all examined values of , we see that . Our observations thus suggest that the expectation of is large when we choose eigenvalues of uniformly at random from the interval and small when we set half of the eigenvalues of to and the other half to .
We now give several observations about the variance of . For all examined values of , we see that . Our observation of larger for the examples with diagonal matrices (Examples E1 and E2) than for the examples with non-diagonal matrices (Examples E3 and E4) gives intuitive support for Conjecture 1. Our observation that reflects the fact that Example E1 maximizes the variance in this case (see Eq. 5).
For all examined , the value of the variance bound in Theorem 1 (see Eq. 3) is at least 12 times larger than the value of the variance bound in Theorem 2 (see Eq. 4). For and , the cases in which , the value of the variance bound in Theorem 2 is equal to the value of the variance bound for diagonal positive-definite matrices (Eq. 5). Additionally, it is sharp in Example E1.
In Fig. 2, we show the empirical tails for our four examples. We also show the tail bound B1 from Theorem 1 and two Chebyshev bounds, B2 and B3, which we obtain from the variance bounds in Theorems 2 and 3, respectively. (We can obtain a tail bound from a variance bound by using Chebyshev’s inequality [33] (page 429), $\Pr(|Y - \mathbb{E}[Y]| \geq r) \leq \mathrm{Var}[Y]/r^2$, for an integrable random variable $Y$ and $r > 0$.) Consistent with our observations in Table 1 on , we observe that the tail probability tends to be larger for the examples with diagonal matrices (Examples E1 and E2) than for the examples with non-diagonal matrices (Examples E3 and E4).
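Turning a variance bound into a tail bound takes one line. The sketch below applies Chebyshev’s inequality to any upper bound on the variance and caps the result at the trivial bound 1; it also computes an empirical right tail for comparison. The function names are illustrative, and the variance bound is passed in as a number so that any of the bounds discussed above can be used.

```python
import numpy as np


def chebyshev_tail_bound(variance_bound, r):
    """Chebyshev bound on the probability of deviating from the mean by at least r."""
    if r <= 0:
        raise ValueError("r must be positive")
    return min(1.0, variance_bound / r ** 2)  # values above 1 are uninformative


def empirical_right_tail(values, r):
    """Fraction of values that exceed their mean by at least r."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values - values.mean() >= r))
```

Because the Chebyshev bound decays only quadratically in $r$, it is loose far in the tail but requires nothing beyond a variance bound.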
The difference in functional form guarantees that the bound B1 intersects with the Chebyshev bound B2 at two values of . If we denote these values by and , the bound B1 is sharper than B2 on and . In our observations, both bounds exceed the trivial bound on . The value lies outside the support of . We thus see that B1 is sharper than B2 only for values of for which neither bound is informative.
For and , the bounds B2 and B3 coincide and are sharp at when . For and , the bound (B3) for diagonal positive-definite matrices is sharper than the bound (B2) for general positive-definite matrices. The difference between the two bounds is most visible for , which is the case that maximizes .
6 Estimating mean subsystem entropy
We now consider the implications of our results in Section 3 for the problem of estimating the mean subsystem entropy of a given system of coupled variables. When the joint distribution of variables is a multivariate normal distribution, one can compute the differential entropy of a subsystem by applying Eq. 1 to the corresponding sub-covariance matrix. We are interested in the mean subsystem entropy for subsystems of $k$ variables. As we noted previously, the large number of subsystems for even modest values of $n$ and $k$ renders it prohibitive to compute this mean exactly. Fortunately, the tail and variance bounds in Section 3 allow us to instead provide sampling guarantees, through which one can achieve a prescribed sampling accuracy. We give upper bounds on the standard error and on the coefficient of variation for both a sample mean of log-minors and a sample mean of subsystem entropy.
Fix a subsystem size and sample size . The -sample mean of is
[TABLE]
where we choose each uniformly at random from . The -sample mean of subsystem entropy is
[TABLE]
We use and as estimators of the population means and , respectively. These estimators are unbiased, as and . A measure of reliability of an estimator is the standard error, which one computes as the estimator’s standard deviation. Because differs from by a constant, the sample mean has the standard error
[TABLE]
We may therefore use the bounds of Theorems 1, 2, and 3 to derive bounds on the standard error for both sample means.
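The sampling scheme itself is short. The following sketch assumes a multivariate normal system, draws index sets independently and uniformly at random, and estimates the standard error from the sample rather than from a theoretical bound. The function name and return convention are illustrative. Because the entropy of a subsystem equals half of its log-minor plus a constant, the standard error of the entropy mean is half of the standard error of the log-minor mean.

```python
import math

import numpy as np


def sample_mean_subsystem_entropy(cov, k, num_samples, rng):
    """Sample mean of subsystem entropy (in nats) and its estimated standard error."""
    if num_samples < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    log_minors = np.empty(num_samples)
    for t in range(num_samples):
        # Uniformly random k-subset of the n variables.
        subset = rng.choice(n, size=k, replace=False)
        log_minors[t] = np.linalg.slogdet(cov[np.ix_(subset, subset)])[1]
    mean_entropy = 0.5 * log_minors.mean() + 0.5 * k * math.log(2.0 * math.pi * math.e)
    # Entropy is an affine function of the log-minor with slope 1/2.
    standard_error = 0.5 * log_minors.std(ddof=1) / math.sqrt(num_samples)
    return float(mean_entropy), float(standard_error)
```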
Corollary 1 (Standard error of the sample mean subsystem entropy).
Let be a covariance matrix of an -variate normal distribution, and suppose that the condition number of satisfies . Let be the -sample mean of the entropy of subsets of variables; and let be the -sample mean of log-determinants of principal submatrices of . It then follows, for any subsystem size , that the standard error of the mean subsystem entropy is and that satisfies
[TABLE]
and
[TABLE]
Furthermore, if is diagonal,
[TABLE]
The coefficient of variation is another measure of reliability for estimators. It measures the size of the typical error of an estimator as a fraction of the magnitude of . As a formula, it is given by
[TABLE]
The coefficient of variation for arises from the standard deviation of the relative error
[TABLE]
of because
[TABLE]
For a multivariate Gaussian distribution, the following corollaries give bounds on the coefficient of variation for the sample mean of log-minors and for the sample mean of subsystem entropy.
Corollary 2 (Coefficient of variation for a sample mean of log-minors).
Let be the eigenvalues of ; we order them from largest to smallest. Let . If , the coefficient of variation for a -sample mean of satisfies
[TABLE]
and
[TABLE]
- Proof.
This corollary follows from Eq. 14. We use Eqs. 3 and 4 as upper bounds on the numerator. For all , a lower bound on the denominator is . ∎
Corollary 3 (Coefficient of variation for mean subsystem entropy).
For an -variate Gaussian distribution with covariance matrix , the coefficient of variation for a -sample mean of subsystem entropy satisfies
[TABLE]
and
[TABLE]
- Proof.
We derive this result from Eq. 14; we use Eqs. 3 and 4 to bound the numerator, and we use Eq. 1 to bound the expectation in the denominator. ∎
The bounds in Eqs. 16 and 18 are sharper bounds than Eqs. 15 and 17. From Eqs. 16 and 18, we see that both and decay in proportion to . Indeed, under a certain regularity condition (which we specify in Corollary 4), the coefficient of variation decays to 0 in the limit of large and large .
Corollary 4 (Concentration of the relative error).
Let be a sequence of positive-definite matrices of dimension . Let be a function of . Suppose that the sequence
[TABLE]
is nondecreasing and unbounded. It then follows that and converges in probability to 0 as becomes large.
Remark 8.
A sufficient condition for the concentration of is that the sequence has fixed condition number and the smallest eigenvalue is bounded away from both 0 and $\infty$. Formally, the latter condition is
[TABLE]
Remark 9.
A popular model for sample covariance matrices is the Wishart ensemble. (The Wishart ensemble with scale matrix $V$ and $m$ degrees of freedom is the ensemble of random matrices $W = \sum_{i=1}^{m} x_i x_i^{\mathsf{T}}$, where the $x_i$ are independent realizations of an $n$-variate random variable with zero-mean Gaussian distribution and covariance matrix $V$ [34, 35].) A sequence of Wishart matrices can satisfy the condition in Eq. 20 if the ratio of the number of variables and the number of degrees of freedom is [35, 36].
One can use these bounds on the standard error to choose a sample size that guarantees a desired accuracy of a sample mean. In Fig. 3, we show our bounds on the standard error and the coefficient of variation of and with and . In the left panels, we show the bounds B1 from Eq. 11 and B2 from Eq. 12 on the standard error of and . In the right panels, we show the bounds B1’ (see Eqs. 15 and 17) and B2’ (see Eqs. 16 and 18) on the coefficient of variation of and . In panels (A) and (B), we vary the system size $n$ for fixed subsystem size $k$. We observe for that the values of the bounds B2 and B2’ increase with $n$. For , the bounds B2 and B2’ are independent of $n$. The bounds B1 and B1’ are less sharp than the bounds B2 and B2’. The values of B1 and B1’ increase with $n$ and approach their asymptotic values from below. For example, the bound B1 for has a limiting value of . In panels (C) and (D), we vary $k$ for fixed $n$. We observe for that the value of the bound B2 on the standard error is independent of $k$. For , the value of B2 decreases with increasing $k$ and is 0 for $k = n$. The bound B1 is less sharp than B2. Its value decreases with increasing $k$ for any . The values of the bounds B1’ and B2’ on the coefficient of variation decrease with increasing subsystem size $k$ and are 0 for $k = n$. In panels (E) and (F), we vary $n$ for fixed ratio $k/n$, and we observe that the bounds on the standard error are independent of $n$ if the ratio $k/n$ is constant. The values of the bounds B1’ and B2’ decrease with increasing $n$. This is consistent with our previous observation that vanishes if the sequence (see Eq. 19) becomes unbounded.
It is important to note that all of our bounds on the standard error and the coefficient of variation of the two sample means are asymptotically constant in the system size $n$. It is thus not necessary to sample proportionally more minors from a larger matrix. Instead, to guarantee a desired accuracy of a sample mean of log-minors or subsystem entropy, one can choose the sample size to be a function of the subsystem size $k$. To ensure that the standard error is constant or decreases with growing $n$ and $k$, it is sufficient to choose the sample size in linear proportion to $k$. When the smallest and largest eigenvalues of a system’s correlation matrix are fixed, one can ensure that the coefficient of variation is constant or decreasing with growing $n$ and $k$ by choosing the sample size in linear proportion to .
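A variance bound translates directly into a sample size. The sketch below assumes independent uniform draws, so that the standard error of the mean subsystem entropy is $\frac{1}{2}\sqrt{V/s}$ for a variance bound $V$ on the log-minors and a sample size $s$. As an example of $V$, it uses the hypergeometric maximum for diagonal matrices from Section 4.3, written here as $k(n-k)(\ln\kappa)^2 / (4(n-1))$; that form is an assumption of this sketch, and for non-diagonal matrices one would substitute a bound from Theorem 1 or Theorem 2. Both function names are hypothetical.

```python
import math


def diagonal_variance_bound(n, k, kappa):
    """Assumed diagonal-case variance bound k (n - k) (ln kappa)^2 / (4 (n - 1))."""
    return k * (n - k) / (4.0 * (n - 1)) * math.log(kappa) ** 2


def required_samples(variance_bound, target_standard_error):
    """Smallest s with 0.5 * sqrt(variance_bound / s) <= target_standard_error."""
    if target_standard_error <= 0:
        raise ValueError("target standard error must be positive")
    return max(1, math.ceil(variance_bound / (4.0 * target_standard_error ** 2)))
```

For fixed $k$ and $\kappa$, `diagonal_variance_bound` approaches $k(\ln\kappa)^2/4$ as $n$ grows, so the required sample size stops growing with the system size and scales roughly linearly with $k$.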
7 Conclusions
We examined the problem of estimating the mean subsystem entropy of a system of coupled variables with covariance matrix $M$. When the joint distribution of a system’s variables is an $n$-variate Gaussian, $t$, or Cauchy distribution, the mean differential entropy of subsystems is an affine function of the log-minors of the covariance matrix [22, 24]. We derived tail and variance bounds on the distribution of log-minors of fixed size of a positive-definite matrix with bounded condition number. Using our variance bounds, we provided upper bounds on the standard error and on the coefficient of variation of both the sample mean of log-minors and the sample mean of subsystem entropy. Our results indicate that, despite the rapid growth of the number of subsystems with $n$, the accuracy of these sample means is asymptotically independent of a system’s size. Instead, it is sufficient to increase the number of samples in linear proportion to the size of subsystems to achieve a desired sampling accuracy.
Our results are salient to studies that use mean subsystem entropy to examine systems of coupled variables [20, 4, 5]. Even for a system with as few as variables, sampling just 0.001% of its subsystem entropies can require the computation of over a billion log-determinants. Using the largest and smallest eigenvalues of a system’s covariance matrix to determine the number of samples that are needed to achieve a prescribed accuracy for a sample mean can thus facilitate a quantitative study of mean subsystem entropy when it would otherwise be impossible.
Throughout our paper, we relied only on knowledge of the largest and smallest eigenvalues of a system’s covariance matrix. We expect that it is possible to derive sharper bounds than our current results when one knows the complete spectrum of a system’s covariance matrix, likely by relying on Cauchy’s interlacing theorem (1) to control the log-minors.
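As a sketch of how the complete spectrum could enter, consider the following consequence of Cauchy’s interlacing theorem. The notation is assumed for illustration and is not taken from the paper: let λ_1 ≤ … ≤ λ_n denote the eigenvalues of the n × n positive-definite covariance matrix, and let μ_1 ≤ … ≤ μ_k denote the eigenvalues of a k × k principal submatrix B.

```latex
\lambda_i \le \mu_i \le \lambda_{n-k+i}, \quad i = 1, \dots, k
\qquad\Longrightarrow\qquad
\sum_{i=1}^{k} \log \lambda_i \;\le\; \log\det B \;\le\; \sum_{i=1}^{k} \log \lambda_{n-k+i} .
```

Using only the extreme eigenvalues gives the coarser interval k log λ_1 ≤ log det B ≤ k log λ_n, which contains the interval above. Knowing the full spectrum therefore narrows the support of the log-minor distribution whenever the eigenvalues are not all concentrated at the two extremes.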
We presented two bounds on the variance of a log-minor that we choose uniformly at random from the set of log-minors of size k of an n × n positive-definite matrix. The variance bound in Theorem 2 is sharper than the one in Eq. 3, but either bound is sufficient to deduce that the accuracy of a sample mean of subsystem entropy is asymptotically independent of a system’s size and that one can achieve a prescribed accuracy by choosing the number of samples in linear proportion to the size of subsystems.
The proof of our first bound (see Section 4.1) relies on the existence of an upper bound for the difference between for two different principal submatrices and the invariance of under a basis transformation of . The proof of our second bound (see Section 4.2) relies on the existence of an upper bound and a lower bound for the support of the distribution .
Similar bounds and the invariance under basis transformation hold for several other matrix properties, including the largest and smallest eigenvalues. It is thus plausible that one can derive similar results for the standard error and coefficient of variation for many spectral properties of principal submatrices. For example, Chatterjee and Ledoux (2009) proved a large-deviation inequality for the empirical cumulative eigenvalue distribution of principal submatrices of Hermitian matrices [26]. These and other variance and tail bounds on submatrix properties offer welcome possibilities to enhance computational studies that characterize complex systems based on the mean properties of their subsystems. For example, they can provide guarantees for linear sketching techniques, which are relevant for data dimensionality reduction. They can also facilitate the use of methods of spectral graph analysis in the study of subgraphs, graphlets, and motifs in networks.
Acknowledgements
We thank Clément Canonne, Kameron Decker Harris, Michael Neely, and participants of the IPAM Quantitative Linear Algebra Tutorials for helpful discussions. A.C.S. was supported by the Clarendon Fund, e-Therapeutics plc, and funding from the Engineering and Physical Sciences Research Council under grant number EP/L016044/1. P.S.C. was supported by the National Science Foundation under Graduate Research Fellowship Grant 1122374.
References
- [1] S. S. Skiena. The Data Science Design Manual. Springer, Cham, Switzerland, 2017.
- [2] M. E. J. Newman. Networks. Oxford University Press, Oxford, United Kingdom, 2018.
- [3] S. H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, Boulder, CO, USA, 2018.
- [4] G. Tononi, O. Sporns, and G. M. Edelman. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences of the United States of America, 91(11):5033–5037, 1994.
- [5] G. Tononi, O. Sporns, and G. M. Edelman. Measures of degeneracy and redundancy in biological networks. Proceedings of the National Academy of Sciences of the United States of America, 96(6):3257–3262, 1999.
- [6] M. De Lucia, M. Bottaccio, M. Montuori, and L. Pietronero. Topological approach to neural complexity. Physical Review E, 71(1):016114, 2005.
- [7] M. Randles, D. Lamb, E. Odat, and A. Taleb-Bendiab. Distributed redundancy and robustness in complex systems. Journal of Computer and System Sciences, 77(2):293–304, 2011.
- [8] Y. Li, G. Dwivedi, W. Huang, M. L. Kemp, and Y. Yi. Quantification of degeneracy in biological systems for characterization of functional interactions between modules. Journal of Theoretical Biology, 302:29–38, 2012.
