Log-minor distributions and an application to estimating mean subsystem entropy

Alice C. Schwarze; Philip S. Chodrow; and Mason A. Porter

arXiv:1901.09456·math.PR·January 30, 2019


Open Access

TL;DR

This paper analyzes the distribution of log-determinants of principal submatrices of covariance matrices with bounded condition number, providing bounds that enable efficient estimation of subsystem entropy regardless of system size.

Contribution

It introduces bounds on the distribution of minors and their variance, enabling accurate entropy estimation with sample sizes independent of system size.

Findings

1. Sample size for entropy estimation is asymptotically independent of system size n.
2. Number of samples needed scales linearly with subsystem size k.
3. Derived bounds improve efficiency of entropy estimation in large systems.


Tables

Table 1: Expectation and variance for $Y_{k}(M)$ for Examples E1, E2, E3, and E4. For comparison, we show the numerical values of the variance bounds from Theorem 1 (see Eq. 3) and Theorem 2 (see Eq. 4). For the examples with diagonal matrices (E1 and E2), we also show the numerical value of the variance bound from Theorem 3 (see Eq. 5).

$k$	Example	$\mathbbm{E}[Y_{k}(M)]$	$\operatorname{var}(Y_{k}(M))$	Variance bound (Theorem 1)	Variance bound (Theorem 2)	Variance bound (Theorem 3)
1	E1	$0.549$	$0.302$	$6.880$	$0.302$	$0.302$
1	E2	$0.689$	$0.115$	$6.880$	$0.302$	$0.302$
1	E3	$0.683$	$0.021$	$6.880$	$0.302$	N/A
1	E4	$0.739$	$0.005$	$6.880$	$0.302$	N/A
5	E1	$2.747$	$1.191$	$27.156$	$1.509$	$1.191$
5	E2	$3.446$	$0.454$	$27.156$	$1.509$	$1.191$
5	E3	$3.283$	$0.091$	$27.156$	$1.509$	N/A
5	E4	$3.649$	$0.020$	$27.156$	$1.509$	N/A
10	E1	$5.493$	$1.588$	$36.208$	$3.017$	$1.588$
10	E2	$6.893$	$0.605$	$36.208$	$3.017$	$1.588$
10	E3	$6.213$	$0.128$	$36.208$	$3.017$	N/A
10	E4	$7.176$	$0.031$	$36.208$	$3.017$	N/A
19	E1	$10.437$	$0.302$	$6.880$	$0.302$	$0.302$
19	E2	$13.096$	$0.115$	$6.880$	$0.302$	$0.302$
19	E3	$10.570$	$0.022$	$6.880$	$0.302$	N/A
19	E4	$13.155$	$0.008$	$6.880$	$0.302$	N/A

Equations

$$h(M)=\frac{1}{2}\log(\det M)+\frac{n}{2}\bigl(1+\log(2\pi)\bigr),$$

$$\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r\bigr)\leq 3\exp\left(-\frac{r}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right).$$

$$\operatorname{var}(Y_{k}(M))\leq 6\left(\frac{k(n-k)}{n}\right)(\log\tilde{\kappa})^{2}.$$

$$\operatorname{var}(Y_{k}(M))\leq\frac{1}{4}\left(\wedge_{n,k}\times\log\tilde{\kappa}\right)^{2}.$$

$$\operatorname{var}(Y_{k}(D))\leq\frac{k}{4}\left(\frac{n-k}{n-1}\right)(\log\tilde{\kappa})^{2}.$$

$$\operatorname{var}(Y_{k}(M))\leq\operatorname{var}(Y_{k}(D)).$$

$$\lambda_{1}(M)\geq\lambda_{1}(A)\geq\lambda_{2}(M)\geq\lambda_{2}(A)\geq\lambda_{3}(M)\geq\dots\geq\lambda_{n-1}(M)\geq\lambda_{n-1}(A)\geq\lambda_{n}(M).$$

$$|||f|||_{\infty}^{2}:=\frac{1}{2}\sup_{A\in\mathcal{A}}\sum_{B\in\mathcal{A}}\bigl(f(A)-f(B)\bigr)^{2}\,\Pi(A,B)\leq 1,$$

$$\mu_{\mathcal{A}}\Bigl(\Bigl\{f\geq\int_{\mathcal{A}}f\,d\mu_{\mathcal{A}}+r\Bigr\}\Bigr)\leq 3e^{-\frac{r}{2}\sqrt{\overline{\lambda}}}.$$

$$\Pi(s,s^{\prime})=\begin{cases}1/n, & \text{if } s^{\prime}=s,\\ 2/n^{2}, & \text{if } s^{\prime}=\tau s \text{ for some transposition } \tau,\\ 0, & \text{otherwise}.\end{cases}$$

$$f_{\alpha}(s):=\alpha\log\bigl(\det\hat{A}_{k}(sM)\bigr)$$

$$b_{1}:=\frac{2k}{n^{2}}(n-k).$$

$$\begin{aligned}|\log(\det A)-\log(\det B)| &\leq\sum_{i=1}^{k}\log\lambda_{i}(C)-\sum_{i=2}^{k+1}\log\lambda_{i}(C)=\log\lambda_{1}(C)-\log\lambda_{k+1}(C)\\ &\leq\log\lambda_{1}(M)-\log\lambda_{n}(M)=\log\kappa(M)\leq\log\tilde{\kappa}.\end{aligned}$$

$$|||f_{\alpha}|||_{\infty}^{2}\leq\frac{1}{2}\,b_{1}b_{2}^{2}=k(n-k)\left(\frac{\alpha\log\tilde{\kappa}}{n}\right)^{2}.$$

$$\Pr\Bigl(\bigl|f_{\alpha^{\prime}}(\hat{A}_{k}(\sigma M))-\mathbbm{E}\bigl[f_{\alpha^{\prime}}(\hat{A}_{k}(\sigma M))\bigr]\bigr|\geq\alpha^{\prime}r\Bigr)\leq 3e^{-\frac{\alpha^{\prime}r}{2}\sqrt{\overline{\lambda}}}.$$

$$\begin{aligned}&\Pr\bigl(|\alpha^{\prime}Y_{k}(M)-\alpha^{\prime}\mathbbm{E}[Y_{k}(M)]|\geq\alpha^{\prime}r\bigr)\leq 3e^{-\frac{\alpha^{\prime}r}{2}\sqrt{\overline{\lambda}}}\\ \Longrightarrow\ &\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r\bigr)\leq 3\exp\left(-\frac{r}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right).\end{aligned}$$

$$\operatorname{var}(Y_{k}(M))=\int_{0}^{\infty}\Pr\bigl((Y_{k}(M)-\mathbbm{E}[Y_{k}(M)])^{2}\geq u\bigr)\,du=\int_{0}^{\infty}\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq\sqrt{u}\bigr)\,du.$$

$$\operatorname{var}(Y_{k}(M))\leq\int_{0}^{\infty}3\exp\left(-\frac{\sqrt{u}}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right)du=6\left(\frac{k(n-k)}{n}\right)(\log\tilde{\kappa})^{2}.$$

$$\operatorname{var}(X)\leq\frac{(x_{\max}-x_{\min})^{2}}{4}.$$

$$r_{1}\geq k\log\lambda_{n}(M),\qquad r_{2}\leq k\log\lambda_{1}(M).$$

$$r_{2}-r_{1}\leq k\bigl(\log\tilde{\kappa}+\log\lambda_{n}(M)\bigr)-k\log\lambda_{n}(M)=k\log\tilde{\kappa}.$$

$$\log(\det D_{I})=\sum_{i\in I}x_{i}.$$

$$x_{i}=\begin{cases}\log\kappa(D), & i\leq\ell,\\ 0, & \text{otherwise},\end{cases}$$

$$\operatorname{var}(Y_{k}(D))=\frac{k(n-k)}{n^{2}(n-1)}(n-\ell)\,\ell\,(\log\kappa(D))^{2},$$

$$\ell^{*}=\begin{cases}\frac{n}{2}, & n\text{ even},\\ \frac{n\pm 1}{2}, & n\text{ odd}.\end{cases}$$

$$\operatorname{var}(Y_{k}(D))\leq\frac{k(n-k)}{4(n-1)}(\log\tilde{\kappa})^{2}.$$

$$M_{\textrm{E3}}:=Q^{-1}M_{\textrm{E1}}Q,$$


Full text


Alice C. Schwarze

Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK

Philip S. Chodrow

Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139

Mason A. Porter

Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, California 90095, USA

AMS 2010 Subject classification: 15B99, 15A15, 60E15, 93A10

Keywords: empirical distributions, determinants, sampling error, positive-definite matrices, random matrices

Abstract

A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$ . The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly-sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix.

Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal $k\times k$ submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$ , the number of samples that are needed to bound the sampling error is asymptotically independent of $n$ . Instead, it is sufficient to increase the number of samples in linear proportion to $k$ to achieve a desired sampling accuracy.

1 Introduction

In many fields of study, researchers use matrices to represent systems of interest. Statisticians and data scientists represent large tabular data sets as matrices [1]. In network science, it is common to use adjacency matrices to represent the structure of a network [2]. In dynamical systems, researchers use Jacobian matrices in the study of the linearized dynamics of a system of coupled variables [3]. For networks, dynamical systems, statistical analysis of large data sets, and other applications, it can be insightful (and even necessary) to examine their components (as subnetworks, subsystems, reduced data sets, and so on). Several researchers have used subsystem properties to characterize robustness and other salient properties of dynamical systems [4, 5, 6, 7, 8, 9]. Network scientists count and analyze motifs and other subgraphs in networks to characterize a network’s structure [10, 11]. Several prominent tools in data science are based on linear sketching, an approach to data dimensionality reduction whereby one obtains a reduced data set via matrix multiplication [12, 13] or as a linear combination of submatrices [14, 15, 16]. An example of such a tool for dimensionality reduction is principal component analysis [17].

The various applications of submatrices motivate the mathematical study of their properties. In this paper, we study the distribution of log-determinants of principal submatrices of a positive definite matrix and show that our results lead to controllable sampling guarantees for computing the mean differential entropy of subsystems for a dynamical system. Researchers have studied the differential entropy of subsystems in areas such as physics [18, 19], biology [8, 9], neuroscience [4, 5, 6], computer science [7], and coding theory [20]. For example, Tononi et al. (1999) computed a measure of network redundancy from the mean differential entropy of its subsystems of fixed size [5]. Teschendorff et al. (2014) [21] used differential entropy to define a measure of network robustness for protein-interaction networks.

For several symmetric multivariate distributions, estimates of differential entropy are affine functions of the log-determinant of a system’s covariance matrix. Examples include the multivariate normal distribution [22], the multivariate $t$ distribution [23, 24], and the multivariate Cauchy distribution [24]. For the $n$ -variate normal distribution with covariance matrix $M$ , for example, the differential entropy is [22]

$$h(M)=\frac{1}{2}\log(\det M)+\frac{n}{2}\bigl(1+\log(2\pi)\bigr),\tag{1}$$

where $\log$ denotes the natural logarithm, so $h$ is measured in nats.¹ The log-determinant of the covariance matrix is thus sufficient to determine the differential entropy of several multivariate distributions.

¹ One can use a logarithm of any base $b>0$ with $b\neq 1$ if one replaces the constant $1$ in Eq. 1 by $\log_{b}e$. The base $b$ determines the units of entropy. If one chooses $b=2$, one measures entropy values in bits. If one chooses $b=e$, one measures entropy values in nats.
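A minimal sketch of Eq. 1 in Python, assuming NumPy and natural logarithms (so the result is in nats). Computing the log-determinant from a Cholesky factor avoids forming $\det M$ itself, which can overflow or underflow for moderately large $n$. The function names `gaussian_entropy` and `nats_to_bits` are illustrative, not taken from the paper.

```python
import numpy as np


def gaussian_entropy(cov: np.ndarray) -> float:
    """Differential entropy (in nats) of an n-variate normal with covariance `cov` (Eq. 1)."""
    n = cov.shape[0]
    # Cholesky factorization cov = L L^T; raises LinAlgError if cov is not positive definite.
    L = np.linalg.cholesky(cov)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log(det cov) = 2 * sum_i log(L_ii)
    return float(0.5 * log_det + 0.5 * n * (1.0 + np.log(2.0 * np.pi)))


def nats_to_bits(h_nats: float) -> float:
    """Change of logarithm base: an entropy in nats divided by ln 2 is the entropy in bits."""
    return h_nats / np.log(2.0)


# Usage: a standard bivariate normal has h = 1 + log(2*pi) nats.
h = gaussian_entropy(np.eye(2))
```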

The principal submatrices of $M$ are covariance matrices of subsystems that correspond to subsets of coupled variables. One can compute the differential entropy of a subsystem by computing $h$ in Eq. 1 for a principal submatrix of $M$. A system of $n$ coupled variables possesses $\binom{n}{k}$ subsystems of $k$ variables (with $\binom{n}{k}\approx\frac{n^{k}}{k!}$ for $k\ll n$); each of these subsystems corresponds to one of the $\binom{n}{k}$ principal $k\times k$ submatrices of $M$. The exact computation of the distribution of differential subsystem entropy or its moments thus requires one to compute $O(n^{k})$ distinct determinants, an infeasible task for large $n$ and $k$. This task can be computationally prohibitive even for modestly-sized systems. To our knowledge, the largest system for which researchers have exactly computed the differential entropy of subsystems is a synthetic network with $n=12$ variables and subsystems with $k\leq 12$ variables [5].
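To make the combinatorial cost concrete, the following sketch (assuming NumPy; `all_log_minors` is a hypothetical helper name) computes every log-minor of size $k$ by brute force. It evaluates $\binom{n}{k}$ determinants, which is feasible only for small $n$; this exhaustive computation is what a sampling approach avoids.

```python
import itertools
import math

import numpy as np


def all_log_minors(M: np.ndarray, k: int) -> np.ndarray:
    """log(det M_I) for every k-element index set I, by exhaustive enumeration."""
    n = M.shape[0]
    idx = np.array(list(itertools.combinations(range(n), k)))  # shape (C(n, k), k)
    # Gather all principal k x k submatrices at once: shape (C(n, k), k, k).
    subs = M[idx[:, :, None], idx[:, None, :]]
    sign, log_det = np.linalg.slogdet(subs)  # batched, numerically stable log-determinants
    # Necessary condition: every principal minor of a positive-definite matrix is positive.
    if np.any(sign <= 0):
        raise ValueError("M must be positive definite")
    return log_det


# The number of determinants grows quickly; n**k / k! approximates C(n, k) only for k << n.
n, k = 12, 6
exact_count = math.comb(n, k)            # 924
approx_count = n**k / math.factorial(k)  # 4147.2
```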

To address this problem, we study the distribution of log-determinants of principal $k\times k$ submatrices. We refer to these log-determinants as log-minors of size $k$. As we noted above, these log-determinants are sufficient to determine the subsystem entropy for many important multivariate distributions. Knowledge of the properties of this distribution thus enables the derivation of bounds on the sampling error when estimating subsystem entropy in many applications. We show that, given a bound on the condition number of $M$, the standard error of a sample mean of differential entropy is asymptotically independent of $n$ and grows sublinearly in $k$, so it suffices to increase the number of samples linearly in $k$ to ensure a desired accuracy.

Our paper proceeds as follows. In Section 2, we introduce some notation that we use throughout this paper. In Section 3, we give several upper bounds on the tail and variance of the distribution of log-minors of a positive-definite matrix with bounded condition number. We present proofs for these bounds in Section 4 and show numerical examples in Section 5. In Section 6, we apply our theorems to provide probabilistic guarantees on the sample mean and relative error, and we discuss implications for the design of practical schemes for estimating mean subsystem entropy. We conclude and discuss possible extensions in Section 7.

2 Notation

Let $M\in\mathbbm{R}^{n\times n}$ be a positive-definite matrix. Let $\lambda_{1}(M)\geq\lambda_{2}(M)\geq\cdots\geq\lambda_{n}(M)>0$ be the eigenvalues of $M$. Because $M$ is positive definite, it is also nonsingular; its condition number is $\kappa(M)=\lambda_{1}(M)/\lambda_{n}(M)$. For a given index set $I\subseteq[n]:=\{1,\ldots,n\}$ with $|I|=k$, the matrix $M_{I}:=[M_{i,j}]_{i,j\in I}$ is the corresponding principal submatrix of $M$. For any fixed $k\leq n$, let $\mathcal{A}_{k}(M)$ denote the set of all such $k\times k$ submatrices of $M$, and let $A_{k}(M)$ denote a uniformly-random element of this set. We define a random variable $Y_{k}(M):=\log(\det A_{k}(M))$ and denote its empirical distribution by $\mu_{M,k}$. For convenience, we define $\wedge_{n,k}:=\min\{k,n-k\}$.
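A minimal Monte Carlo sketch of the random variable $Y_{k}(M)$, assuming NumPy. Drawing a $k$-element index set uniformly without replacement gives a uniformly random element of $\mathcal{A}_{k}(M)$; the order of the drawn indices does not affect the determinant. The names `wedge` and `sample_log_minors` are illustrative.

```python
import numpy as np


def wedge(n: int, k: int) -> int:
    """The quantity min{k, n - k} from Section 2."""
    return min(k, n - k)


def sample_log_minors(M: np.ndarray, k: int, num_samples: int, rng=None) -> np.ndarray:
    """Draw i.i.d. samples of Y_k(M) = log(det A_k(M)), with A_k(M) uniform on the
    k x k principal submatrices of M."""
    rng = np.random.default_rng(rng)
    n = M.shape[0]
    samples = np.empty(num_samples)
    for t in range(num_samples):
        I = rng.choice(n, size=k, replace=False)  # uniformly random k-subset of {0, ..., n-1}
        samples[t] = np.linalg.slogdet(M[np.ix_(I, I)])[1]
    return samples


# Usage: for D = diag(1, e, e^2) and k = 2, every sample lies in {1, 2, 3}.
D = np.diag([1.0, np.e, np.e**2])
samples = sample_log_minors(D, k=2, num_samples=200, rng=0)
```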

3 Bounds on the distribution of log-minors

In this section, we state bounds on the distribution $\mu_{M,k}$ for a positive-definite matrix $M$ with bounded condition number. We give upper bounds for the distribution’s support, variance, and right tail. We also show that we can improve these bounds if $M$ is diagonal.

Theorem 1 (Tail and variance bound for log-minors of a positive-definite matrix).

Let $M$ be a positive-definite $n\times n$ matrix with condition number $\kappa(M)\leq\tilde{\kappa}$ . For every $r\geq 0$ , we have

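$$\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r\bigr)\leq 3\exp\left(-\frac{r}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right).\tag{2}$$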

Furthermore, the variance of $Y_{k}(M)$ satisfies

$$\operatorname{var}(Y_{k}(M))\leq 6\left(\frac{k(n-k)}{n}\right)(\log\tilde{\kappa})^{2}.\tag{3}$$
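The right-hand sides of Eqs. 2 and 3 are cheap to evaluate. This short sketch assumes the natural logarithm (function names are illustrative) and reproduces the Theorem 1 column of Table 1 for $n=20$ and $\tilde{\kappa}=3$.

```python
import math


def tail_bound_thm1(r: float, n: int, k: int, kappa_tilde: float) -> float:
    """Right-hand side of Eq. 2: bound on Pr(|Y_k(M) - E[Y_k(M)]| >= r)."""
    rate = math.sqrt(n / (k * (n - k))) / math.log(kappa_tilde)
    return 3.0 * math.exp(-rate * r)


def var_bound_thm1(n: int, k: int, kappa_tilde: float) -> float:
    """Right-hand side of Eq. 3."""
    return 6.0 * (k * (n - k) / n) * math.log(kappa_tilde) ** 2


print([round(var_bound_thm1(20, k, 3.0), 3) for k in (1, 5, 10, 19)])
# [6.88, 27.156, 36.208, 6.88]
```

Note that the tail bound equals $3$ at $r=0$ and drops below the trivial value $1$ only once $r\geq\log(3)/\text{rate}$, with `rate` as in the code.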

Remark 1.

The tail bound in Eq. 2 does not guarantee that $Y_{k}(M)$ concentrates² on $\mathcal{A}_{k}(M)$. This is because the bound in Eq. 2 is increasing with respect to $k$ and asymptotically constant with respect to $n$. Indeed, for the bound to approach $0$ (for fixed $r>0$) for a sequence $\{M_{i}\}$ of matrices, it is both necessary and sufficient that $\sqrt{k_{i}}\log\kappa(M_{i})\rightarrow 0$ as $i\rightarrow\infty$. Because $k$ cannot be smaller than $1$, this condition requires the condition number $\kappa(M_{i})$ to approach $1$. The condition $\lim_{i\rightarrow\infty}\kappa(M_{i})=1$ severely constrains the sequence $\{M_{i}\}$. In that limit, all eigenvalues of $M_{i}$ are equal to each other and all log-minors are equal to $k\log\lambda_{1}$.

² Ledoux defined concentration of measure in Ref. [25] (on page 3) as follows. Let $(X,d)$ be a metric space with probability measure $\mu$ on the Borel sets of $(X,d)$. The concentration function is defined as $\alpha_{(X,d,\mu)}(r):=\sup\{1-\mu(A_{r});A\subset X,\mu(A)\geq\frac{1}{2}\}$, where $r>0$ and $A_{r}:=\{x\in X;d(x,A)<r\}$ is the open $r$-neighborhood of $A$. The measure $\mu$ has normal concentration on $(X,d)$ if there are constants $c$ and $C$ such that $\alpha_{(X,d,\mu)}(r)\leq C\exp(-cr^{2})$ for every $r>0$.

Theorem 2 (Support and variance bound for log-minors of a positive-definite matrix).

Let $M$ be a positive-definite $n\times n$ matrix with condition number $\kappa(M)\leq\tilde{\kappa}$ . For any $k$ , the random variable $Y_{k}(M)$ and its distribution $\mu_{M,k}$ satisfy the following properties:

1. The distribution $\mu_{M,k}$ has bounded support that is contained in an interval whose length is no greater than $\wedge_{n,k}\times\log\tilde{\kappa}$; and
2. the variance of $Y_{k}(M)$ satisfies

$$\operatorname{var}(Y_{k}(M))\leq\frac{1}{4}\left(\wedge_{n,k}\times\log\tilde{\kappa}\right)^{2}.\tag{4}$$

Remark 2.

The variance bound in Eq. 4 is much sharper than the one in Eq. 3. Both variance bounds are asymptotically constant with respect to $n$ . For fixed $k$ , the two variance bounds differ by a factor of $24$ in the large- $n$ limit.

Remark 3.

For even $n$ and $k\in\{1,n-1\}$ , the bound on the variance in Eq. 4 is sharp when $M$ is a $2\ell\times 2\ell$ (where $\ell=n/2$ ) diagonal matrix with entries $\lambda_{1}=\cdots=\lambda_{\ell}=\tilde{\kappa}$ and $\lambda_{\ell+1}=\cdots=\lambda_{2\ell}=1$ .

When $M$ is diagonal, we can derive a variance bound that is sharper than the bounds in Eqs. 3 and 4.

Theorem 3 (Variance bound for log-minors of a positive-definite diagonal matrix).

Let $D$ be a positive-definite $n\times n$ diagonal matrix with condition number $\kappa(D)\leq\tilde{\kappa}$ . The variance of $Y_{k}(D)$ satisfies

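$$\operatorname{var}(Y_{k}(D))\leq\frac{k}{4}\left(\frac{n-k}{n-1}\right)(\log\tilde{\kappa})^{2}.\tag{5}$$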
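Eqs. 4 and 5 can be evaluated in the same way. In this sketch (natural logarithm assumed; names illustrative), the Theorem 3 bound for $n=20$, $\tilde{\kappa}=3$, and $k\in\{1,5,10,19\}$ gives $0.302$, $1.191$, $1.588$, and $0.302$, which match the Theorem 3 column of Table 1. For $k=1$, the bounds of Eqs. 4 and 5 coincide.

```python
import math


def var_bound_thm2(n: int, k: int, kappa_tilde: float) -> float:
    """Right-hand side of Eq. 4 as stated: (wedge_{n,k} * log kappa)^2 / 4."""
    return 0.25 * (min(k, n - k) * math.log(kappa_tilde)) ** 2


def var_bound_thm3(n: int, k: int, kappa_tilde: float) -> float:
    """Right-hand side of Eq. 5 (positive-definite diagonal matrices)."""
    return 0.25 * k * ((n - k) / (n - 1)) * math.log(kappa_tilde) ** 2


thm3_column = [round(var_bound_thm3(20, k, 3.0), 3) for k in (1, 5, 10, 19)]
```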

Remark 4.

The two variance bounds in Eqs. 4 and 5 are asymptotically constant with respect to $n$ and converge to the same limiting value of $k(\log\tilde{\kappa})^{2}/4$ .

Remark 5.

The variance bound for diagonal positive-definite matrices in Eq. 5 is sharper than the variance bound for general positive-definite matrices in Eq. 4. The former differs from the latter by a factor of $\max\{k,n-k\}/(n-1)\leq 1$ .

Remark 6.

For even $n$ and any $k\leq n$ , the bound on the variance in Eq. 5 is sharp when $M$ is a $2\ell\times 2\ell$ (where $\ell=n/2$ ) diagonal matrix with entries $\lambda_{1}=\cdots=\lambda_{\ell}=\tilde{\kappa}$ and $\lambda_{\ell+1}=\cdots=\lambda_{2\ell}=1$ . The sharpness of the bound for diagonal matrices indicates a limit to possible improvements for the variance bound for general positive-definite matrices. Specifically, one cannot hope to improve the variance bound in Eq. 4 by more than a factor of $\max\{k,n-k\}/(n-1)$ .
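The sharpness claims in Remarks 3 and 6 can be checked by brute force for a small even $n$. This sketch (assuming NumPy; `exact_variance_diag` is an illustrative name) enumerates every $k$-subset of the diagonal entries of $D=\operatorname{diag}(\tilde{\kappa},\ldots,\tilde{\kappa},1,\ldots,1)$ and compares the exact variance with Eq. 5.

```python
import itertools
import math

import numpy as np


def exact_variance_diag(eigs, k: int) -> float:
    """Exact var(Y_k(D)) for D = diag(eigs): log(det D_I) is the sum of log(eigs) over I."""
    logs = np.log(np.asarray(eigs, dtype=float))
    values = [logs[list(I)].sum() for I in itertools.combinations(range(len(logs)), k)]
    return float(np.var(values))  # population variance: all k-subsets are equally likely


n, kappa = 8, 3.0
eigs = [kappa] * (n // 2) + [1.0] * (n // 2)  # half of the entries equal kappa, half equal 1
for k in range(1, n):
    bound = 0.25 * k * ((n - k) / (n - 1)) * math.log(kappa) ** 2  # Eq. 5
    assert math.isclose(exact_variance_diag(eigs, k), bound, rel_tol=1e-9)
```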

The variance bound in Theorem 2 is sharp for a diagonal matrix. This observation and several examples in Section 5 motivate the following conjecture.

Conjecture 1 (Diagonal matrices maximize log-minor variance).

Let $\mathcal{M}_{\kappa}$ be the set of positive-definite $n\times n$ matrices with condition number $\kappa$ . For all $k<n$ , $\kappa$ , and $M\in\mathcal{M}_{\kappa}$ , there exists a diagonal matrix $D\in\mathcal{M}_{\kappa}$ such that

$$\operatorname{var}(Y_{k}(M))\leq\operatorname{var}(Y_{k}(D)).\tag{6}$$

The variance bounds (see Eqs. 4, 3 and 5) have important implications for the accuracy of sample means of log-minors. We discuss these implications in Section 6.

4 Proofs of bounds on the distribution of log-minors

4.1 Proof of Theorem 1

To prove Theorem 1, we use Cauchy’s interlacing theorem and results on Markov chains on countable sets. Chatterjee and Ledoux (2009) previously used this approach to prove a concentration result for empirical cumulative eigenvalue spectra of Hermitian matrices [26].

Proposition 1 (Cauchy’s interlacing theorem [27]).

Let $M$ be a Hermitian $n\times n$ matrix, and let $A$ be a principal $(n-1)\times(n-1)$ submatrix of $M$ . If $M$ has eigenvalues $\lambda_{1}(M)\geq\lambda_{2}(M)\geq\dots\geq\lambda_{n}(M)$ and $A$ has eigenvalues $\lambda_{1}(A)\geq\lambda_{2}(A)\geq\dots\geq\lambda_{n-1}(A)$ , then

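$$\lambda_{1}(M)\geq\lambda_{1}(A)\geq\lambda_{2}(M)\geq\lambda_{2}(A)\geq\lambda_{3}(M)\geq\dots\geq\lambda_{n-1}(M)\geq\lambda_{n-1}(A)\geq\lambda_{n}(M).$$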
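A quick numerical illustration of Proposition 1, assuming NumPy. The random test matrix and the helper name `interlaces` are illustrative; the check deletes one row–column pair and compares the sorted eigenvalues.

```python
import numpy as np


def interlaces(M: np.ndarray, j: int, tol: float = 1e-10) -> bool:
    """Check Proposition 1 for the principal submatrix A obtained by deleting row and column j."""
    n = M.shape[0]
    keep = [i for i in range(n) if i != j]
    lam_M = np.linalg.eigvalsh(M)[::-1]                     # descending: lambda_1(M) >= ... >= lambda_n(M)
    lam_A = np.linalg.eigvalsh(M[np.ix_(keep, keep)])[::-1]  # descending eigenvalues of A
    upper = np.all(lam_M[:-1] >= lam_A - tol)  # lambda_i(M) >= lambda_i(A)
    lower = np.all(lam_A >= lam_M[1:] - tol)   # lambda_i(A) >= lambda_{i+1}(M)
    return bool(upper and lower)


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))
M = X @ X.T + np.eye(6)  # a random symmetric positive-definite test matrix
```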

Proposition 2 (Large-deviation inequality for functions on countable sets [25] (page 50)).

Let $(\Pi,\mu_{\mathcal{A}})$ be a reversible Markov chain on a finite or countable set $\mathcal{A}$ with spectral gap³ $\overline{\lambda}>0$. Whenever $f:\mathcal{A}\rightarrow\mathbbm{R}$ is a function such that

$$|||f|||_{\infty}^{2}:=\frac{1}{2}\sup_{A\in\mathcal{A}}\sum_{B\in\mathcal{A}}\bigl(f(A)-f(B)\bigr)^{2}\,\Pi(A,B)\leq 1,$$

it is also true that $f$ is integrable with respect to $\mu_{\mathcal{A}}$ and that, for every $r\geq 0$,

$$\mu_{\mathcal{A}}\Bigl(\Bigl\{f\geq\int_{\mathcal{A}}f\,d\mu_{\mathcal{A}}+r\Bigr\}\Bigr)\leq 3e^{-\frac{r}{2}\sqrt{\overline{\lambda}}}.$$

³ The spectral gap (also called the “Poincaré constant”) of a Markov chain $(\Pi,\mu_{\mathcal{A}})$ on a space $\mathcal{A}$ is the largest constant $\overline{\lambda}$ such that, for all functions $f$, we have $\overline{\lambda}\times\operatorname{var}_{\mu}(f)\leq\frac{1}{2}\sum_{A,B\in\mathcal{A}}[f(A)-f(B)]^{2}\Pi(A,B)\mu(\{A\})$. See, for example, Ref. [25] (page 50).

Remark 7.

The expected squared distance in $f$ between $A\in\mathcal{A}$ and its adjacent states in the Markov chain is $\sum_{B\in\mathcal{A}}\left(f(A)-f(B)\right)^{2}\Pi(A,B)$ . One can thus think of $|||f|||_{\infty}^{2}$ as a measure of the expected squared distance between the greatest “outlier” $A$ and adjacent states in $\mathcal{A}$ . We thus refer to $|||f|||_{\infty}^{2}$ as the squared outlier deviation of $f$ on $(\Pi,\mu_{\mathcal{A}})$ .

Proposition 3 (Spectral gap of random-transposition walk [28]).

Let $\mathcal{S}_{n}$ be the set of permutations of $n$ elements, and let $s\in\mathcal{S}_{n}$ . Let the “random-transposition walk” be a reversible Markov chain $(\Pi,\mu_{\mathcal{S}_{n}})$ with kernel

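$$\Pi(s,s^{\prime})=\begin{cases}1/n, & \text{if } s^{\prime}=s,\\ 2/n^{2}, & \text{if } s^{\prime}=\tau s \text{ for some transposition } \tau,\\ 0, & \text{otherwise}.\end{cases}$$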

The random-transposition walk has a spectral gap of $\overline{\lambda}=2/n$ .
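For small $n$, one can verify Proposition 3 numerically by building the $n!\times n!$ transition matrix of the random-transposition walk. This sketch (assuming NumPy; names illustrative) applies a transposition by swapping two entries of the sequence, which composes on the right rather than on the left. This choice does not change the spectrum, because the map $s\mapsto s^{-1}$ carries one walk onto the other.

```python
import itertools

import numpy as np


def random_transposition_kernel(n: int) -> np.ndarray:
    """Transition matrix of the random-transposition walk on S_n (Proposition 3)."""
    perms = list(itertools.permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}
    P = np.zeros((len(perms), len(perms)))
    for p, i in index.items():
        P[i, i] += 1.0 / n  # s' = s
        for a, b in itertools.combinations(range(n), 2):
            q = list(p)
            q[a], q[b] = q[b], q[a]  # apply one transposition
            P[i, index[tuple(q)]] += 2.0 / n**2
    return P


def spectral_gap(P: np.ndarray) -> float:
    """1 minus the second-largest eigenvalue of a symmetric (reversible, uniform) kernel."""
    eigenvalues = np.linalg.eigvalsh(P)[::-1]  # descending order
    return float(1.0 - eigenvalues[1])
```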

Proof of Theorem 1.

Every principal $k\times k$ submatrix of $M$ is the top-left principal $k\times k$ submatrix of $M$ after a permutation of its rows and columns. We denote the permuted matrix by $sM$ and its top-left principal $k\times k$ submatrix by $\hat{A}_{k}(sM)$, where $s\in\mathcal{S}_{n}$ is a permutation of $n$ elements.

For the top-left principal $k\times k$ submatrix, only the first $k$ elements of $s$ are relevant, and reordering these $k$ elements permutes the rows and columns of $\hat{A}_{k}(sM)$ simultaneously, which does not change its determinant. Each element of $\mathcal{A}_{k}(M)$ therefore corresponds to exactly $k!\,(n-k)!$ permutations $s\in\mathcal{S}_{n}$, so the correspondence between $\mathcal{A}_{k}(M)$ and $\mathcal{S}_{n}$ is $1$-to-$k!\,(n-k)!$. Because of this correspondence, for a function $f$ that is invariant under simultaneous permutations of rows and columns (such as the log-determinant), we obtain the same distribution for $f(A_{k}(M))$, where we choose $A_{k}(M)$ uniformly at random from $\mathcal{A}_{k}(M)$, and for $f(\hat{A}_{k}(sM))$, where we choose $s$ uniformly at random from $\mathcal{S}_{n}$.

Let $f_{\alpha}:\mathcal{S}_{n}\rightarrow\mathbbm{R}$ be such that

$$f_{\alpha}(s):=\alpha\log\bigl(\det\hat{A}_{k}(sM)\bigr)$$

for some $\alpha\in\mathbbm{R}$ . To find an upper bound on the squared outlier deviation for $f_{\alpha}$ on the random-transposition walk, we make two observations:

1. Consider two permutations, $s$ and $s^{\prime}$, that are adjacent in the random-transposition walk; that is, $s^{\prime}=\tau s$ for some transposition $\tau$. The determinant is invariant under simultaneous permutations of rows and columns, so the value of $f_{\alpha}(s^{\prime})$ can differ from $f_{\alpha}(s)$ only if $\tau$ swaps one of the first $k$ elements in $s$ with one of the last $n-k$ elements in $s$. One can realize a step of the walk by choosing an ordered pair $(i,j)$ uniformly at random from the $n^{2}$ elements of $[n]^{2}$ and applying the transposition that exchanges $i$ and $j$ (which leaves $s$ unchanged if $i=j$). Of these $n^{2}$ ordered pairs, $2k(n-k)$ swap one of the first $k$ elements of the sequence with one of the last $n-k$ elements of the sequence. Consequently, the probability that a step of the walk changes the value of $f_{\alpha}$ has an upper bound of

$$b_{1}:=\frac{2k}{n^{2}}(n-k).$$

2. Using Cauchy’s interlacing theorem (see Proposition 1), one can find an upper bound $b_{2}$ for $|f_{\alpha}(A)-f_{\alpha}(B)|$, where $A$ and $B$ are the submatrices that correspond to adjacent permutations. The index sets of such $A,B\in\mathcal{A}_{k}(M)$ differ in at most one element. For any $k<n$ and any such pair $A,B$, there exists a matrix $C\in\mathcal{A}_{k+1}(M)$ such that $A$ and $B$ are principal submatrices of $C$. Cauchy’s interlacing theorem implies that

(a) $\lambda_{1}(M)$ is an upper bound on the largest eigenvalue of $C$;

(b) $\lambda_{n}(M)$ is a lower bound on the smallest eigenvalue of $C$;

(c) $\sum_{i=1}^{k}\log\lambda_{i}(C)$ is an upper bound on $\log(\det{A})$ and $\log(\det{B})$; and

(d) $\sum_{i=2}^{k+1}\log\lambda_{i}(C)$ is a lower bound on $\log(\det{A})$ and $\log(\det{B})$.

Therefore,

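$$\begin{aligned}|\log(\det A)-\log(\det B)| &\leq\sum_{i=1}^{k}\log\lambda_{i}(C)-\sum_{i=2}^{k+1}\log\lambda_{i}(C)=\log\lambda_{1}(C)-\log\lambda_{k+1}(C)\\ &\leq\log\lambda_{1}(M)-\log\lambda_{n}(M)=\log\kappa(M)\leq\log\tilde{\kappa}.\end{aligned}$$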

This upper bound for $|\log(\det{A})-\log(\det{B})|$ holds for every pair $A,B\in\mathcal{A}_{k}(M)$ that corresponds to adjacent permutations in the random-transposition walk. We can thus set the upper bound to be $b_{2}=\alpha\log\tilde{\kappa}$.
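The bound $b_{2}$ relies on $A$ and $B$ sharing all but one index. This sketch (assuming NumPy; the test matrix with eigenvalues evenly spaced in $[1,3]$ and the helper name are illustrative) computes the largest log-determinant difference over all such pairs and compares it with $\log\kappa(M)$.

```python
import itertools

import numpy as np


def max_adjacent_gap(M: np.ndarray, k: int) -> float:
    """Largest |log det M_I - log det M_J| over k-subsets I, J that differ in exactly one index."""
    n = M.shape[0]
    log_det = {I: np.linalg.slogdet(M[np.ix_(I, I)])[1]
               for I in itertools.combinations(range(n), k)}
    gap = 0.0
    for I, value in log_det.items():
        outside = [x for x in range(n) if x not in I]
        for removed in I:
            for added in outside:
                J = tuple(sorted((set(I) - {removed}) | {added}))
                gap = max(gap, abs(value - log_det[J]))
    return gap


rng = np.random.default_rng(1)
n, kappa = 7, 3.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
M = Q @ np.diag(np.linspace(1.0, kappa, n)) @ Q.T  # condition number kappa by construction
```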

We obtain an upper bound for the squared outlier deviation of $f_{\alpha}$ of

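$$|||f_{\alpha}|||_{\infty}^{2}\leq\frac{1}{2}\,b_{1}b_{2}^{2}=k(n-k)\left(\frac{\alpha\log\tilde{\kappa}}{n}\right)^{2}.$$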
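The two observations combine as $|||f_{\alpha}|||_{\infty}^{2}\leq\frac{1}{2}b_{1}b_{2}^{2}$. This sketch (names illustrative) counts $b_{1}$ directly over the $n^{2}$ equally likely ordered pairs of positions and confirms that $\alpha^{\prime}=n/(\sqrt{k(n-k)}\log\tilde{\kappa})$ normalizes the bound to $1$, as Proposition 2 requires.

```python
import itertools
import math


def crossing_fraction(n: int, k: int) -> float:
    """b_1: fraction of the n^2 equally likely ordered position pairs (i, j) whose swap
    moves an element between the first k and the last n - k positions."""
    crossing = sum(1 for i, j in itertools.product(range(n), repeat=2) if (i < k) != (j < k))
    return crossing / n**2


def outlier_deviation_bound(n: int, k: int, alpha: float, kappa_tilde: float) -> float:
    """The upper bound (1/2) * b_1 * b_2^2 on |||f_alpha|||_inf^2, with b_2 = alpha * log(kappa_tilde)."""
    b1 = crossing_fraction(n, k)
    b2 = alpha * math.log(kappa_tilde)
    return 0.5 * b1 * b2**2


n, k, kappa = 20, 5, 3.0
alpha_prime = n / (math.sqrt(k * (n - k)) * math.log(kappa))
bound = outlier_deviation_bound(n, k, alpha_prime, kappa)  # equals 1 up to rounding
```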

Let $\alpha^{\prime}=n/\bigl(\sqrt{k(n-k)}\,\log\tilde{\kappa}\bigr)$. The function $f_{\alpha^{\prime}}$ on $(\Pi,\mu_{\mathcal{S}_{n}})$ has a squared outlier deviation of $|||f_{\alpha^{\prime}}|||^{2}_{\infty}\leq 1$. We can thus use the tail bound for functions on countable sets (see Proposition 2) for $f_{\alpha^{\prime}}$. Therefore,

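$$\Pr\Bigl(\bigl|f_{\alpha^{\prime}}(\hat{A}_{k}(\sigma M))-\mathbbm{E}\bigl[f_{\alpha^{\prime}}(\hat{A}_{k}(\sigma M))\bigr]\bigr|\geq\alpha^{\prime}r\Bigr)\leq 3e^{-\frac{\alpha^{\prime}r}{2}\sqrt{\overline{\lambda}}}.\tag{7}$$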

We can substitute $f_{\alpha^{\prime}}(\hat{A}_{k}(\sigma M))$ in Eq. 7 by $\alpha^{\prime}Y_{k}(M)$ because of the correspondence between $\mathcal{A}_{k}(M)$ and $\mathcal{S}_{n}$. Applying Proposition 3, we obtain

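$$\begin{aligned}&\Pr\bigl(|\alpha^{\prime}Y_{k}(M)-\alpha^{\prime}\mathbbm{E}[Y_{k}(M)]|\geq\alpha^{\prime}r\bigr)\leq 3e^{-\frac{\alpha^{\prime}r}{2}\sqrt{\overline{\lambda}}}\\ \Longrightarrow\ &\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r\bigr)\leq 3\exp\left(-\frac{r}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right).\end{aligned}$$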

This proves the first statement of Theorem 1 (see Eq. 2).

We derive the bound on the variance of $Y_{k}(M)$ in Eq. 3 from Eq. 2 by a direct calculation. First, we write

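$$\operatorname{var}(Y_{k}(M))=\int_{0}^{\infty}\Pr\bigl((Y_{k}(M)-\mathbbm{E}[Y_{k}(M)])^{2}\geq u\bigr)\,du=\int_{0}^{\infty}\Pr\bigl(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq\sqrt{u}\bigr)\,du.$$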

Using the tail bound in Eq. 2, it follows that

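$$\operatorname{var}(Y_{k}(M))\leq\int_{0}^{\infty}3\exp\left(-\frac{\sqrt{u}}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}\right)du=6\left(\frac{k(n-k)}{n}\right)(\log\tilde{\kappa})^{2}.$$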

∎
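The integral that arises from the tail bound in Eq. 2 can be evaluated with the substitution $u=t^{2}$, using $\int_{0}^{\infty}t\,e^{-ct}\,dt=1/c^{2}$ for $c>0$:

```latex
\begin{aligned}
\int_{0}^{\infty} 3e^{-c\sqrt{u}}\,du
  &= \int_{0}^{\infty} 3e^{-ct}\,2t\,dt
     && (u = t^{2},\ du = 2t\,dt) \\
  &= 6\int_{0}^{\infty} t\,e^{-ct}\,dt = \frac{6}{c^{2}},
     && c := \frac{1}{\log\tilde{\kappa}}\sqrt{\frac{n}{k(n-k)}}, \\
\frac{6}{c^{2}} &= 6\left(\frac{k(n-k)}{n}\right)(\log\tilde{\kappa})^{2}.
\end{aligned}
```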

4.2 Proof of Theorem 2

We prove Theorem 2 using Cauchy’s interlacing theorem and Popoviciu’s inequality.

Proposition 4 (Popoviciu’s inequality [29, 30]).

Let $X$ be a real-valued random variable supported on the interval $[x_{\mathrm{min}},x_{\mathrm{max}}]$ . It then follows that $X$ has variance

$$\operatorname{var}(X)\leq\frac{(x_{\max}-x_{\min})^{2}}{4}.$$

For a proof of this version of Popoviciu’s inequality, see Ref. [31].

Proof of Theorem 2.

For any finite $n$ and $k$ , the set $\mathcal{A}_{k}(M)$ of principal $k\times k$ submatrices of an $n\times n$ matrix $M$ has finite cardinality $\binom{n}{k}$ . It follows that the distribution of any function of $A_{k}(M)$ has finite support. We define an interval $[r_{1},r_{2}]$ with $r_{1}:=\min Y_{k}(M)$ and $r_{2}:=\max Y_{k}(M)$ , such that the support of $\mu_{M,k}$ is a finite subset of $[r_{1},r_{2}]$ .

We can obtain any principal $k\times k$ submatrix $A$ of $M$ by removing $n-k$ row–column pairs from $M$ . Successive applications of Cauchy’s interlacing theorem show that $\lambda_{1}(A)\leq\lambda_{1}(M)$ and $\lambda_{k}(A)\geq\lambda_{n}(M)$ . It follows that

$$r_{1}\geq k\log\lambda_{n}(M),\qquad r_{2}\leq k\log\lambda_{1}(M).$$

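Because $\lambda_{1}(M)=\kappa(M)\lambda_{n}(M)\leq\tilde{\kappa}\lambda_{n}(M)$, it follows that

$$r_{2}-r_{1}\leq k\bigl(\log\tilde{\kappa}+\log\lambda_{n}(M)\bigr)-k\log\lambda_{n}(M)=k\log\tilde{\kappa}.$$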

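If $k>n/2$, any two principal $k\times k$ submatrices share at least $2k-n$ rows and columns, so they differ in at most $n-k$ rows and columns. One can make this refinement precise with Jacobi’s complementary-minor identity $\det M_{I}=\det M\cdot\det\bigl((M^{-1})_{[n]\setminus I}\bigr)$: because $M^{-1}$ is positive definite with $\kappa(M^{-1})=\kappa(M)$, the argument above applied to the $(n-k)\times(n-k)$ principal submatrices of $M^{-1}$ yields $r_{2}-r_{1}\leq(n-k)\log\tilde{\kappa}$. It follows that $r_{2}-r_{1}\leq\wedge_{n,k}\times\log\tilde{\kappa}$ for every $k$. We have thus proven the first part of Theorem 2. Applying Popoviciu’s inequality to $X=Y_{k}(M)$ with $x_{\max}-x_{\min}\leq r_{2}-r_{1}$ yields the variance bound in Theorem 2. ∎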
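A brute-force check of the support bound in Theorem 2 for a small random matrix, assuming NumPy; the construction of the test matrix and the helper names are illustrative. It also checks Jacobi’s complementary-minor identity $\log\det M_{I}=\log\det M+\log\det\bigl((M^{-1})_{[n]\setminus I}\bigr)$, which relates the log-minors of size $k$ of $M$ to those of size $n-k$ of $M^{-1}$.

```python
import itertools

import numpy as np


def log_minor_range(M: np.ndarray, k: int) -> float:
    """Length r_2 - r_1 of the smallest interval that contains every log-minor of size k."""
    n = M.shape[0]
    values = [np.linalg.slogdet(M[np.ix_(I, I)])[1] for I in itertools.combinations(range(n), k)]
    return max(values) - min(values)


def jacobi_holds(M: np.ndarray, I, tol: float = 1e-9) -> bool:
    """Check log det M_I = log det M + log det (M^{-1})_{complement of I} for a proper subset I."""
    n = M.shape[0]
    I = list(I)
    Ic = [i for i in range(n) if i not in I]
    lhs = np.linalg.slogdet(M[np.ix_(I, I)])[1]
    M_inv = np.linalg.inv(M)
    rhs = np.linalg.slogdet(M)[1] + np.linalg.slogdet(M_inv[np.ix_(Ic, Ic)])[1]
    return bool(abs(lhs - rhs) < tol)


rng = np.random.default_rng(2)
n, kappa = 7, 3.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = Q @ np.diag(np.linspace(1.0, kappa, n)) @ Q.T  # condition number kappa by construction
```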

4.3 Proof of Theorem 3

For our proof of Theorem 3, we maximize $\operatorname{var}(Y_{k}(D))$ with respect to the eigenvalues of $D$.

Proof of Theorem 3.

Let $D$ be a positive-definite diagonal matrix with entries $\lambda_{1}(D)\geq\lambda_{2}(D)\geq\dots\geq\lambda_{n}(D)>0$. Define $x_{i}=\log\lambda_{i}(D)$ for each $i\in[n]$, and let $I\subseteq[n]$ with $|I|=k$. It then follows that

$$\log(\det D_{I})=\sum_{i\in I}x_{i}.\tag{8}$$

We now consider the function $v(x_{1},x_{2},\ldots,x_{n}):=\operatorname{var}(Y_{k}(D))$. From Eq. 8, we see that every value of $Y_{k}(D)$ is a sum of a subset of the variables $x_{i}$, so $Y_{k}(D)$ is a linear function of $(x_{1},\ldots,x_{n})$ for each fixed index set, and $v$ is a positive-semidefinite quadratic form in these variables. Therefore, the function $v(x_{1},x_{2},\ldots,x_{n})$ is convex (i.e., concave up) in the variables $x_{1},x_{2},\ldots,x_{n}$. Furthermore, the variance is translation-invariant. We may therefore, without loss of generality, suppose that $x_{n}=0$ (corresponding to $\lambda_{n}(D)=1$) and $x_{1}=\log\kappa(D)$ (corresponding to $\lambda_{1}(D)=\kappa(D)$). Consequently, the maximization of the variance $\operatorname{var}(Y_{k}(D))$ amounts to the maximization of $v$ over the $n$-dimensional hypercube $[0,\log\kappa(D)]^{n}$ with edge length $\log\kappa(D)$. A convex function attains its maximum over a hypercube at one of its vertices, so a maximizer lies at a vertex. Therefore, with the variables in decreasing order,

$$x_{i}=\begin{cases}\log\kappa(D), & i\leq\ell,\\ 0, & \text{otherwise},\end{cases}$$

for some $\ell\in[n]$. We may now view $Y_{k}(D)/\log\kappa(D)$ as a hypergeometric random variable: we draw $k$ elements without replacement from a population of size $n$ in which $\ell$ elements have the value $1$ and $n-\ell$ elements have the value $0$. Consequently, the variance of $Y_{k}(D)$ is

$$\operatorname{var}(Y_{k}(D))=\frac{k(n-k)}{n^{2}(n-1)}(n-\ell)\,\ell\,(\log\kappa(D))^{2},$$

which is maximal at

$$\ell^{*}=\begin{cases}\frac{n}{2}, & n\text{ even},\\ \frac{n\pm 1}{2}, & n\text{ odd}.\end{cases}$$

The maximal value of $\operatorname{var}(Y_{k}(D))$ for even $n$ leads to the variance bound

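$$\operatorname{var}(Y_{k}(D))\leq\frac{k(n-k)}{n^{2}(n-1)}\cdot\frac{n^{2}}{4}(\log\tilde{\kappa})^{2}=\frac{k(n-k)}{4(n-1)}(\log\tilde{\kappa})^{2}.\tag{9}$$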

Comparing the maximal values of $\operatorname{var}(Y_{k}(D))$ for even $n$ and for odd $n$ shows that Eq. 9 is a variance bound for all $n$ . ∎
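The maximization over $\ell$ is small enough to check directly. This sketch (names illustrative; $\log\kappa(D)=1$ by default because it only rescales the variance) evaluates the hypergeometric variance at every vertex and returns the maximizers, which agree with $\ell^{*}$.

```python
import math


def vertex_variance(n: int, k: int, ell: int, log_kappa: float = 1.0) -> float:
    """var(Y_k(D)) when ell of the x_i equal log(kappa) and the others equal 0 (hypergeometric)."""
    return k * (n - k) / (n**2 * (n - 1)) * (n - ell) * ell * log_kappa**2


def maximizing_ells(n: int, k: int) -> list:
    """All ell in {0, ..., n} that maximize vertex_variance(n, k, ell)."""
    values = {ell: vertex_variance(n, k, ell) for ell in range(n + 1)}
    best = max(values.values())
    return [ell for ell, v in values.items() if math.isclose(v, best)]
```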

5 Examples

In this section, we compare the distribution $\mu_{M,k}$ for several example matrices $M$ to the bounds in Theorems 1, 2, and 3.

We consider four examples of positive-definite $n\times n$ matrices with $n=20$ and fixed condition number $\kappa=3$ .

Example E1.

Consider the diagonal matrix $M_{\textrm{E1}}$ that maximizes the variance of $Y_{k}(M_{\textrm{E1}})$ among diagonal matrices with condition number $\kappa$. (See the proof of Theorem 3.) For even $n$, this matrix has eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$, where $\lambda_{1}=\cdots=\lambda_{n/2}=\kappa$ and $\lambda_{n/2+1}=\cdots=\lambda_{n}=1$.

Example E2.

Consider a diagonal matrix $M_{\textrm{E2}}$ with eigenvalues $\lambda_{1},\lambda_{2},\ldots,\lambda_{n}$ . We set $\lambda_{1}=\kappa$ and $\lambda_{n}=1$ . We draw $\lambda_{2},\ldots,\lambda_{n-1}$ from a uniform distribution on $[1,\kappa]$ .

Example E3.

We obtain a non-diagonal positive-definite matrix $M_{\textrm{E3}}$ with condition number $\kappa$ via an orthogonal transformation of $M_{\textrm{E1}}$ . That is,

$$M_{\textrm{E3}}:=Q^{-1}M_{\textrm{E1}}Q,$$

where $Q$ is an orthogonal matrix that we choose from the Haar measure over the group of orthogonal matrices. We use Stewart’s algorithm [32] to generate $Q$ .

Example E4.

We again generate a random orthogonal matrix $Q$ using Stewart’s algorithm. We obtain another non-diagonal positive-definite matrix $M_{\textrm{E4}}:=Q^{-1}M_{\textrm{E2}}Q$ via an orthogonal transformation of $M_{\textrm{E2}}$ .

In Figure 1, we show the empirical probability densities of $Y_{k}(M)$ for Examples E1, E2, E3, and E4 using four different values of $k$. For all four examples, we observe that the interval on which $\mu_{M,k}$ is supported shifts to the right for progressively larger $k$. The length of the supported interval increases with $\wedge_{n,k}$. For $k=5$ and $k=10$ (the cases in which $\wedge_{n,k}$ is larger than $1$), the distribution $\mu_{M,k}$ is almost symmetric about $\mathbbm{E}[Y_{k}(M)]$ for all four examples. For Example E1, the distribution $\mu_{M,k}$ is symmetric about its mean for all examined values of $k$. It has nonzero probability mass at $\wedge_{n,k}+1$ equidistant points.

In Table 1, we show $\mathbbm{E}[Y_{k}(M)]$ and $\operatorname{var}(Y_{k}(M))$ for the distributions in Figure 1. We first consider the expectation of $Y_{k}(M)$ . For all four examples, we observe that $\mathbbm{E}[Y_{k}(M)]$ increases with $k$ . For all examined values of $k$ , we see that $\mathbbm{E}[Y_{k}(M_{\textrm{E4}})]>\mathbbm{E}[Y_{k}(M_{\textrm{E2}})]>\mathbbm{E}[Y_{k}(M_{\textrm{E3}})]>\mathbbm{E}[Y_{k}(M_{\textrm{E1}})]$ . Our observations thus suggest that the expectation of $Y_{k}(M)$ is large when we choose eigenvalues of $M$ uniformly at random from the interval $[1,\kappa]$ and small when we set half of the eigenvalues of $M$ to $1$ and the other half to $\kappa$ .

We now give several observations about the variance of $Y_{k}(M)$ . For all examined values of $k$ , we see that $\operatorname{var}(Y_{k}(M_{\textrm{E1}}))>\operatorname{var}(Y_{k}(M_{\textrm{E2}}))>\operatorname{var}(Y_{k}(M_{\textrm{E3}}))>\operatorname{var}(Y_{k}(M_{\textrm{E4}}))$ . Our observation of larger $\operatorname{var}(Y_{k}(M))$ for the examples with diagonal matrices (Examples E1 and E2) than for the examples with non-diagonal matrices (Examples E3 and E4) gives intuitive support for Eq. 6. Our observation that $\operatorname{var}(Y_{k}(M_{\textrm{E1}}))>\operatorname{var}(Y_{k}(M_{\textrm{E2}}))$ reflects the fact that, among diagonal positive-definite matrices with condition number $\kappa$ , Example E1 maximizes the variance (see Eq. 5).

For all examined $k$ , the value of the variance bound in Theorem 1 (see Eq. 3) is at least 12 times larger than the value of the variance bound in Theorem 2 (see Eq. 4). For $k=1$ and $k=19$ (the cases in which $\wedge_{n,k}=1$ ), the value of the variance bound in Theorem 2 equals the value of the variance bound for diagonal positive-definite matrices (Eq. 5), and Example E1 attains this bound.

In Fig. 2, we show the empirical tails $\operatorname{Pr}(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r)$ for our four examples. We also show the tail bound B1 from Eq. 3 and two Chebyshev bounds, B2 and B3, which we obtain from the variance bounds in Theorem 2 (see Eq. 4) and Theorem 3 (see Eq. 5), respectively. (One obtains a tail bound from a variance bound by using Chebyshev’s inequality [33] (page 429), $\operatorname{Pr}(|X-\mathbbm{E}[X]|\geq r)\leq\frac{\operatorname{var}(X)}{r^{2}}$ , which holds for any random variable $X$ with finite variance and any $r>0$ .) Consistent with our observations in Table 1 on $\operatorname{var}(Y_{k}(M))$ , we observe that the tail probability tends to be larger for the examples with diagonal matrices (Examples E1 and E2) than for the examples with non-diagonal matrices (Examples E3 and E4).

The difference in functional form guarantees that the bound B1 intersects the Chebyshev bound B2 at two values of $r$ . If we denote these values by $r^{\prime}$ and $r^{\prime\prime}>r^{\prime}$ , then B1 is sharper than B2 on $[0,r^{\prime}]$ and on $[r^{\prime\prime},\infty)$ . In our examples, both bounds exceed $1$ on $[0,r^{\prime}]$ , so there they are weaker than the trivial bound $\operatorname{Pr}(|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|\geq r)\leq 1$ . The value $r^{\prime\prime}$ exceeds the largest attainable deviation $|Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]|$ , so the tail probability is $0$ for $r\geq r^{\prime\prime}$ . We thus see that B1 is sharper than B2 only for values of $r$ at which neither bound is informative.

For $k=1$ and $k=19$ , the bounds B2 and B3 coincide and are sharp at $r=(\log\kappa)/2$ when $M=M_{\textrm{E1}}$ . For $k=5$ and $k=10$ , the bound (B3) for diagonal positive-definite matrices is sharper than the bound (B2) for general positive-definite matrices. The difference between the two bounds is most visible for $k=10$ , which is the case that maximizes $\wedge_{n,k}$ .
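Chebyshev's inequality turns any variance bound into a tail bound, and the sharpness at $k=1$ can be checked directly. In this minimal sketch the function names are hypothetical. As the variance input it uses the exact variance of Example E1 at $k=1$ , assuming that half of the eigenvalues of $M_{\textrm{E1}}$ equal $1$ and half equal $\kappa$ , rather than the formulas of Theorem 2 or Eq. 5, which are not restated in this section. The value $\kappa=10$ is illustrative.

```python
from math import log


def chebyshev_tail_bound(var_bound, r):
    """Chebyshev bound on Pr(|X - E[X]| >= r), capped at the trivial bound 1.

    var_bound is any upper bound on var(X); the inequality requires finite
    variance and r > 0, so r <= 0 returns the trivial bound.
    """
    if r <= 0:
        return 1.0
    return min(1.0, var_bound / r ** 2)


def tail_probability(dist, r):
    """Exact Pr(|Y - E[Y]| >= r) for a finite distribution {value: probability}."""
    mean = sum(y * p for y, p in dist.items())
    return sum(p for y, p in dist.items() if abs(y - mean) >= r)


kappa = 10.0  # illustrative value
# Example E1 with k = 1: Y_1 equals log(1) = 0 or log(kappa), each with probability 1/2.
dist = {0.0: 0.5, log(kappa): 0.5}
var_exact = log(kappa) ** 2 / 4  # variance of this two-point distribution
r = log(kappa) / 2
# Both values equal 1: every outcome lies exactly (log kappa)/2 from the mean.
print(tail_probability(dist, r), chebyshev_tail_bound(var_exact, r))
```

At $r=(\log\kappa)/2$ every outcome lies exactly at distance $r$ from the mean, so the tail probability and the Chebyshev bound both equal $1$ .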

6 Estimating mean subsystem entropy

We now consider the implications of our results in Section 3 for the problem of estimating the mean subsystem entropy of a given system of coupled variables. When the joint distribution of the variables is a multivariate normal distribution, one can compute the differential entropy of a subsystem by applying Eq. 1 to the corresponding sub-covariance matrix. We are interested in the mean subsystem entropy $\mathbbm{E}[h(A_{k}(M))]$ for subsystems of $k$ variables. As we noted previously, the large number of subsystems for even modest values of $n$ and $k$ makes it prohibitive to compute $\mathbbm{E}[h(A_{k}(M))]$ exactly. Fortunately, the tail and variance bounds in Section 3 allow us to instead provide sampling guarantees, through which one can achieve a prescribed sampling accuracy. We give upper bounds on the standard error and on the coefficient of variation for both a sample mean of $Y_{k}(M)$ and a sample mean of subsystem entropy.
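For a Gaussian system, the entropy of a subsystem depends on its sub-covariance matrix only through a log-determinant. This sketch assumes that Eq. 1 is the standard formula $h(A)=\frac{k}{2}\log(2\pi e)+\frac{1}{2}\log(\det A)$ (in nats) for a $k$ -variate normal distribution with covariance matrix $A$ . The function name and the $3\times 3$ test matrix are hypothetical.

```python
import numpy as np


def gaussian_subsystem_entropy(m, idx):
    """Differential entropy (in nats) of the Gaussian subsystem with variable indices idx.

    Uses h = (k/2) * log(2*pi*e) + (1/2) * log det A, where A is the
    principal submatrix of the covariance matrix m on the rows/columns idx.
    """
    idx = np.asarray(idx)
    a = m[np.ix_(idx, idx)]
    sign, logdet = np.linalg.slogdet(a)  # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("the sub-covariance matrix must be positive definite")
    k = len(idx)
    return 0.5 * k * np.log(2 * np.pi * np.e) + 0.5 * logdet


m = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
# Subsystem {0, 2}: A = diag(2, 1.5), so h = log(2*pi*e) + 0.5*log(3).
print(gaussian_subsystem_entropy(m, [0, 2]))
```

Using `slogdet` instead of `log(det(...))` avoids overflow and underflow when $k$ is large.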

Fix a subsystem size $k$ and sample size $q\geq 1$ . The $q$ -sample mean of $Y_{k}(M)$ is

$$S_{Y}:=\frac{1}{q}\sum_{i=1}^{q}\log(\det{A_{i}})\,,$$

where we choose each $A_{i}$ uniformly at random from $\mathcal{A}_{k}(M)$ . The $q$ -sample mean of subsystem entropy is

$$S_{h}:=\frac{1}{q}\sum_{i=1}^{q}h(A_{i})\,.$$

We use $S_{Y}$ and $S_{h}$ as estimators of the population means $\mathbbm{E}[Y_{k}(M)]$ and $\mathbbm{E}[h(A_{k}(M))]$ , respectively. These estimators are unbiased, as $\mathbbm{E}[S_{Y}]=\mathbbm{E}[Y_{k}(M)]$ and $\mathbbm{E}[S_{h}]=\mathbbm{E}[h(A_{k}(M))]$ . A measure of reliability of an estimator is the standard error, which one computes as the estimator’s standard deviation. Because $h(A_{k}(M))$ differs from $Y_{k}(M)/2$ by a constant, the sample mean $S_{h}$ has the standard error

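$$\hat{\sigma}(S_{h})=\frac{\hat{\sigma}(S_{Y})}{2}=\frac{1}{2}\sqrt{\frac{\operatorname{var}(Y_{k}(M))}{q}}\,.$$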

We may therefore use the bounds of Eqs. 3, 4, and 5 to derive bounds on the standard error for $S_{Y}$ and $S_{h}$ .
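The estimators $S_{Y}$ and $S_{h}$ take only a few lines to implement. This sketch draws the $A_{i}$ independently and uniformly from $\mathcal{A}_{k}(M)$ , so the standard error of $S_{Y}$ is $\sqrt{\operatorname{var}(Y_{k}(M))/q}$ . It estimates the unknown $\operatorname{var}(Y_{k}(M))$ by the sample variance, and the bounds in Corollary 1 can replace this plug-in value when a guarantee is needed. The function names are hypothetical. The test matrix is the assumed form of $M_{\textrm{E1}}$ (half of the eigenvalues equal to $1$ , half equal to $\kappa$ ) with illustrative values $n=20$ and $\kappa=10$ .

```python
import numpy as np


def sample_log_minors(m, k, q, rng):
    """Log-determinants of q independent, uniformly random k x k principal submatrices of m."""
    n = m.shape[0]
    # The first k entries of a uniformly random permutation form a uniform k-subset.
    idx = np.argsort(rng.random((q, n)), axis=1)[:, :k]
    subs = m[idx[:, :, None], idx[:, None, :]]  # stack of shape (q, k, k)
    sign, logdet = np.linalg.slogdet(subs)      # slogdet works on stacked matrices
    if np.any(sign <= 0):
        raise ValueError("m must be positive definite")
    return logdet


def sample_means(m, k, q, rng):
    """Return S_Y, S_h, and plug-in estimates of their standard errors."""
    y = sample_log_minors(m, k, q, rng)
    s_y = y.mean()
    # Gaussian subsystem entropy: h(A) = (k/2) log(2 pi e) + (1/2) log det A.
    s_h = 0.5 * k * np.log(2 * np.pi * np.e) + 0.5 * s_y
    # sqrt(var(Y_k)/q), with var(Y_k) replaced by the sample variance.
    se_y = y.std(ddof=1) / np.sqrt(q)
    se_h = se_y / 2  # h(A) differs from log(det A)/2 by a constant
    return s_y, s_h, se_y, se_h


rng = np.random.default_rng(42)
kappa = 10.0
m_e1 = np.diag(np.r_[np.ones(10), np.full(10, kappa)])  # assumed form of M_E1, n = 20
s_y, s_h, se_y, se_h = sample_means(m_e1, k=5, q=4000, rng=rng)
# S_Y should be close to E[Y_5] = 5 * log(kappa) / 2.
print(s_y, 5 * np.log(kappa) / 2, se_y)
```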

Corollary 1 (Standard error of the sample mean subsystem entropy).

Let $M$ be a covariance matrix of an $n$ -variate normal distribution, and suppose that the condition number of $M$ satisfies $\kappa(M)\leq\tilde{\kappa}$ . Let $S_{h}$ be the $q$ -sample mean of the entropy of subsets of $k$ variables; and let $S_{Y}$ be the $q$ -sample mean of log-determinants of $k\times k$ principal submatrices of $M$ . It then follows, for any subsystem size $k$ , that the standard error of the mean subsystem entropy is $\hat{\sigma}(S_{h})=\hat{\sigma}(S_{Y})/2$ and that $\hat{\sigma}(S_{Y})$ satisfies

[TABLE]

and

[TABLE]

Furthermore, if $M$ is diagonal,

[TABLE]

The coefficient of variation $c_{v}(S)$ is another measure of reliability for estimators. It measures the size of the typical error of an estimator $S$ as a fraction of the magnitude of $\mathbbm{E}[S]$ . As a formula, it is given by

$$c_{v}(S):=\frac{\hat{\sigma}(S)}{\left|\mathbbm{E}[S]\right|}\,.$$

The coefficient of variation for $S_{Y}$ arises from the standard deviation of the relative error

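$$\mathcal{E}:=\frac{Y_{k}(M)-\mathbbm{E}[Y_{k}(M)]}{\mathbbm{E}[Y_{k}(M)]}$$

of $Y_{k}(M)$ because

$$c_{v}(S_{Y})=\frac{1}{\left|\mathbbm{E}[Y_{k}(M)]\right|}\sqrt{\frac{\operatorname{var}(Y_{k}(M))}{q}}=\sqrt{\frac{\operatorname{var}(\mathcal{E})}{q}}\,.$$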
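For intuition about how $c_{v}(S_{Y})$ behaves, one can evaluate it exactly for Example E1. This sketch assumes independent draws and takes the relative error to be $(Y_{k}(M)-\mathbbm{E}[Y_{k}(M)])/\mathbbm{E}[Y_{k}(M)]$ . It uses the exact mean and variance of $Y_{k}(M_{\textrm{E1}})$ , assuming that half of the eigenvalues equal $1$ and half equal $\kappa$ . The values $n=20$ , $\kappa=10$ , and $q=100$ are illustrative, and the helper names are hypothetical. For this matrix, $c_{v}(S_{Y})=\sqrt{(n-k)/(k(n-1)q)}$ , which decreases as $k$ grows.

```python
from math import log, sqrt


def coefficient_of_variation(var_y, mean_y, q):
    """c_v(S_Y) = sqrt(var(Y_k)/q) / |E[Y_k]| for a q-sample mean of independent draws."""
    if mean_y == 0:
        raise ValueError("c_v is undefined when E[Y_k] = 0")
    return sqrt(var_y / q) / abs(mean_y)


def relative_error_sd(var_y, mean_y):
    """Standard deviation of the relative error (Y_k - E[Y_k]) / E[Y_k]."""
    return sqrt(var_y) / abs(mean_y)


# Exact moments for the assumed form of M_E1 (n/2 eigenvalues 1, n/2 equal to kappa).
n, kappa, q = 20, 10.0, 100
for k in (1, 5, 10, 15, 19):
    mean_y = k * log(kappa) / 2
    var_y = k * (n - k) / (4 * (n - 1)) * log(kappa) ** 2
    cv = coefficient_of_variation(var_y, mean_y, q)
    # c_v(S_Y) equals the standard deviation of the relative error divided by sqrt(q).
    print(k, round(cv, 5), round(relative_error_sd(var_y, mean_y) / sqrt(q), 5))
```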

For a multivariate Gaussian distribution, the following corollaries give bounds on the coefficient of variation for the sample mean of log-minors and for the sample mean of subsystem entropy.

Corollary 2 (Coefficient of variation for a sample mean of log-minors).

Let $\lambda_{1}(M),\ldots,\lambda_{n}(M)$ be the eigenvalues of $M$ ; we order them from largest to smallest. Let $\ell(M)=\min\{\left|\log\lambda_{1}(M)\right|,\left|\log\lambda_{n}(M)\right|\}$ . If $\ell(M)\neq 0$ and all eigenvalues of $M$ lie on the same side of $1$ (that is, $\lambda_{n}(M)\geq 1$ or $\lambda_{1}(M)\leq 1$ ), the coefficient of variation for a $q$ -sample mean $S_{Y}$ of $Y_{k}(M)$ satisfies

[TABLE]

and

[TABLE]

Proof.

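This corollary follows from Eq. 14. We use Eqs. 3 and 4 as upper bounds on the numerator. For the denominator, Cauchy’s interlacing theorem (Proposition 1) implies that every eigenvalue of a principal submatrix $A\in\mathcal{A}_{k}(M)$ lies in $[\lambda_{n}(M),\lambda_{1}(M)]$ . When all eigenvalues of $M$ lie on the same side of $1$ , each of the $k$ terms in $\log(\det{A})=\sum_{j=1}^{k}\log\lambda_{j}(A)$ therefore has the same sign and absolute value at least $\ell(M)$ , so $\left|\mathbbm{E}[Y_{k}(M)]\right|\geq k\ell(M)$ for all $k$ . ∎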
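The denominator bound can be checked numerically on a small matrix by enumerating all of its principal submatrices, which also gives $\mathbbm{E}[Y_{k}(M)]$ exactly. In this sketch the random test matrix is hypothetical; it is built so that all of its eigenvalues exceed $1$ . The check confirms the interlacing bounds and the inequality $\left|\mathbbm{E}[Y_{k}(M)]\right|\geq k\ell(M)$ for this matrix. A $2\times 2$ diagonal example then shows that the inequality can fail when the eigenvalues of $M$ lie on both sides of $1$ .

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 3
x = rng.standard_normal((n, n))
m = x @ x.T + n * np.eye(n)  # hypothetical test matrix; every eigenvalue is at least n > 1
lam = np.linalg.eigvalsh(m)  # eigenvalues in ascending order
lower = np.log(lam[:k]).sum()   # sum of the k smallest log-eigenvalues
upper = np.log(lam[-k:]).sum()  # sum of the k largest log-eigenvalues

# Every k x k log-minor, by exhaustive enumeration (feasible only for small n).
log_minors = np.array([np.linalg.slogdet(m[np.ix_(s, s)])[1]
                       for s in combinations(range(n), k)])

# Cauchy interlacing places every log-minor between the two sums.
print(lower <= log_minors.min(), log_minors.max() <= upper)

# All eigenvalues exceed 1, so ell(M) = log(lambda_n) and |E[Y_k]| >= k * ell(M).
ell = min(abs(np.log(lam[0])), abs(np.log(lam[-1])))
print(abs(log_minors.mean()) >= k * ell)

# Eigenvalues on both sides of 1 break the lower bound: for diag(1/2, 2) and k = 1,
# E[Y_1] = 0 even though ell = log 2 > 0.
cex_mean = np.mean(np.log([0.5, 2.0]))
cex_ell = min(abs(np.log(0.5)), abs(np.log(2.0)))
print(cex_mean, cex_ell)
```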

Corollary 3 (Coefficient of variation for mean subsystem entropy).

For an $n$ -variate Gaussian distribution with covariance matrix $M$ , the coefficient of variation $c_{v}(S_{h})$ for a $q$ -sample mean $S_{h}$ of subsystem entropy satisfies

[TABLE]

and

[TABLE]

Proof.

We derive this result from Eq. 14; we use Eqs. 3 and 4 to bound the numerator, and we use Eq. 1 to bound the expectation in the denominator. ∎

The bounds in Eqs. 16 and 18 are sharper than those in Eqs. 15 and 17. From Eqs. 16 and 18, we see that, for fixed $q$ , the bounds on both $c_{v}(S_{Y})$ and $c_{v}(S_{h})$ decay at least as fast as $1/\sqrt{k}$ . Indeed, under a certain regularity condition (which we specify in Corollary 4), the coefficient of variation decays to $0$ in the limit of large $n$ and large $k$ .

Corollary 4 (Concentration of the relative error).

Let $\{M_{i}\}$ be a sequence of positive-definite matrices of dimension $n(i)$ . Let $k=k(i)\leq n(i)$ be a function of $i$ . Suppose that the sequence

[TABLE]

is nondecreasing and unbounded. It then follows that $c_{v}(S_{Y})\rightarrow 0$ and that $\mathcal{E}$ converges in probability to $0$ as $i\rightarrow\infty$ .

Remark 8.

A sufficient condition for the concentration of $\mathcal{E}$ is that the sequence $\{M_{i}\}$ has fixed condition number and the smallest eigenvalue $\lambda_{n}$ is bounded away from both $0$ and $1$ . Formally, the latter condition is

[TABLE]

Remark 9.

A popular model for sample covariance matrices is the Wishart ensemble. (The Wishart ensemble $W_{n}(V,n_{f})$ with scale matrix $V$ and $n_{f}$ degrees of freedom is the ensemble of random matrices $M:=n_{f}^{-1}\sum_{i=1}^{n_{f}}X_{i}^{T}X_{i}$ , where $X_{1},X_{2},\dots,X_{n_{f}}$ are $n_{f}$ independent realizations, written as row vectors, of an $n$ -variate random variable with zero-mean Gaussian distribution $N_{n}(0,V)$ [34, 35].) A sequence $\{M_{i}\}$ of Wishart matrices can satisfy the condition in Eq. 20 if the ratio $c:=n/n_{f}$ of the number $n$ of variables to the number $n_{f}$ of degrees of freedom satisfies $c\notin\{1/4,1\}$ [35, 36].
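Under the definition of the Wishart ensemble given in Remark 9, sampling a Wishart matrix amounts to averaging outer products of Gaussian samples. This sketch follows that definition with the $X_{i}$ stored as rows of a data matrix. `sample_wishart` is a hypothetical helper, and $V=I$ , $n=50$ , and $n_{f}=100$ (so $c=1/2$ ) are illustrative choices. When $n_{f}<n$ (that is, $c>1$ ), $M$ has rank at most $n_{f}$ and is therefore singular.

```python
import numpy as np


def sample_wishart(v, n_f, rng):
    """Draw M = n_f^{-1} * sum_i X_i^T X_i, with X_i i.i.d. row vectors from N_n(0, V)."""
    n = v.shape[0]
    x = rng.multivariate_normal(np.zeros(n), v, size=n_f)  # row i is X_i
    return x.T @ x / n_f  # average of the n x n outer products X_i^T X_i


rng = np.random.default_rng(7)
n, n_f = 50, 100  # ratio c = n / n_f = 1/2 (illustrative)
m = sample_wishart(np.eye(n), n_f, rng)
eig = np.linalg.eigvalsh(m)
# Extreme eigenvalues and the condition number of the sampled matrix.
print(eig.min(), eig.max(), eig.max() / eig.min())
```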

One can use these bounds on the standard error to choose a sample size $q$ that guarantees a desired accuracy of a sample mean. In Fig. 3, we show our bounds on the standard error and the coefficient of variation of $S_{Y}$ and $S_{h}$ with $q=2000k$ and $\ell(M)=1$ . In the left panels, we show the bounds B1 from Eq. 11 and B2 from Eq. 12 on the standard error of $S_{Y}$ and $S_{h}$ . In the right panels, we show the bounds B1’ (see Eqs. 15 and 17) and B2’ (see Eqs. 16 and 18) on the coefficient of variation of $S_{Y}$ and $S_{h}$ .

In panels (A) and (B), we vary the system size $n$ for fixed subsystem size $k$ . For $n\leq 2k$ , the values of the bounds B2 and B2’ increase with $n$ ; for $n>2k$ , they are independent of $n$ . The bounds B1 and B1’ are less sharp than B2 and B2’. Their values increase with $n$ and approach their asymptotic values from below. For example, the bound B1 for $\hat{\sigma}(S_{Y})$ has a limiting value of $\sqrt{6k/q}\times\log\tilde{\kappa}\approx 0.06$ .

In panels (C) and (D), we vary $k$ for fixed $n$ . For $k\leq n/2$ , the value of the bound B2 on the standard error is independent of $k$ . For $k>n/2$ , the value of B2 decreases with increasing $k$ and is $0$ for $k=n$ . The bound B1 is less sharp than B2, and its value decreases with increasing $k$ for all $k\leq n$ . The values of the bounds B1’ and B2’ on the coefficient of variation decrease with increasing subsystem size and are $0$ for $k=n$ .

In panels (E) and (F), we vary $k$ for a fixed ratio $k/n$ . The bounds on the standard error are independent of $k$ when $k/n$ is constant, whereas the values of the bounds B1’ and B2’ decrease with increasing $k$ . This is consistent with Corollary 4, according to which $c_{v}(S_{Y})$ vanishes if the sequence $a_{i}$ (see Eq. 19) is nondecreasing and unbounded.
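The limiting value of B1 quoted above follows from simple arithmetic. With $q=2000k$ , the ratio $k/q$ is fixed at $1/2000$ , so the limit $\sqrt{6k/q}\,\log\tilde{\kappa}$ does not depend on $k$ . The value of $\tilde{\kappa}$ behind Fig. 3 is not restated in this section. This sketch uses the hypothetical choice $\tilde{\kappa}=3$ , for which $\sqrt{6/2000}\,\log 3\approx 0.060$ . The function name is also hypothetical.

```python
from math import log, sqrt


def b1_limit(k, q, kappa_tilde):
    """Large-n limit sqrt(6k/q) * log(kappa_tilde) of the bound B1 on the standard error of S_Y."""
    return sqrt(6 * k / q) * log(kappa_tilde)


kappa_tilde = 3.0  # hypothetical value, for illustration only
for k in (5, 50, 500):
    q = 2000 * k  # sample size proportional to k, as in Fig. 3
    print(k, q, round(b1_limit(k, q, kappa_tilde), 4))
```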

It is important to note that all of our bounds on the standard error and the coefficient of variation of $S_{Y}$ and $S_{h}$ are asymptotically constant in $n$ . It is thus not necessary to sample proportionally more minors from a larger matrix. Instead, to guarantee a desired accuracy of a sample mean of log-minors or subsystem entropy, one can choose $q$ as a function of $k$ . To ensure that the standard error is constant or decreases with growing $n$ and $k$ , it is sufficient to choose $q$ in linear proportion to $k$ . When the smallest and largest eigenvalues of a system’s covariance matrix are fixed, one can ensure that the coefficient of variation is constant or decreases with growing $n$ and $k$ by choosing $q$ in linear proportion to $k^{-1}$ .
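Inverting the standard-error formula for independent draws, $\hat{\sigma}(S_{Y})=\sqrt{\operatorname{var}(Y_{k}(M))/q}$ , gives a sample-size rule. Any variance bound $V$ implies that $q\geq V/\epsilon^{2}$ samples suffice for a target standard error $\epsilon$ . In this sketch the helper name and the linear-in- $k$ bound $V=0.5k$ are hypothetical. They only illustrate that a variance bound that grows linearly in $k$ leads to a sample size that grows linearly in $k$ .

```python
from math import ceil


def samples_for_standard_error(var_bound, target_se):
    """Smallest q >= 1 with sqrt(var_bound / q) <= target_se.

    var_bound is any upper bound on var(Y_k(M)), for example from Eq. 3, 4, or 5.
    For S_h, pass 2 * target_se, because sigma(S_h) = sigma(S_Y) / 2.
    """
    if var_bound < 0 or target_se <= 0:
        raise ValueError("need var_bound >= 0 and target_se > 0")
    return max(1, ceil(var_bound / target_se ** 2))


# Hypothetical variance bound growing linearly in k: var_bound = c * k.
c = 0.5
for k in (2, 4, 8, 16):
    print(k, samples_for_standard_error(c * k, target_se=0.25))
```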

7 Conclusions

We examined the problem of estimating the mean subsystem entropy of a system of $n$ coupled variables with covariance matrix $M$ . When the joint distribution of a system’s variables is an $n$ -variate Gaussian, $t$ , or Cauchy distribution, the mean differential entropy of subsystems is an affine function of the mean log-minor of the covariance matrix (for the $t$ and Cauchy distributions, of the scale matrix) [22, 24]. We derived tail and variance bounds on the distribution of log-minors of fixed size of a positive-definite matrix with bounded condition number. Using our variance bounds, we provided upper bounds on the standard error and on the coefficient of variation of both the sample mean of log-minors and the sample mean of subsystem entropy. Our results indicate that, despite the rapid growth of the number of subsystems with $n$ , the accuracy of these sample means is asymptotically independent of a system’s size. Instead, it is sufficient to increase the number of samples in linear proportion to the size of subsystems to achieve a desired sampling accuracy.

Our results are relevant to studies that use mean subsystem entropy to examine systems of coupled variables [20, 4, 5]. Even for a system with as few as $50$ variables, sampling just 0.001% of its subsystem entropies can require the computation of over a billion log-determinants. Using the largest and smallest eigenvalues of a system’s covariance matrix to determine the number of samples that are needed to achieve a prescribed accuracy for a sample mean can thus facilitate a quantitative study of mean subsystem entropy in cases where exhaustive computation is infeasible.
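The figure of over a billion log-determinants follows from counting the subsystems of the most numerous size. For $n=50$ , there are $\binom{50}{25}\approx 1.26\times 10^{14}$ subsystems with $k=25$ variables alone, so sampling $0.001\%$ of them (a fraction $10^{-5}$ ) already means about $1.26\times 10^{9}$ determinant evaluations. The choice $k=25$ is only an illustration; the text does not say which subsystem sizes it counts.

```python
from math import comb

n, k = 50, 25
n_subsystems = comb(n, k)            # subsystems with exactly k of the n variables
n_sampled = n_subsystems // 100_000  # 0.001% of them, i.e. a fraction 1e-5
print(n_subsystems, n_sampled)
```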

Throughout our paper, we relied only on knowledge of the largest and smallest eigenvalues of a system’s covariance matrix. We expect that it is possible to derive sharper bounds than our current results when one knows the complete spectrum of a system’s covariance matrix, likely by relying on Cauchy’s interlacing theorem (Proposition 1) to control the log-minors.

We presented two bounds on the variance of a log-minor that we choose uniformly at random from the set of log-minors of size $k$ of an $n\times n$ positive-definite matrix. The variance bound in Theorem 2 is sharper than the one in Eq. 3, but either bound is sufficient to deduce that the accuracy of a sample mean of subsystem entropy is asymptotically independent of a system’s size and that one can achieve a prescribed accuracy by choosing the number of samples in linear proportion to the size of subsystems.

The proof of our first bound (see Section 4.1) relies on the existence of an upper bound for the difference between $\log(\det{A})$ for two different principal submatrices $A\in\mathcal{A}_{k}(M)$ and the invariance of $\log(\det{A})$ under a basis transformation of $A$ . The proof of our second bound (see Section 4.2) relies on the existence of an upper bound and a lower bound for the support of the distribution $\mu_{M,k}$ .

Similar bounds and the invariance under basis transformation hold for several other matrix properties, including the largest and smallest eigenvalues. It is thus plausible that one can derive similar results for the standard error and coefficient of variation for many spectral properties of principal submatrices. For example, Chatterjee and Ledoux (2009) proved a large-deviation inequality for the empirical cumulative eigenvalue distribution of principal submatrices of Hermitian matrices [26]. These and other variance and tail bounds on submatrix properties offer promising opportunities to enhance computational studies that characterize complex systems based on the mean properties of their subsystems. For example, they can provide guarantees for linear sketching techniques, which are relevant for data dimensionality reduction. They can also facilitate the use of methods of spectral graph analysis in the study of subgraphs, graphlets, and motifs in networks.

Acknowledgements

We thank Clément Canonne, Kameron Decker Harris, Michael Neely, and participants of the IPAM Quantitative Linear Algebra Tutorials for helpful discussions. A.C.S. was supported by the Clarendon Fund, e-Therapeutics plc, and funding from the Engineering and Physical Sciences Research Council under grant number EP/L016044/1. P.S.C. was supported by the National Science Foundation under Graduate Research Fellowship Grant 1122374.

Bibliography


[1] S. S. Skiena. The Data Science Design Manual. Springer, Cham, Switzerland, 2017.
[2] M. E. J. Newman. Networks. Oxford University Press, Oxford, United Kingdom, 2018.
[3] S. H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, Boulder, CO, USA, 2018.
[4] G. Tononi, O. Sporns, and G. M. Edelman. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences of the United States of America, 91(11):5033–5037, 1994.
[5] G. Tononi, O. Sporns, and G. M. Edelman. Measures of degeneracy and redundancy in biological networks. Proceedings of the National Academy of Sciences of the United States of America, 96(6):3257–3262, 1999.
[6] M. De Lucia, M. Bottaccio, M. Montuori, and L. Pietronero. Topological approach to neural complexity. Physical Review E, 71(1):016114, 2005.
[7] M. Randles, D. Lamb, E. Odat, and A. Taleb-Bendiab. Distributed redundancy and robustness in complex systems. Journal of Computer and System Sciences, 77(2):293–304, 2011.
[8] Y. Li, G. Dwivedi, W. Huang, M. L. Kemp, and Y. Yi. Quantification of degeneracy in biological systems for characterization of functional interactions between modules. Journal of Theoretical Biology, 302:29–38, 2012.
