Identifiability of parametric random matrix models
Tomohiro Hayase

TL;DR
This paper studies whether the parameters of certain random matrix models can be uniquely determined from their spectral distributions, using free probability theory to establish conditions for identifiability.
Contribution
It demonstrates that compound Wishart and signal-plus-noise models are identifiable up to rotation, advancing understanding of parameter recovery in spectral analysis.
Findings
Models are identifiable up to rotation.
Free probability theory is effective for analyzing identifiability.
Provides theoretical conditions for parameter uniqueness.
Abstract
We investigate parameter identifiability of spectral distributions of random matrices. In particular, we treat compound Wishart type and signal-plus-noise type. We show that each model is identifiable up to some kind of rotation of parameter space. Our method is based on free probability theory.
Identifiability of Parametric Random Matrix Models
Tomohiro Hayase
Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo, 153-8914, Japan
Key words and phrases:
Identifiability, Random Matrix Theory, Free Probability Theory, Statistical Models
1. Introduction
Identifiability analysis is fundamental to the theoretical understanding of statistical models, for example in log-likelihood maximization. A parametric statistical model $(P_\theta)_{\theta\in\Theta}$, that is, a parametric family of probability measures, is said to be identifiable if the map $\theta\mapsto P_\theta$ is injective. Identifiability is necessary for a statistical model to be regular. Under regularity conditions, the maximum likelihood estimator behaves well; for instance, it is asymptotically normal. In general, the geometry of the log-likelihood is determined by the Fisher information matrix (see [2]), which is, up to sign, the expected Hessian of the log-likelihood with respect to the parameters. If a statistical model is non-identifiable, then the Fisher information matrix is singular, and the eigenspace for the zero eigenvalue is determined by the non-identifiable parameters. Therefore, determining the non-identifiable parameters is important for non-identifiable models.
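As a toy illustration of the last point (not taken from the paper), consider the hypothetical Gaussian location model $P_\theta=N(\theta_1+\theta_2,1)$ with $\theta=(\theta_1,\theta_2)\in\mathbb{R}^2$. It is non-identifiable because only the sum $\theta_1+\theta_2$ affects the distribution, and a short computation suggests how the Fisher information matrix $I(\theta)$ detects this:

```latex
\begin{aligned}
\ell(\theta;x) &= -\tfrac12\,(x-\theta_1-\theta_2)^2+\mathrm{const},\\
\nabla_\theta\ell(\theta;x) &= (x-\theta_1-\theta_2)\begin{pmatrix}1\\ 1\end{pmatrix},\\
I(\theta) &= \mathbb{E}_\theta\!\left[\nabla_\theta\ell\,\nabla_\theta\ell^{\top}\right]
 =\begin{pmatrix}1&1\\ 1&1\end{pmatrix},
\qquad
I(\theta)\begin{pmatrix}1\\ -1\end{pmatrix}=0 .
\end{aligned}
```

The null direction $(1,-1)$ is exactly the direction along which $\theta_1+\theta_2$, and hence $P_\theta$, stays constant.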
In this paper, we investigate identifiability of statistical models introduced for parameter estimation of random matrix models. In [8], two typical random matrix models, the compound Wishart model and the signal-plus-noise model are treated. They are defined as the following:
[TABLE]
where is matrix of independent and identically distributed Gaussian random variables with mean zero and variance , and are deterministic matrices, and . For any self-adjoint matrix , let us denote by the eigenvalue distribution defined as
[TABLE]
where are the eigenvalues of . The parameter estimation method introduced in [8] minimizes a modified KL divergence between a statistical model
[TABLE]
and a sample of the empirical eigenvalue distribution (resp. ), where the true parameters , are unknown. The definition of the statistical models is based on the free deterministic equivalent. The free deterministic equivalent, introduced in [14], is a deterministic and infinite-dimensional approximation of random matrices based on a central limit theorem for the eigenvalue distribution.
It directly follows from the definition of that
[TABLE]
In particular, these statistical models are not identifiable. For the CW model, it is easy to show that the converse also holds:
[TABLE]
In other words, if we replace the parameter set by the set of eigenvalue distributions then this model becomes identifiable. Note that there is a bijection between the set of eigenvalue distributions and
[TABLE]
where is the permutation group of elements. However, it is not clear that the converse holds for the SPN model.
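The invariance behind this non-identifiability can be checked numerically by coupling the noise. The sketch below is an illustration under assumptions that the extracted text does not show: it takes the real compound Wishart matrix to be `Z.T @ B @ Z` and the real signal-plus-noise matrix to be `(A + sigma*Z).T @ (A + sigma*Z)`, with `Z` a p×d Gaussian matrix of variance 1/d. The helper names `cw_matrix`, `spn_matrix` and `random_orthogonal` are hypothetical.

```python
import numpy as np

def cw_matrix(B, Z):
    """Assumed real compound Wishart form Z^T B Z."""
    return Z.T @ B @ Z

def spn_matrix(A, sigma, Z):
    """Assumed real signal-plus-noise form (A + sigma Z)^T (A + sigma Z)."""
    M = A + sigma * Z
    return M.T @ M

def random_orthogonal(n, rng):
    """Orthogonal factor of the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

rng = np.random.default_rng(0)
p, d, sigma = 6, 4, 0.3
Z = rng.standard_normal((p, d)) / np.sqrt(d)  # noise, variance 1/d (assumed)
A = rng.standard_normal((p, d))               # signal parameter
B = rng.standard_normal((p, p))
B = (B + B.T) / 2                             # self-adjoint CW parameter
U = random_orthogonal(p, rng)
V = random_orthogonal(d, rng)

# CW: rotating B by U and the noise by U leaves Z^T B Z unchanged.
cw_gap = float(np.max(np.abs(
    np.linalg.eigvalsh(cw_matrix(U @ B @ U.T, U @ Z))
    - np.linalg.eigvalsh(cw_matrix(B, Z)))))

# SPN: replacing A by U A V and Z by U Z V conjugates the matrix by V.
spn_gap = float(np.max(np.abs(
    np.linalg.eigvalsh(spn_matrix(U @ A @ V, sigma, U @ Z @ V))
    - np.linalg.eigvalsh(spn_matrix(A, sigma, Z)))))

print(cw_gap, spn_gap)
```

Since a Gaussian matrix rotated by orthogonal matrices has the same law as the original one, equality of the eigenvalues for the coupled noise indicates equality of the eigenvalue distributions of the two models, which is the easy direction of the statements above.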
The main theorem of this paper is as follows.
Theorem 1.1**.**
Let with . For and , the following holds:
[TABLE]
In particular, if we replace the parameter space by the direct product of the set of singular value distributions and the nonzero real numbers, then this statistical model becomes identifiable. Note that there is a bijection between this direct product and
[TABLE]
Our proof consists of an analytic part based on operator-valued analytic free additive subordination [3] and a combinatorial part based on free multiplicative deconvolution [11, 12].
2. Related Works
The compound Wishart random matrix was introduced in [13]. It appears as the sample covariance matrix of correlated samples [4, 5, 7]. The signal-plus-noise random matrix appears in signal processing [11, 6, 15].
Free probability was invented by Voiculescu [16]. In free probability theory, which was motivated by a problem in operator algebras, some infinite-dimensional operators are described as infinite-dimensional limits of random matrices. The approximation is based on a central limit theorem for the eigenvalue distributions of random matrices, called the free central limit theorem [17]. Conversely, the purpose of the free deterministic equivalent is to approximate fixed-size but large random matrix models by deterministic operators.
For analysis of non-identifiable models, generic identifiability was introduced in [1].
3. Preliminary
3.1. Freeness
First, we summarize some definitions from operator algebras and free probability theory. See [9] for details.
Definition 3.1**.**
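- (1)
A C∗-probability space is a pair $(\mathfrak{A},\tau)$ satisfying the following.
- (a)
The set $\mathfrak{A}$ is a unital C∗-algebra, that is, a possibly non-commutative subalgebra of the algebra $B(\mathcal{H})$ of bounded $\mathbb{C}$-linear operators on a Hilbert space $\mathcal{H}$ over $\mathbb{C}$ satisfying the following conditions:
- (i)
it is stable under the adjoint $a\mapsto a^{*}$,
- (ii)
it is closed under the topology of the operator norm of $B(\mathcal{H})$,
- (iii)
it contains the identity operator as the unit $I_{\mathfrak{A}}$ of $\mathfrak{A}$.
- (b)
The function $\tau$ on $\mathfrak{A}$ is a faithful tracial state, that is, a $\mathbb{C}$-valued linear functional with
- (i)
$\tau(a^{*}a)\geq 0$ for any $a\in\mathfrak{A}$, and the equality holds if and only if $a=0$,
- (ii)
$\tau(I_{\mathfrak{A}})=1$,
- (iii)
$\tau(ab)=\tau(ba)$ for any $a,b\in\mathfrak{A}$.
- (2)
A subalgebra $\mathfrak{B}$ of a C∗-algebra $\mathfrak{A}$ is called a ∗-subalgebra if it is stable under the adjoint operator $*$. Moreover, it is called a unital C∗-subalgebra if the ∗-subalgebra is closed under the operator norm topology and contains $I_{\mathfrak{A}}$ as its unit.
- (3)
Two unital C∗-algebras are called ∗-isomorphic if there is a bijective linear map between them which preserves the ∗-operation and the multiplication.
- (4)
Let us denote by $\mathfrak{A}_{\mathrm{s.a.}}$ the set of self-adjoint elements, that is, $a=a^{*}$, of $\mathfrak{A}$.
- (5)
Write $\operatorname{Re}a:=(a+a^{*})/2$ and $\operatorname{Im}a:=(a-a^{*})/(2i)$ for any $a\in\mathfrak{A}$.
- (6)
The distribution of $a\in\mathfrak{A}_{\mathrm{s.a.}}$ is the probability measure $\mu_a$ determined by
$$\int x^{k}\,\mu_a(dx)=\tau(a^{k})\qquad(k\in\mathbb{N}).$$
- (7)
For $a\in\mathfrak{A}_{\mathrm{s.a.}}$, we define its Cauchy transform $G_a$ by $G_a(z):=\tau[(z-a)^{-1}]$ for $z\in\mathbb{C}\setminus\mathbb{R}$, equivalently, $G_a=G_{\mu_a}$.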
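A finite-dimensional instance may help fix ideas. The sketch below assumes the standard example in which the algebra is the n×n complex matrices and the tracial state is the normalized trace; in that case the distribution of a self-adjoint matrix is its eigenvalue distribution, and the Cauchy transform can be computed either from the resolvent or from the eigenvalues. The function names are hypothetical.

```python
import numpy as np

def cauchy_transform_resolvent(a, z):
    """Normalized trace of the resolvent (z - a)^{-1}."""
    n = a.shape[0]
    resolvent = np.linalg.inv(z * np.eye(n) - a)
    return np.trace(resolvent) / n

def cauchy_transform_spectral(a, z):
    """Integral of 1/(z - x) against the eigenvalue distribution of a."""
    eigenvalues = np.linalg.eigvalsh(a)
    return np.mean(1.0 / (z - eigenvalues))

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 5))
a = (x + x.T) / 2          # a self-adjoint (real symmetric) matrix
z = 0.4 + 1.5j             # a point in the upper half-plane

g_resolvent = cauchy_transform_resolvent(a, z)
g_spectral = cauchy_transform_spectral(a, z)
print(g_resolvent, g_spectral)
```

The two values agree up to rounding, and the imaginary part is negative because the Cauchy transform maps the upper half-plane to the lower half-plane.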
Definition 3.2**.**
A family of -subalgebras of is said to be free if the following factorization rule holds: for any and indexes with , and with , it holds that
[TABLE]
Let be a family of self-adjoint elements . For , let be the -subalgebra of polynomials of . Then is said to be free if is free.
We introduce special elements in a non-commutative probability space.
Definition 3.3**.**
Let be a C∗-probability space.
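- (1)
A self-adjoint element $s\in\mathfrak{A}$ is called standard semicircular if its distribution is given by the standard semicircular law;
$$\mu_s(dx)=\frac{1}{2\pi}\sqrt{4-x^{2}}\,\mathbf{1}_{[-2,2]}(x)\,dx,$$
where $\mathbf{1}_{S}$ is the indicator function for any subset $S\subset\mathbb{R}$.
- (2)
Let $v>0$. An element $c\in\mathfrak{A}$ is called circular of variance $v$ if
$$c=\sqrt{v}\,\frac{s_{1}+i\,s_{2}}{\sqrt{2}},$$
where $(s_1,s_2)$ is a pair of free standard semicircular elements. In addition, $c$ is called a standard circular element if $v=1$.
- (3)
A ∗-free circular family (resp. standard ∗-free circular family) is a family $\{c_j\mid j\in J\}$ of circular elements such that $\bigcup_{j\in J}\{\operatorname{Re}c_j,\operatorname{Im}c_j\}$ is free (resp. and each element is of variance $1$).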
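The standard semicircular law can be sanity-checked through its moments. The sketch below assumes the usual density $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ and uses the substitution $x=2\cos\theta$, which turns the moment integral into $\frac{2}{\pi}\int_0^\pi (2\cos\theta)^k\sin^2\theta\,d\theta$; the midpoint rule is then applied to this trigonometric integrand. The function names are hypothetical.

```python
import math

def semicircle_moment(k, n_nodes=64):
    """k-th moment of the standard semicircular law via x = 2 cos(theta)."""
    h = math.pi / n_nodes
    total = 0.0
    for j in range(n_nodes):
        theta = (j + 0.5) * h  # midpoint of the j-th subinterval of [0, pi]
        total += (2.0 * math.cos(theta)) ** k * math.sin(theta) ** 2
    return (2.0 / math.pi) * h * total

def catalan(n):
    """n-th Catalan number."""
    return math.comb(2 * n, n) // (n + 1)

moments = [semicircle_moment(k) for k in range(7)]
even_moments = [moments[2 * n] for n in range(4)]
catalan_numbers = [catalan(n) for n in range(4)]
print(even_moments, catalan_numbers)
```

The odd moments vanish by symmetry, and the even moments match the Catalan numbers 1, 1, 2, 5 up to rounding, which is the combinatorial fingerprint of the semicircular law.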
Definition 3.4**.**
Let be a C∗-probability space and be a unital C∗-subalgebra of . Recall that they share the unit: .
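- (1)
Then a linear operator $E\colon\mathfrak{A}\to\mathfrak{B}$ is called a conditional expectation onto $\mathfrak{B}$ if it satisfies the following conditions:
- (a)
$E[b]=b$ for any $b\in\mathfrak{B}$,
- (b)
$E[b_1ab_2]=b_1E[a]b_2$ for any $a\in\mathfrak{A}$ and $b_1,b_2\in\mathfrak{B}$,
- (c)
$E[a^{*}]=E[a]^{*}$ for any $a\in\mathfrak{A}$.
- (2)
We write $\mathbb{H}^{+}(\mathfrak{B}):=\{W\in\mathfrak{B}\mid\text{there is }\varepsilon>0\text{ such that }\operatorname{Im}W\geq\varepsilon I_{\mathfrak{A}}\}$ and $\mathbb{H}^{-}(\mathfrak{B}):=-\mathbb{H}^{+}(\mathfrak{B})$.
- (3)
Let $E\colon\mathfrak{A}\to\mathfrak{B}$ be a conditional expectation. For a self-adjoint $a\in\mathfrak{A}$, we define the $E$-Cauchy transform as the map $G^{E}_{a}\colon\mathbb{H}^{+}(\mathfrak{B})\to\mathbb{H}^{-}(\mathfrak{B})$, where
$$G^{E}_{a}(Z):=E[(Z-a)^{-1}].$$
If there is no confusion, we also call $G^{E}_{a}$ a $\mathfrak{B}$-valued Cauchy transform.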
Definition 3.5**.**
(Operator-valued Freeness) Let be a C∗-probability space, and be a conditional expectation. Let be a family of -subalgebras of such that . Then is said to be -free if the following factorization rule holds: for any and indexes with , and with , it holds that
[TABLE]
In addition, a family of elements is called -free if the family of -subalgebra of the -coefficient polynomials of is -free.
3.2. Random Matrix Models and Free Deterministic Equivalents
Definition 3.6**.**
Fix a probability measure space . Write . Let . Then the real (resp. complex) Ginibre random matrix of variance is defined as the matrix of independent and identically distributed real (resp. complex) Gaussian random variables such that
[TABLE]
Definition 3.7**.**
Let (resp. ). Let us denote by the real (resp. complex) Ginibre random matrix of variance .
- (1)
A real (resp. complex) compound Wishart model (CW model for short) of type is defined as a parametric family where
[TABLE]
- (2)
A real (resp. complex) signal-plus-noise model (SPN model for short) of type is defined as a parametric family , where
[TABLE]
Here we introduce free deterministic equivalent of each random matrix model. Note that the free deterministic equivalent does not depend on the choice of the field or .
Definition 3.8**.**
Let . Fix a C∗-probability space . Let us denote by the matrix of -free circular elements in so that
[TABLE]
- (1)
The free deterministic equivalent of the CW model (FDECW model, for short) of type is defined as a parametric family , where
[TABLE]
In addition, we denote by the distribution of in the C∗-probability space :
[TABLE]
- (2)
The free deterministic equivalent of the SPN model (FDESPN model, for short) of type is defined as a parametric family , where
[TABLE]
In addition, we denote by the distribution of in the C∗-probability space , that is,
[TABLE]
4. Identifiability
4.1. Identifiability of CW Model
First, we quickly check the identifiability of the CW model. Fix . Let and be the vectors of eigenvalues of respectively. Assume that
[TABLE]
Now since is a compound free Poisson law (see [10]), the R-transform of is given by the following.
[TABLE]
By the assumption (4.1), it holds that
[TABLE]
Since all poles of are of order one, and are equal up to permutation of entries, that is, there is a permutation such that
[TABLE]
Equivalently, we have
[TABLE]
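The pole argument can be made concrete under an explicit assumption. Suppose, as for compound free Poisson laws in [10], that the R-transform has the shape below, where $c>0$ is a constant depending only on the dimensions and $a_1,\dots,a_p$ are the eigenvalues of the parameter matrix. Both the constant and the symbols are placeholders, since the paper's displayed formula is not reproduced in this text.

```latex
\begin{aligned}
R_a(z) &= c\sum_{i=1}^{p}\frac{a_i z}{1-a_i z},
\qquad
\frac{a_i z}{1-a_i z} = -\frac{z}{z-1/a_i}\quad(a_i\neq 0),\\
\operatorname*{Res}_{z=1/\alpha}R_a(z) &= -\frac{c\,m_a(\alpha)}{\alpha},
\qquad
m_a(\alpha):=\#\{\,i\mid a_i=\alpha\,\}\quad(\alpha\neq 0).
\end{aligned}
```

If $R_a=R_b$ for a second eigenvalue vector $b$, the two functions have the same simple poles with the same residues, so every nonzero value occurs with the same multiplicity in both vectors. The multiplicity of zero is then forced because both vectors have length $p$.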
4.2. Identifiability of SPN Model
Next, we work on the SPN model. We prove the following identifiability of the statistical model for the random matrix model . The proof is divided into an analytic part and a combinatorial one.
Theorem 4.1**.**
Let with , , and . Then if and only if and .
The proof is postponed to Section 4.2.5.
4.2.1. Analytic Part
Write
[TABLE]
We identify and via the following isomorphism :
[TABLE]
We define a conditional expectation by
[TABLE]
where is the -upper left corner of and is the -lower right corner of . For and , we write
[TABLE]
For any rectangular matrix , write
[TABLE]
Let . Then we have
[TABLE]
Applying , we have
[TABLE]
In particular, is determined by . Let be a matrix of -free standard circular elements. By [8, Proposition 5.30], is a -valued semicircular element (see [9, Section 9.1] for the definition) with the following variance mapping :
[TABLE]
Hence the following equations hold for any :
[TABLE]
Next, to prove a key lemma, we refer to an analytic free additive subordination formula based on [3].
Corollary 4.2**.**
Set and . Then there exists a pair of Fréchet analytic (equivalently, holomorphic) mappings so that for all ,
[TABLE]
Proof.
By [8, Proposition 5.30], the pair is -free. Then the assertion follows from [3, Theorem 2.7]. ∎
Lemma 4.3**.**
Let with . Let and . Then we have the following equation between holomorphic mappings on :
[TABLE]
Proof.
Set and . Pick the same holomorphic mappings and as in Corollary 4.2. Then for any ,
[TABLE]
∎
We are now ready to prove the first key lemma.
Lemma 4.4**.**
Fix with . Let and . If then .
Proof.
Assume that . Then since is determined by for any .
In the case , it holds that . Thus and . Since has no atom and is a sum of delta measures, we have .
Consider the case . Write . Now for any , by the assumption and Lemma 4.3, the following holds:
[TABLE]
Let
[TABLE]
Then
[TABLE]
where is the multiplicity of the eigenvalue of . Let be eigenvalues of . Then
[TABLE]
Now for any and ,
[TABLE]
Let and . Then (4.32) converges to [math] as by (4.30).
Assume that , then by , it holds that
[TABLE]
In particular,
[TABLE]
By (4.27), this contradicts . Therefore . ∎
4.2.2. Combinatorial Part
We use the free multiplicative deconvolution introduced by [12, 11]. We quickly review the deconvolution.
First, we introduce a family of formal power series, since the deconvolution is defined as an operation between moment power series. Let us denote by the set of formal power series without the constant term of the form
[TABLE]
with . Let be as in (4.35). For every we denote
[TABLE]
Second, we introduce the Kreweras complement and the boxed convolution. Here we only need the one-dimensional boxed convolution. See [10, Lectures 17 and 18] for details. Let and . Write and consider the disjoint union . We write the elements from the second entry as , and write . We define an order as follows:
[TABLE]
Then the set is a totally ordered set. Let and
[TABLE]
Then has a largest element with respect to the following partial order of : for and , if such that . The Kreweras complement of , denoted by , is defined as
[TABLE]
For and , we denote
[TABLE]
where is the number of elements in . For $f,g\in\Xi$, the one-dimensional boxed convolution (boxed convolution, for short), denoted by $f\,\framebox[7.0pt]{$\star$}\,g$, is defined as
[TABLE]
where is the Kreweras complement (4.39). The operation $\framebox[7.0pt]{$\star$}$ is associative and commutative [10, Proposition 17.5, Corollary 17.10]. In addition, let us denote by the series in $\Xi$ defined as
[TABLE]
Then is the unit of $(\Xi,\framebox[7.0pt]{$\star$})$ [10, Proposition 17.5]. We denote by the set of invertible elements in $\Xi$ with respect to $\framebox[7.0pt]{$\star$}$. For , we denote by its inverse with respect to $\framebox[7.0pt]{$\star$}$. Then by [10, Proposition 17.7],
[TABLE]
Third, we define the Zeta function as
[TABLE]
Clearly . Then we define the R-transform of formal power series.
Definition 4.5**.**
(R-transform) Let . Let us define the R-transform of as
[TABLE]
For any probability measure on with all moments finite, we denote by its moment formal power series:
[TABLE]
Let be a C∗-probability space, and let be an element of . The moment power series of , denoted by , is a formal power series defined as
[TABLE]
We simply write
[TABLE]
Usually the R-transform of is defined as the formal power series whose coefficients are the free cumulants (see [10]). The compatibility of our definition (4.49) with the usual definition is proven in [10, Proposition 17.4]. In addition, the following holds.
Lemma 4.6**.**
Let be a C∗-probability space and . Assume that is free. Then
[TABLE]
Proof.
This is a direct consequence of [10, Proposition 17.2]. ∎
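The relation between a moment series and its R-transform can be explored numerically. The sketch below does not implement the boxed convolution with Zeta directly; it uses the equivalent standard recursion from the usual cumulant-based definition in [10], namely $m_n=\sum_{s=1}^{n}\kappa_s\,[z^{\,n-s}]\,M(z)^s$ with $M(z)=\sum_{i\ge 0}m_i z^i$ and $m_0=1$. The function names are hypothetical.

```python
def power_series_product(f, g, order):
    """Coefficients of the product f*g up to degree `order` (inclusive)."""
    out = [0.0] * (order + 1)
    for i, fi in enumerate(f[: order + 1]):
        for j, gj in enumerate(g[: order + 1 - i]):
            out[i + j] += fi * gj
    return out

def moments_from_free_cumulants(kappa, n_max):
    """kappa[s-1] is the s-th free cumulant; returns [m_1, ..., m_{n_max}]."""
    m = [1.0]  # m_0 = 1
    for n in range(1, n_max + 1):
        total = 0.0
        power = [1.0]  # coefficients of M(z)^0
        for s in range(1, n + 1):
            # M(z)^s truncated at degree n-1; only m_0..m_{n-1} are needed.
            power = power_series_product(power, m, n - 1)
            total += kappa[s - 1] * power[n - s]
        m.append(total)
    return m[1:]

# Standard semicircular element: only the second free cumulant is nonzero.
semicircle_moments = moments_from_free_cumulants([0, 1, 0, 0, 0, 0], 6)
# All free cumulants equal to 1 (free Poisson law with rate 1 and jump size 1).
poisson_moments = moments_from_free_cumulants([1, 1, 1, 1], 4)
print(semicircle_moments, poisson_moments)
```

The first call reproduces the Catalan numbers in the even positions (0, 1, 0, 2, 0, 5), and the second gives 1, 2, 5, 14, which are the moments of the free Poisson law with unit rate and jump size.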
Lastly, note that it holds that for ,
[TABLE]
since . We are now ready to define the free multiplicative deconvolution.
Definition 4.7**.**
(free multiplicative deconvolution) For and , the free multiplicative deconvolution of with is defined as
[TABLE]
Equivalently, $f\,\framebox{$\smallsetminus$}\,g$ is the element of $\Xi$ determined by
$$R_{f}=R_{g}\,\framebox[7.0pt]{$\star$}\,R_{(f\,\framebox{$\smallsetminus$}\,g)}.$$
Example 4.8**.**
Let and be the delta measure on whose support is . Then
[TABLE]
since
[TABLE]
Note that . Hence
[TABLE]
Then for any , we have
[TABLE]
In particular, if , it holds that
[TABLE]
In the case with , it is easy to show that
[TABLE]
since each scalar is free from any element of .
Definition 4.9**.**
Let . Then their free additive convolution, denoted by , is defined as
[TABLE]
Equivalently, is the unique formal power series in determined by
[TABLE]
Notation 4.10**.**
Let be a C∗-probability space. Let be a non-zero projection, that is, . Then
[TABLE]
becomes a C∗-probability space. For , we denote by the moment power series of in :
[TABLE]
Proposition 4.11**.**
Let be a C∗-probability space. Assume that satisfies the following conditions:
- (1)
,
- (2)
is a circular element, that is,
[TABLE]
where is a pair of free standard semicircular elements in and ,
- (3)
is a projection, and
- (4)
is a pair of free families.
Set and
[TABLE]
Then we have
[TABLE]
Proof.
This is a direct consequence of [12, Theorem 3.4]. ∎
4.2.3. Free Poisson Distribution
The formal power series in Proposition 4.11 is the R-transform of a free Poisson distribution. We review the free Poisson distribution.
Definition 4.12**.**
(Free Poisson Distribution) Let , . Then the free Poisson distribution with rate and jump size is defined as the probability measure on determined by
[TABLE]
Usually the free Poisson law is defined as the limit law of a free version of the law of small numbers [10, Definition 12.12]. The compatibility between our definition and the usual definition is given by [10, Proposition 12.11]. Note that is, in fact, a compactly supported probability measure. Note that
[TABLE]
Lemma 4.13**.**
Let be a C∗-probability space, , and be a non-zero projection free from . Then it holds that
[TABLE]
where .
This is well-known, but for the reader’s convenience, we sketch the proof.
Proof.
Note that By the tracial condition and Lemma 4.6,
[TABLE]
By definition of the boxed convolution, we have
[TABLE]
Since , this is equal to
[TABLE]
Thus
[TABLE]
∎
Example 4.14**.**
Let and be a nonzero projection. Assume that is a free pair in and is a standard circular element. Then by Lemma 4.13,
[TABLE]
4.2.4. Second Lemma
In this section, we convert the model to an operator of the form where is a projection. Let be a C∗-probability space. Let with and write . In this section and in the next one, we denote by a matrix of -free circular elements with
[TABLE]
Recall that
[TABLE]
Now we identify with upper-left corner of with a normalization as the following:
[TABLE]
Recall that a family $\{C^{p,p}_{ij}\mid 1\leq i,j\leq p\}$ is a -free family of circular elements such that
[TABLE]
We write
[TABLE]
Then is a circular element in , and it is standard, that is,
[TABLE]
We define a projection as
[TABLE]
One has . For a -matrix , let us denote by the -square matrix obtained by adding zeros to ;
[TABLE]
Now by definition, we have
[TABLE]
Therefore, for any ,
[TABLE]
Equivalently, we have
[TABLE]
Recall that
[TABLE]
Lemma 4.15**.**
Let . Then
[TABLE]
where is the delta measure on whose support is .
Proof.
Now and are -free in , since the entries of and are scalar. By Lemma 4.13,
[TABLE]
Hence by (4.57),
[TABLE]
∎
Corollary 4.16**.**
Let , , . Assume that and set . Then
[TABLE]
Proof.
By (4.86) and Proposition 4.11, the left-hand side is equal to
[TABLE]
Now
[TABLE]
By Lemma 4.15, it holds that
[TABLE]
Hence the assertion holds. ∎
Lemma 4.17**.**
Assume that , and satisfy
[TABLE]
Then
[TABLE]
Proof.
Applying $\framebox[7.0pt]{$\star$}\,\mathrm{Zeta}$ to both sides, we have the desired equality. ∎
Now we prove the second key lemma.
Lemma 4.18**.**
Let $p,d\in\mathbb{N}$, $\sigma,\rho\in\mathbb{R}$, and $A,B\in M_{p,d}(\mathbb{C})$. Assume that $\sigma^{2}\geq\rho^{2}$ and
$$\mu_{\mathrm{SPN}}^{\Box}(A,\sigma)=\mu_{\mathrm{SPN}}^{\Box}(B,\rho).$$
Then
$$\mu_{\mathrm{SPN}}^{\Box}(A,\sqrt{\sigma^{2}-\rho^{2}})=\mu_{\mathrm{SPN}}^{\Box}(B,0).$$
Proof.
By Corollary 4.16 and the assumption, we have
$$(M_{A^{*}A}\,\framebox{$\smallsetminus$}\,f_{\lambda})\boxplus M[\delta_{\sigma^{2}/\lambda}]=(M_{B^{*}B}\,\framebox{$\smallsetminus$}\,f_{\lambda})\boxplus M[\delta_{\rho^{2}/\lambda}].$$
Thus by Lemma 4.17, it holds that
$$(M[A^{*}A]\,\framebox{$\smallsetminus$}\,f_{\lambda})\boxplus M[\delta_{(\sigma^{2}-\rho^{2})/\lambda}]=M[B^{*}B]\,\framebox{$\smallsetminus$}\,f_{\lambda}.$$
By using Corollary 4.16 again, we have
$$M[\mu_{\mathrm{SPN}}^{\Box}(A,\sqrt{\sigma^{2}-\rho^{2}})]\,\framebox{$\smallsetminus$}\,f_{\lambda}=M[\mu_{\mathrm{SPN}}^{\Box}(B,0)]\,\framebox{$\smallsetminus$}\,f_{\lambda}.$$
Equivalently,
$$R[\mu_{\mathrm{SPN}}^{\Box}(A,\sqrt{\sigma^{2}-\rho^{2}})]\,\framebox[7.0pt]{$\star$}\,R[f_{\lambda}]^{-1}=R[\mu_{\mathrm{SPN}}^{\Box}(B,0)]\,\framebox[7.0pt]{$\star$}\,R[f_{\lambda}]^{-1}.$$
Applying $\framebox[7.0pt]{$\star$}\,R[f_{\lambda}]\,\framebox[7.0pt]{$\star$}\,\mathrm{Zeta}$ to both sides completes the proof. ∎
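The cancellation behind Lemma 4.17 can be sketched at the level of R-transforms. The sketch assumes the usual convention of [10], in which the R-transform of a series in $\Xi$ is the series of free cumulants without constant term, so that the delta measure at $a$ has R-transform $az$, and it assumes that free additive convolution adds R-transforms as in Definition 4.9. The symbols $f$, $g$, $a$, $b$ are placeholders for the data of the lemma, whose displayed formulas are not reproduced in this text.

```latex
\begin{aligned}
f\boxplus M[\delta_a]=g\boxplus M[\delta_b]
&\iff R_f(z)+az=R_g(z)+bz\\
&\iff R_f(z)+(a-b)z=R_g(z)\\
&\iff f\boxplus M[\delta_{a-b}]=g .
\end{aligned}
```

With $a=\sigma^{2}/\lambda$ and $b=\rho^{2}/\lambda$, this is the step that trades the pair of noise levels $(\sigma,\rho)$ for the pair $(\sqrt{\sigma^{2}-\rho^{2}},0)$.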
4.2.5. Proof of Identifiability
Proof of Theorem 4.1.
Without loss of generality, we may assume that . Let . First, by Lemma 4.18, we have
[TABLE]
Second, Lemma 4.4 implies . Then , which completes the proof. ∎
5. Acknowledgement
We would like to thank Hiroaki Yoshida for discussions. We appreciate Yuichi Ike’s valuable comments on our manuscript.
References
- [1] E. S. Allman, C. Matias, J. A. Rhodes, et al. Identifiability of parameters in latent structure models with many observed variables. Ann. Stat., 37(6A):3099–3132, 2009.
- [2] S. Amari. Information geometry and its applications. Springer, 2016.
- [3] S. T. Belinschi, T. Mai, and R. Speicher. Analytic subordination theory of operator-valued free additive convolution and the solution of a general random matrix problem. J. Reine Angew. Math., 2013.
- [4] Z. Burda, A. Jarosz, M. A. Nowak, J. Jurkiewicz, G. Papp, and I. Zahed. Applying free random variables to random matrix analysis of financial data. Part I: The Gaussian case. Quant. Fin., 11(7):1103–1124, 2011.
- [5] R. Couillet, M. Debbah, and J. W. Silverstein. A deterministic equivalent for the analysis of correlated MIMO multiple access channels. IEEE Trans. on Inform. Theory, 57(6):3493–3514, 2011.
- [6] W. Hachem, P. Loubaton, X. Mestre, J. Najim, and P. Vallet. Large information plus noise random matrix models and consistent subspace estimation in large sensor networks. Random Matrices Theory Appl., 1(02):1150006, 2012.
- [7] A. Hasegawa, N. Sakuma, and H. Yoshida. Random matrices by MA models and compound free Poisson laws. Probab. Math. Statist., 33(2):243–254, 2013.
- [8] T. Hayase. Cauchy noise loss for stochastic optimization of random matrix models via free deterministic equivalents. arXiv:1804.03154 [stat.ML], 2018.
