TL;DR
This paper introduces a method for estimating low-rank covariance matrices of signals observed through unknown cyclic shifts and noise, using shift-invariant moments like the power spectrum and trispectrum, with proven consistency and practical effectiveness.
Contribution
It develops a polynomial-time, statistically consistent procedure for low-rank covariance estimation from translated noisy observations using shift-invariant moments, extending PCA applicability.
Findings
Covariance can be recovered from power spectrum and trispectrum when rank is low.
The method is statistically consistent and effective even under high noise levels.
Full-rank covariance matrices cannot be reliably estimated with this approach.
Abstract
We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length and follows a standard factor model with mean zero and normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the and the . We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when . Correspondingly, we provide a polynomial-timeâŠ
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Multi-Reference Factor Analysis: low-rank covariance estimation under unknown translations
Boris Landa and Yoel Shkolnisky
ââ
Boris Landa1,â    Yoel Shkolnisky2
1Program in Applied Mathematics, Yale University
2Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University
âCorresponding author. Email: [email protected]
Abstract
We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length and follows a standard factor model with mean zero and normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the power spectrum and the trispectrum. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when . Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of is required, and prove the procedureâs statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including in high levels of noise.
1 Introduction
Principal Components Analysis (PCA) is a ubiquitous technique in science and engineering, which is used extensively for processing and analyzing large datasets. A standard approach for PCA is to estimate the covariance matrix of the dataset, compute its eigen-decomposition, and then project the data points onto the first several leading eigenvectors (i.e. with the largest corresponding eigenvalues). In various scientific applications, PCA is applied to collections of one-dimensional signals, where the underlying assumption is that these signals are low-rank, in the sense that they reside on (or near) a low-dimensional linear subspace. However, it is often the case that real-world signal measurements are prone to certain group-action deformations, where a common example is that of translations. When different translations are applied to low-rank signals, the resulting covariance matrix loses its low-rank structure (a claim which is made more precise shortly), thus rendering PCA ineffective without first aligning the signals. This scenario, where signals are acquired through unknown translations, is encountered for example in radar target classification [24, 45, 46], chromotographic fingerprinting [14, 30, 36], machine fault diagnosis [17, 28], and ECG signal classification [21, 22]. In these applications, a typical scenario includes collecting a large dataset of signals for analysis and classification, followed by PCA for denoising and dimensionality reduction. For PCA to be effective, the datasetâs covariance matrix should be approximately low-rank. Hence, to account for the different translations it is customary to first align the signals in the dataset. Numerous methods exist for the task of signal alignment, where standard approaches include pair-wise registration, and matching with a predefined template. Yet, it is important to stress that if the signals admit significant heterogeneity (i.e. inherent variability not associated with noise) then alignment is not well-defined, as the concept of aligning two very different patterns is meaningless. Moreover, signal alignment â even between identical copies â cannot be achieved in high levels of noise [9, 41]. Motivated by the above-mentioned limitations of signal alignment, we consider the problem of accurately estimating the covariance matrix of low-rank signals from their translated and noisy observations.
1.1 The setting
We consider the following model for an observed signal :
[TABLE]
where is the underlying signal (to be described shortly), is a noise vector with either or ( stands for the circularly-symmetric complex normal distribution, is an identity matrix), and is a discrete cyclic shift by , i.e.
[TABLE]
with drawn from some unknown probability distribution over . In what follows, we consider all vectors as cyclic, and drop the modulus by from our notation in all index assignments. The underlying signal is assumed to follow a standard zero-mean factor model
[TABLE]
where are orthonormal, and are i.i.d with either (if ) or (if ). We mention that while the theoretical analysis in this work focuses on the complex-valued case (where and ), for practical purposes we also consider the real-valued case (, , and ), providing appropriate modifications to our algorithms (see Section 3.1). Now, given (3), the covariance matrix of is
[TABLE]
where stands for complex-conjugate and transpose. While the rank of is and can be considerably smaller than , the covariance matrix of , given by , typically admits a rank much larger than , even if no noise is present (). This is because the rank of is dominated by the dimension of the set of vectors for and (where is a set of allowed shifts), which can exceed significantly. In particular, if the probability distribution of is non-vanishing (i.e. all shifts are allowed; ) and is aperiodic for some (see [1]), then is full-rank.
Considering the setting of (1)â(3), a fundamental problem of interest is the following one.
Problem 1** (Multi-Reference Factor Analysis).**
Given i.i.d measurements following the model (1)â(3), estimate and .
Problem 1, termed Multi-Reference Factor Analysis (MRFA), can be viewed as a generalization of standard factor analysis, to the setting of unknown translations of the underlying signal vectors. In this work, instead of estimating and directly, we consider the closely-related problem of estimating the covariance matrix , whose eigenvalues and eigenvectors are and , respectively. We exemplify our setting in Figure 1 for , , and .
1.2 Related work
As far as we know, the MRFA problem (Problem 1) as presented here has not been treated in the literature. However, it is worthwhile to review some closely related problems, and the approaches undertaken to solve them. When the signal in (1) is deterministic and fixed, our setting becomes that of Multi-Reference Alignment (MRA) [2, 8, 9, 32, 12, 6], where an unknown vector is to be recovered from its translated and noisy observations. Since is fixed in MRA, its translated copies can be accurately aligned when the Signal-to-Noise Ratio (SNR) is sufficiently high. Therefore, the line of work in MRA mostly focuses on the low-SNR regime, where alignment is impossible. Signal estimation under group actions other than translations has recently been considered in [7, 3].
In the majority of the above-mentioned works on MRA, signal estimation is carried out through an instance of the Method of Moments [37] (see also the generalized method of moments [19]). Specifically, certain shift-invariant moments of the underlying signal are identified and estimated, and then employed for recovering the underlying signal from the emerging moment equations. Shift-invariant moments which are extensively used in this context are the signalâs mean, the signalâs power spectrum, and the signalâs bispectrum [13]. The power spectrum and the bispectrum of a signal essentially correspond to certain double and triple correlations, respectively, in the Fourier domain of the signal, which are invariant to cyclic shifts of the signal. It is important to point out that the power spectrum and the bispectrum are omnipresent in classical statistical signal processing, see for example [31], with an endless list of applications. In the context of MRA, knowing the bispectrum is sufficient to recover the signal up to fundamental ambiguities (arbitrary cyclic shifts of the underlying signal) [9]. The algorithmic use of the bispectrum (often coupled with other shift-invariant moments for improved performance) for estimating the signal in MRA was investigated in [9, 32], where different algorithms were presented for this task. Particularly, much focus was put on the required sample complexity, namely the number of samples required to achieve a prescribed estimation error for a given SNR value. We also mention [1], where it was shown that the second moment of the measurements in MRA is sufficient to recover the signal if the distribution of the shifts is aperiodic, resulting in an improved sample complexity rate. Aside from approaches leveraging shift-invariant moments, another widespread approach for solving estimation problems akin to MRA (or more generally - estimation problems involving nuisance parameters) is Expectation Maximization (EM) [16]. While EM is popular and intuitive, its crucial drawback is lack of theoretical convergence guarantees. On the other hand, methods based on shift-invariant moments are amenable to rigorous analysis, allowing for algorithms with provable theoretical guarantees. Furthermore, methods based on invariant moments lead to single-pass algorithms, in the sense that every measurement is only considered once, with subsequent processing involving only the estimated moments.
We also mention that several recent works extend the standard model of MRA (where is fixed) to a scenario where assumes discrete heterogeneity [10, 29, 32], namely that is chosen from a finite set of templates .
Another closely related line of work is that of Generalized Principal Components Analysis (GPCA) (also known as subspace clustering)Â [39, 38, 27], where data points are assumed to reside on a union of low-dimensional linear subspaces. The MRFA problem could be considered as a special case of GPCA, where each subspace is described by a different translation of the subspace spanned by . However, in GPCA no relation between the different subspaces is assumed, and consequently, the theoretical recovery guarantees for subspace recovery are typically prohibitive if the noise variance and the number of subspaces are large.
Last, we mention [5], where the authors considered the problem of MRFA in the restricted case of . A key observation in [5] is that if the distribution of the shifts is uniform, then the bispectrum vanishes entirely. Therefore, it was proposed to employ the fourth-order shift-invariant moment known as the trispectrum [13] to estimate the signal parameters ( and ), and algorithms were presented for consistently estimating and as . The trispectrum is analogous to the power spectrum and the bispectrum, in the sense that it consists of certain quadruple correlations in the Fourier domain of the signal, which are invariant to cyclic shifts. As in MRA, much focus was put in [5] on the sample complexity of the algorithms in the low-SNR regime, since in the high-SNR regime the observations can be accurately aligned using correlations (a fact which is only true for the case ).
1.3 Our contributions and main results
We consider the problem of estimating the covariance matrix from the measurements (generated according to the model (1)â(3)) using shift-invariant moments, where we assume no explicit knowledge of the rank . We investigate certain theoretical aspects of uniqueness and identifiability, derive practical algorithms for consistently estimating from , and conclude with extensive simulations.
We next describe our contributions in detail. Unless otherwise stated, we refer the complex-valued case of our setting, where and .
1.3.1 Characterization of uniqueness and identifiability
We begin by investigating the moments equations arising from the power spectrum and the trispectrum, and consider the question of whether these equations are sufficient to recover , and under which conditions. The results of this investigation are reported in Section 2, where we show the following. By posing an equivalent formulation to our problem in the Fourier domain, replacing with its analogue in the Fourier domain , we show that the algebraic structure of the moments equations (arising from the power spectrum and the trispectrum) determines completely up to multiplication (element-wise) with an unknown circulant matrix of phases (namely a circulant matrix with unit-magnitude elements). Essentially, this multiplication with a circulant matrix of phases corresponds to phase uncertainties on the diagonals of . We then consider the problem of resolving these uncertainties, a problem which we term circulant phase retrieval (see Problem 2). We show that by leveraging the Hermitian and positive semidefinite (PSD) structure of , it is possible to solve the circulant phase retrieval problem and to recover (up to certain fundamental ambiguities, see Proposition 3), whenever and a certain technical condition holds (Condition 10 in Section 3.2). We note that this technical condition was observed to hold in all conducted numerical experiments, suggesting that it is non-restrictive in practice. While our main result asserts that recovery of a low-rank is possible from the power spectrum and the trispectrum alone (see Theorem 5), we show that this not the case for a full-rank (see Proposition 6).
1.3.2 Practical estimation procedures with theoretical guarantees
In Section 3, we describe a statistically-consistent procedure for estimating from finite-sample estimates of the power spectrum and the trispectrum. This is essentially a two step procedure, where the first step is to estimate up to the aforementioned âdiagonal phase ambiguitiesâ, and the second step is to resolve them while exploiting the fact that is Hermitian and PSD. The first step is derived in Section 3.1, see Algorithm 1, and consists of solving a convex optimization problem, followed by computing several rank-one decompositions. We prove that regardless of the rank , this procedure consistently estimates as up to an element-wise multiplication with a circulant phase matrix, see Theorem 7. We also provide an appropriate modification of the above-mentioned procedure to handle the real-valued case (, , and ), where the difference lies only in the objective function of the convex optimization problem. We remark that only the first step of our two-step procedure needs to be modified to handle the real-valued case. The second step of the recovery process, i.e. resolving the diagonal ambiguities, is derived in Section 3.2. Our main contribution in this context, is a polynomial-time procedure for solving the circulant phase retrieval problem (Problem 2 in Section 2) when and Condition 10 holds, see Algorithm 2 in Section 3.2. Fundamentally, this procedure begins by constructing a certain matrix from the output of step one, and proceeds with evaluating its right singular vector corresponding to its smallest singular value. When combined with Algorithm 1, Algorithm 2 allows for a consistent estimate of as , see Theorem 13. In Figure 2 we demonstrate the results of applying Algorithms 1 and 2 to estimate and of Figure 1, for . Last, in Section 4 we conduct extensive numerical experiments, corroborating the statistical consistency of our estimators and moreover, demonstrating their favorable properties, such as a squared-error convergence rate and robustness to high levels of noise.
2 Invariant moments and identifiability of
We start by introducing the shift-invariant moments used to recover , and provide certain necessary and sufficient conditions for recovery. Instead of working with the model (1) directly, we consider a more convenient and equivalent formulation in the Fourier domain, where cyclic shifts are replaced by modulations. Let be the unitary Discrete Fourier Transform (DFT) matrix, and let be the âth DFT vector, given by
[TABLE]
We denote the Fourier transforms of the quantities in (1) and (3) by
[TABLE]
Then, a formulation equivalent to (1)â(3) in the Fourier domain is
[TABLE]
for , where , are orthonormal (since is unitary), and is the parameter of the cyclic shift from (1). Correspondingly, the covariance matrix of is given by
[TABLE]
which is Hermitian and positive semidefinite (PSD), with rank and with eigenvalues and eigenvectors and , respectively. Clearly, knowing is equivalent to knowing , and so from this point onward we focus on the recovery of instead of . Throughout this section, we assume the noiseless case, i.e. , as the existence of noise simply adds a known bias term to which is easily removed (see Section 3.1 and Appendix E), and has no influence on the issue of identifiability of the solution.
Let us consider the second and fourth moments of , denoted and respectively, and given by
[TABLE]
for , where denotes complex-conjugation. It is important to mention that all odd-ordered moments of vanish, since the âs of (3) admit a zero-centered symmetric distribution. This explains why we only consider the second and fourth moments of , and not the first and third. Next, we define the following subsets of and :
[TABLE]
is known as the power spectrum of , and is known as the trispectrum of  [13]. A fundamental property of and is that their entries are invariant to cyclic shifts of (or equivalently, to integer modulations of ), regardless of the distribution of the cyclic shifts (this can be easily verified by substituting into (10) and (11), where from (5)). Moreover, if admits uniformly distributed cyclic shifts, then most of the entries in and vanish, and the only non-zero entries of and are given by and , respectively.
Since and are invariant to cyclic shifts in , and as (see (1) with ), and can be viewed as computed directly from and (defined by replacing with in (9)) instead of and , respectively. Furthermore, as is normally-distributed, all moments of can be described in terms of its first and second moments, that is, its mean and covariance. Since the mean of is zero, and can be described solely in terms of . In particular, we have the following proposition providing the explicit forms of and .
Proposition 1** (Explicit form of and ).**
Consider the noiseless case (i.e. ). Then,
[TABLE]
for all .
The proof is provided in Appendix A.
Assuming we have access to the shift-invariant moments and , noting that they can be estimated in a straightforward manner from the observations of (see (22) and (23) in Section 3.1), we turn to address the question of whether can be identified, and under which conditions, from and . To that end, we consider the set of equations
[TABLE]
where represents the unknown covariance matrix, and , are maps encoding the relation between the underlying covariance and the observed shift-invariant moments and . Specifically, and in accordance with (12) and (13), we define and as
[TABLE]
for all . We mention that the domains of and are arbitrary matrices (instead of only Hermitian and PSD matrices) to simplify the analysis in this section. This simplification is achieved by first characterizing the solutions to (14) for arbitrary matrices, and then restricting our attention to the subset of solutions which are Hermitian and PSD.
Given and , (14) corresponds to a set of non-linear equations which are to be solved to determine , where is a linear map in , and is a quadratic map in . According to (12), is merely the main diagonal of , and hence insufficient for recovering . However, provides additional equations, which is more than the number of variables in a generic covariance matrix, and hence possibly enough for recovering . Let us denote by a circulant matrix constructed from a vector , namely
[TABLE]
for . The following lemma characterizes the set of all matrices satisfying the equations (14).
Lemma 2** (Solutions of the moments equations).**
Suppose that for all . Then, a matrix satisfies the set of equations (14) if and only if
[TABLE]
where but are otherwise arbitrary, and is the Hadamard product (entry-wise multiplication).
The proof of Lemma 2 is provided in Appendix B. Essentially, Lemma 2 asserts that a solution to (14) is equal to the true covariance up to a circulant matrix of unknown phases. That is, each diagonal of with circulant wrapping (i.e. the entries for the âth diagonal) in (18) is multiplied by an unknown phase factor of the form . In accordance with Lemma 2, in Section 3.1 we describe a procedure, based on convex optimization followed by a rank-one decomposition, which solves (14) and outputs a statistically consistent estimate (as ) for a matrix with unknown angles . See Algorithm 1 for a summary of the procedure, and Theorem 7 for its consistency guarantee.
At this point, it is important to note that the problem of recovering under the model (7) admits an inherent ambiguity. Clearly, cyclically shifting the signal results in a covariance matrix whose rows and columns are cyclically shifted, while and remain unchanged. In the Fourier domain, where is replaced with , this ambiguity corresponds to integer modulations of the rows and columns of . In particular, let be a set of matrices given by
[TABLE]
where is the âth DFT vector defined in (5). The set is a set of ambiguities associated with the MRFA problem, since each is the covariance matrix of (i.e. the Fourier transform of the signal cyclically shifted by ) for some cyclic shift (see (1)â(3) and (5)â(7)). Consequently, is a set of ambiguities inherent in the recovery of from the shift-invariant moments and , as established by the following proposition.
Proposition 3** (Fundamental ambiguities).**
Every matrix is Hermitian, PSD, has rank , and satisfies the equations in (14).
Proof.
The matrices (for every ) are Hermitian since is Hermitian. They are also PSD with rank since they are similar to (and hence share their eigenvalues with ). Last, observe that
[TABLE]
and hence Lemma 2 establishes that the matrix satisfies the equations in (14). â
Henceforth, whenever we refer to the recovery of , we essentially mean the recovery of any (arbitrary) element from the set .
Evidently, the set of equations (14) goes a long way in narrowing down the set of feasible covariance matrices, as solving (14) leaves us with only unknown parameters , which are to be determined in order to recover . This leads us to consider the following problem.
Problem 2** (Circulant phase retrieval).**
Given with unknown angles , determine (or any arbitrary element from of (19)).
In a way, Problem 2 can be viewed as a certain phase retrieval problem, where the phases multiplying each diagonal of (with circulant wrapping) are to be retrieved, hence the name âcirculant phase retrievalâ. In this regard, note that Lemma 2 considers a general matrix , and ignores the fact that we actually seek a matrix which is Hermitian and PSD, which are properties satisfied by the true covariance matrix . Without any further prior knowledge on (not even its rank), a natural way to go about solving Problem 2 is to try to solve the following surrogate problem.
Problem 3**.**
Given with unknown angles , find angles such that is Hermitian and PSD.
Suppose that we are able to solve Problem 3, then a fundamental question is whether any solving Problem 3 is also a solution to Problem 2, i.e. whether is in the set of feasible solutions . It turns out that for certain which are sufficiently low-rank, any solving Problem 3 is indeed also a solution to Problem 2. In particular, we establish that this is true if under mild conditions on , or if and satisfies a certain technical condition (Condition 10 in Section 3.2). These results are summarized by the next Lemma.
Lemma 4**.**
Suppose that for all , and either , or and Condition 10 holds. Then, if is a solution to Problem 3 (i.e. is Hermitian and PSD, where is as described in Problem 3) it is also a solution to Problem 2 (i.e. ).
The proof of Lemma 4 is provided in Appendix C. In Section 3.2 we outline a polynomial-time procedure for solving Problem 3, which is guaranteed to succeed if and Condition 10 holds, see Algorithm 2 (we note that even though Condition 10 is not required for the claim of Lemma 4 in the case of , we do require it for our guarantees on the success of Algorithm 2). We mention that Condition 10 arises naturally from the procedure described in Section 3.2. While this condition is very technical and somewhat opaque, it can be easily tested for any using the singular values of a certain matrix whose construction is detailed in Section 3.2. Moreover, Condition 10 was observed to hold in all numerical experiments conducted in Section 4.
The following theorem is the main result concerning the recovery of low-rank covariance matrices from and .
Theorem 5** (Low-rank recovery of ).**
Suppose that for all . If , or if and Condition 10 holds, then (or any arbitrary element from ) can be recovered from and . Specifically, if is Hermitian, PSD, and satisfies equations (14), then .
The proof of Theorem 5 follows immediately from combining Lemma 2 with Lemma 4. Evidently, Theorem 5 together with Proposition 3 assert that a matrix is Hermitian, PSD, and satisfies equations (14) if and only if .
Coupling the procedure for estimating up to diagonal phase ambiguities (Algorithm 1) with the procedure for resolving them (Algorithm 2), we obtain a statistically consistent procedure for recovering (see Theorem 13), which has polynomial-time complexity.
Now, while low-rank covariance matrices can be successfully recovered from and , this is not the case for full-rank , as shown by the following proposition.
Proposition 6** (Full-rank ).**
Suppose that for all . If , then cannot be recovered from only and . That is, there exists a Hermitian and positive definite matrix , with , which satisfies equations (14).
The proof of Proposition 6 is provided in Appendix D.
3 Recovering the covariance matrix
In this section we describe our algorithms for estimating a low-rank (and consequently the covariance matrix of , i.e. ) using observations from the model (1)â(3), with an arbitrary noise variance . We also provide appropriate statistical consistency guarantees for these algorithms.
3.1 Step 1: Recovering up to diagonal phase ambiguities
Given observations drawn from the model (1)â(3), we first compute their Fourier transforms
[TABLE]
where is the DFT matrix from (5). Next, we estimate and via
[TABLE]
for . Evidently, and are unbiased and consistent estimators for and , respectively [40]. We then proceed by constructing estimators for the diagonals of using and (as shown below), and prove that they are statistically consistent as up to arbitrary phase factors (i.e., multiplicative constants with unit magnitude).
We now describe our estimation procedure in detail. Let us denote by a column vector given by the âth diagonal (with circulant wrapping) of the matrix , i.e.
[TABLE]
for . In Appendix E we show that and can be expressed in terms of as
[TABLE]
for every . Next, we define the matrices , , by
[TABLE]
noting that with , and rewrite (26) using (27) as
[TABLE]
We next estimate the matrices , for , by solving an optimization problem that fits from (23) to the form (28) (replacing with its estimate ), while removing the rank constraint on . Specifically, we solve
[TABLE]
which is a linear least-squares problem with semidefinite constraints, hence a convex optimization problem readily solved by a variety of algorithms [18, 11]. Even though we omitted the rank constraint on when solving (29), the resulting estimates approximate the matrices (as established in the proof of Theorem 7 below) and are close to being rank-one, as exemplified in Figure 3 for , , , and .
Then, each diagonal for is estimated from the best rank-one approximation to . In particular, if is the largest eigenvalue of and is its corresponding eigenvector, then we estimate via
[TABLE]
noting that is unique up to a phase factor, i.e. a constant multiplying , where is an (unknown) angle. Note that according to (25), the main diagonal of , namely , can be estimated directly from and hence does not suffer from this phase ambiguity. We therefore proceed by forming the matrix , given by
[TABLE]
where the subtraction of from the main diagonal of corrects for the bias due to noise. The following theorem establishes that is a statistically-consistent estimate of as , up to an unknown circulant phase matrix.
Theorem 7** (Consistency of (31)).**
Suppose that for all . Then,
[TABLE]
The proof of Theorem 7 is provided in Appendix G.
Now, from a practical standpoint, it is worthwhile to briefly consider the real-valued case, where , , and . The only difference between the real-valued case and the complex-valued case lies in the expression for the trispectrum , which now admits an additional additive term. In particular, we show in Appendix F that instead of (28) we have
[TABLE]
Therefore, analogously to (29), we propose to solve
[TABLE]
and proceed with the estimation of and the construction of as in the complex-valued case. Due to the additional term in the expression for the trispectrum in (33), the proof of the analogue of Theorem 7 in the real-valued case is somewhat more complicated, and is left for a future work. Nonetheless, we demonstrate by numerical experiments in Section 4 that the proposed approach for the real-valued case provides results that are very similar to the complex-valued case.
The algorithm for recovering up to unknown diagonal phase ambiguities, for both the complex-valued and the real-valued cases, is described in Algorithm 1.
Remark 1**.**
It is worthwhile to point out that problem (29) is ill-posed without the semidefinite constraints. Removing the semidefinite constraints in (29) results in a linear least-squares system with equations, and a smaller number of variables (due to the constraint ), which is not underdetermined. Yet, we observe that for every triplet of indices there exists another triplet which results in exactly the same equation as for the first triplet (since the terms and in (29) interchange). Therefore, the number of independent equations is actually smaller than the number of variables and the problem is ill-posed. Yet, it turns out that the semidefinite constraints resolve this ill-posedness, as established in the proof of Theorem 7.
Remark 2**.**
We mention that the trispectrum admits several symmetries which can be exploited to reduce the computational burden of Algorithm 1. Notice from (11) that swapping the first and third, or second and fourth indices of does not change the value of . Therefore, it is clear that
[TABLE]
hence it is sufficient to estimate only about a quarter of the elements of .
3.2 Step 2: Resolving the diagonal phase ambiguities
We consider an estimator for of the form
[TABLE]
where is from (31). In this section, we derive a procedure to find the angles such that is close to being Hermitian and PSD. For simplicity of presentation, we derive the procedure in the limiting case of . Specifically, we consider the setting of Problem 3, where we assume that we have access to the matrix , given by
[TABLE]
with unknown angles , and seek angles such that the matrix
[TABLE]
is Hermitian and PSD.
Let us define the matrices , for , by
[TABLE]
where is the operation of cyclically shifting the rows and columns of by and , respectively, namely
[TABLE]
The following lemma summarizes several properties of required for our derivation.
Lemma 8**.**
The matrix (taking in (39)) is given explicitly by
[TABLE]
and is Hermitian, PSD, and satisfies
[TABLE]
for every .
The proof of Lemma 8 is provided in Appendix H. Next, using (38), we define the matrix via its blocks as
[TABLE]
where denotes the âth block of , and
[TABLE]
We have the following lemma regarding the matrix of (43).
Lemma 9**.**
If of (38) is Hermitian and PSD, then of (43) is also Hermitian and PSD.
The proof of Lemma 9 is provided in Appendix I. From Lemma 9 it follows that for appropriate angles such that is Hermitian and PSD, is also Hermitian and PSD, and we can write
[TABLE]
for some , where denotes the âth block of (the consecutive rows of starting from row number ). From (45) and (43), we have that , which implies that the columns of are spanned by the eigenvectors of . Following (42), we define to be the matrix whose columns are the eigenvectors of which correspond to its largest eigenvalues. Then, we can write
[TABLE]
where is a matrix of unknown coefficients. Now, using (46), (45), and (43) we have that
[TABLE]
where we defined to be a matrix of unknown coefficients. Importantly, fixing and , (47) describes a system of linear equations in the variables and the variables , where we relaxed the requirement that have unit norm (we will see that this relaxation still enables us to obtain the correct angles ). Recall that the matrices and are computed from the matrix , which is provided to us (or estimated from the data, e.g. from Section 3.1). Hence, in total, (47) describes a linear system with equations in variables, among which the variables encode the required correcting angles from (38). Now, even though it is possible to exploit (47) directly to solve the problem at hand (identifying the phases ), we proceed by forming an augmented linear system with more equations compared to the number of variables, which ultimately allows to recover for larger ranks . To this end, we couple together all systems of equations from (47) for all such that , noting that are shared by all such systems. We then obtain the set of equations
[TABLE]
which is a system of equations in variables. Continuing, we can write the linear system of (48) in standard matrix notation as
[TABLE]
where is a column vector of zeros, is a column vector of variables formed by stacking on top of all of the elements in , and the matrix is constructed from (48) as follows. Let and , for , be given by
[TABLE]
where is the Kronecker product, is the âth indicator vector (with a single value of at the âth entry), is the operation of vectorizing a matrix by stacking its columns on top of one another (with the leftmost column being at the top of the resulting vector), and recall that is the matrix whose colums are the first eigenvectors of (corresponding to its largest eigenvalues). Then, is given by
[TABLE]
where stands for a block-diagonal matrix constructed from the matrices , namely a matrix of size with non-zero blocks along its main diagonal, each of size . Figure 4a depicts the structure of a typical matrix .
Next, note that is a possible solution to (49), where is the column vector of zeros. However, we know that there must exist at least one additional nonzero solution corresponding to the true phase ambiguities ( is one such solution). Therefore, the linear system of (49) must admit an infinite number of solutions. This implies that if the system in (49) is not underdetermined (as we enforce next), then the smallest singular value of must be zero. Note that the system in (49) is not underdetermined if we require that , which is equivalent to requiring (since and are integers). In order to proceed, we need the following condition.
Condition 10**.**
The second-smallest singular value of is strictly positive.
Condition 10 can be easily tested by computing the singular-value decomposition (SVD) of constructed from . Figure 5a depicts the six smallest singular values of when constructed from a covariance matrix with , eigenvalues , and eigenvectors randomly sampled from the unit sphere (with uniform distribution).
Now, assuming that and Condition 10 holds, then the solution to (49) is the span of the right singular vector of corresponding to its smallest singular value. Denoting this singular vector by , we have that
[TABLE]
for any complex constant . At this point, we briefly mention that a naive evaluation of can be computationally challenging. In this regard, Remark 3 below outlines an efficient approach, which utilizes instead of to evaluate . Continuing with our derivation, from (44), (49), and (53) it follows that
[TABLE]
Hence, the magnitudes of the first elements of should be constant, and their phases should satisfy
[TABLE]
where is the argument of a complex number (), and is an unknown angle (). Figure 6b illustrates the magnitudes of the first elements of , for the same matrix as used in Figure 5, exemplifying the agreement with (54).
Next, note that the set of phase differences for (recalling that ), satisfies
[TABLE]
Therefore, taking the sum over on both sides in (55) yields
[TABLE]
asserting that must satisfy
[TABLE]
for some . From (55), (58), and taking (in accordance with (38)), we arrive at
[TABLE]
for and some (where is fixed for all values of ), which determines every angle completely up to an additive ambiguity of for some . Reviewing the derivation thus far, (59) is a necessary condition for to be Hermitian and PSD (since we derived it from the assumption that is Hermitian and PSD). We summarize the derivation up to this point in the following proposition.
Proposition 11**.**
Suppose that and Condition 10 holds. If of (38) is Hermitian and PSD, then the angles must follow (59) for some fixed .
Although Proposition 11 alone does not guarantee that choosing according to (59) for any particular leads to which is Hermitian and PSD, nor that , those two facts actually follow when combining Proposition 11 with Proposition 3. We then get the following result.
Proposition 12**.**
Suppose that and Condition 10 holds. Then, of the form of (38) is Hermitian and PSD if and only if . Moreover, if the angles follow (59) for any fixed then
[TABLE]
for some , where are from (37), and consequently .
Proof.
Proposition 11 asserts that for to be Hermitian and PSD, the vector of correcting angles must be chosen from the different options corresponding to different in (59) (the options are different since is clearly different for every ). On the other hand, taking according to (60) gives
[TABLE]
asserting (via Proposition 3) that is Hermitian and PSD, for every . Therefore, there are different choices for , given explicitly by (60), that result in which is Hermitian and PSD. Consequently, choosing according to (59) must coincide with (60), with some one-to-one mapping between the values of the indices and , thus establishing all claims of Proposition 12. â
Considering the finite-sample case, it is clear that we do not have access to (of (37)) nor (of (38)). Therefore, we replace and with (of (31)) and (of (36)), respectively, and denote by the corresponding finite-sample analogues of all quantities defined in this section. Figure 5b depicts the behavior of the smallest singular values of (the finite-sample analogue of ), and Figure 6b illustrates the magnitudes of (the right singular vector of corresponding to its smallest singular value). The resulting procedure for resolving the diagonal phase ambiguities in the finite-sample case is detailed in Algorithm 2. We mention that if is unknown, we take it to be the maximal rank allowed by Algorithm 2, which is the largest such that (for the system (49) not to be underdetermined).
The following theorem establishes the consistency of the estimator computed by Algorithm 2.
Theorem 13** (Consistency of Algorithm 2).**
Let be the input to Algorithm 2 (replacing ). Suppose that , Condition 10 holds, and that from (41) has distinct non-zero eigenvalues for every . Then,
[TABLE]
where is the output of Algorithm 2.
The proof of Theorem 13 is provided in Appendix J.
Remark 3** (Efficient evaluation of and ).**
In general, is of size (assuming is unknown), and hence the singular-value decomposition (SVD) of becomes computationally prohibitive even for moderate values of . Therefore, it is essential to compute without the full SVD of , while exploiting the special structure of . In particular, note that since the columns of (the eigenvectors of ) are orthonormal, then also the columns of (see (50)) are orthonormal due to the definition of the Kronecker product. Consequently, the rightmost columns of are orthonormal due to the block-diagonal structure in (52). It then follows that the matrix is much sparser than , see Figure 4b, as it includes only nonzero elements (compared to in ). As is also the eigenvector of corresponding to its smallest eigenvalue, can be computed efficiently by the inverse power method using the conjugate-gradients algorithm for inverting at each iteration. The above discussion applies equivalently to the evaluation of from .
4 Numerical experiments
Next, we report our experimental findings on the recovery of from the measurements , using Algorithm 1 followed by Algorithm 2. We use the following measure of discrepancy to evaluate the estimation error:
[TABLE]
where is the estimator of obtained from the output of Algorithm 2 while treating the rank as unknown (setting per step 2 in Algorithm 2). In all our experiments, we generate the covariance matrix randomly as follows. We sample the eigenvalues of uniformly from , and normalize them such that , essentially enforcing a fixed signal power of (i.e. ). The eigenvectors of are sampled uniformly from the -sphere. After generating the covariance matrix , the observations are drawn according to (1) and (3) using a uniform distribution for the cyclic shifts .
We begin with several experiments testing the performance of our algorithms in the complex-valued case (, and ). We first explore the ability of our algorithms to recover for different ranks and signal lengths . For these experiments, we use , and consider the maximal error among trials. The results are shown in Figure 7. As supported by Theorem 13, small estimation errors are always achieved when . However, since the algorithm cannot handle the case of (the linear system in (49) becomes underdetermined), the worst-case estimation errors rapidly increase with for .
Continuing, we investigate the behavior of the estimation error as a function of the number of observations . In this experiment, the error is averaged over trials. Figure 8 displays the estimation error of Algorithm 2 as a function of , for , and . As expected from Theorem 13, the error decreases with for all fixed values of and . In this regard, the empirical results suggest that the error is proportional to . Furthermore, it is evident that the existence of noise simply shifts the error curves to the right, such that more observations (by a constant factor) are required to achieve the same estimation error as without noise. Note that in this example, corresponds to a noise magnitude (given by ) approximately equal to the signalâs strength (normalized to be ). Even in this challenging regime, an accurate estimation of is achieved when , implying that our method can successfully cope with high levels of noise. Also, as evident from Figure 8, the estimation error grows with . This is due to the fact that the fourth moment of (and the trispectrum in particular) is harder to estimate for larger ranks, since the variability in the quantities increases with .
Last, we demonstrate the performance of our algorithms for the real-valued case (, and ), and we note that the difference in our algorithms between the real-valued and the complex-valued cases lies only in step 4 of Algorithm 1 (as solving (29) in the complex-valued case is replaced by solving (34) in the real-valued case). The results are shown in Figures 9 and 10, which are analogous to Figures 7 and 8 in the complex-valued case, respectively. It is evident that the performance of our algorithms in the real-valued case is very similar to that of the complex-valued case, with almost identical behavior in all aspects, albeit slightly larger estimation errors.
5 Summary and discussion
In this work, we considered the problem of recovering the covariance matrix of a random signal observed through unknown translations and corrupted by noise, where the signal is low-rank (i.e. it follows the factor model (3) with much smaller than ). We have shown that unique recovery of the covariance matrix is possible (up to a set of fundamental ambiguities, see Proposition 3) when and Condition 10 holds. We provided statistically-consistent polynomial-time estimation procedures, and concluded with numerical simulations corroborating our theoretical findings.
There are many open questions emerging from this work, giving rise to several possible future research directions. First, we discuss future research directions associated with the model (1)â(3). While we have shown that recovery of the covariance matrix from the power spectrum and the trispectrum is possible when and impossible when (i.e. the covariance is full-rank), it is of interest to determine tighter upper and lower bounds on the rank characterizing when the recovery can be attained (both theoretically and using polynomial-time algorithms), possibly determining the exact phase transition, namely the set of ranks above which recovery is no longer possible. Moreover, even when covariance estimation from the power spectrum and the trispectrum alone is impossible, it is of interest to investigate the advantages of adding higher-order moments. Last, while we established statistical consistency for our estimators, it is favorable to investigate their estimation errors in terms of the quantities governing our model (, , , and ).
Aside from the above-mentioned research directions associated with the model (1)â(3), it is worthwhile to consider various extensions of this model. First, one could replace the normal distribution with a broader family of distributions, allowing for more general factor models. Second, the one-dimensional setting (implicitly assumed in (1)â(3)) could be extended to higher dimensions, with other group actions replacing the cyclic shift in (2). For example, one-dimensional signals could be replaced with two-dimensional images, where cyclic shifts are replaced with in-plane rotations. This extension could have important applications in rotation-invariant processing of datasets of two-dimensional images, see for example [44, 25, 26, 29].
Acknowledgements
We would like to thank Nicolas Boumal for several enlightening conversations, and to Boaz Nadler for useful comments and suggestions. This research was supported by the European Research Council (ERC) under the European Unionâs Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH).
Appendix A Proof of Proposition 1
The expression for follows directly from its definition (10). We now prove (13). Towards this end, we use existing results on the moments of the complex normal distribution (see [35]). However, since is not invertible when , we cannot claim that . Therefore, we first treat the case of , and then extend our result to the case of by a continuity argument.
Suppose that , i.e. for all . Then , and according to Theorem 5 in [35] we have
[TABLE]
where is the Kronecker product, and is the commutation matrix, given by
[TABLE]
for , and . Note that we can write (defined by replacing in (9) with ) as
[TABLE]
and it follows from (64) and (65) that
[TABLE]
where we used the fact that is Hermitian. Therefore, by substituting we get
[TABLE]
Last, we extend the result above to the case of . It is easy to verify that is continuous in , since is a subset of
[TABLE]
which is a polynomial in (since ) for all values of (including zero). Therefore, fixing and taking for on both sides of (68), we get due to the continuity of that (68) also holds for any (where for ).
Appendix B Proof of Lemma 2
For it is convenient to define as
[TABLE]
for . In other words, is the âth diagonal of with circulant wrapping, and is analogous to of (24) for (which is the âth diagonal of with circulant wrapping). By (70), (15), and (16), the set of equations (14) can be written as
[TABLE]
By taking (with some abuse of notation) , , we can rewrite (72) more conveniently (and analogously to (26)) as
[TABLE]
for (noting that and in (73) are not equivalent to and in (72)). Now, for the âifâ part of the âif and only ifâ statement of Lemma 2, it is straightforward to verify that taking according to (18), namely with , satisfies both (71) and (73) (substituting (12) and (13) into (71) and (73)) since the terms cancel out. We now consider the other direction, namely the âonly ifâ part of the statement. Suppose that the set of equations in (14) hold. We now prove the required result by the following three steps. First, taking in (73) and substituting (71) gives
[TABLE]
which establishes (by substituting (12) and (13)) that
[TABLE]
for every . Second, taking and in (73) leads to
[TABLE]
for . Substituting (13) in the above equation, we have
[TABLE]
Now, taking and in (75) establishes that with some . Then,  (77) determines completely for all by an iterative procedure, as is obtained from , is obtained from , and so on, where each element is obtained by dividing both sides of (77) by (we never divide by zero from the assumption in Lemma 2 that ). Consequently, we have that
[TABLE]
Last, taking in (73) gives
[TABLE]
and substituting (13) together with (78) establishes that
[TABLE]
Repeating our previous argumentation (for ) now for every , we take and in (75), which establishes that with some , and then (80) determines for every , by the previously mentioned iterative process. Therefore, we have that
[TABLE]
for all and , which concludes the proof.
Appendix C Proof of Lemma 4
We begin with the case of . Note that
[TABLE]
Let us define
[TABLE]
for , and it follows that we can write as
[TABLE]
Suppose that solves Problem 3, namely is Hermitian and PSD. Recall that the inertia of a Hermitian matrix is the triplet describing the number of zero, positive, and negative eigenvalues, respectively, of . Since is Hermitian and PSD, all of its eigenvalues are non-negative, hence , and by Sylvesterâs law of inertia, the matrix
[TABLE]
is also Hermitian and PSD, since it preserves the inertia of (where is well-defined since from the assumptions of Lemma 4). Now, it is well-known that a circulant matrix can be diagonalized by the DFT matrix (5), and in particular, we can write
[TABLE]
where is the âth DFT vector defined in (5), and are the eigenvalues of , which are non-negative as shown in (85). We now prove that are all non-negative only if they are all zero except for one of them. Note that
[TABLE]
and therefore
[TABLE]
When combining both of the above equations (in particular, squaring both sides of the left equation and subtracting the right equation), we have that
[TABLE]
with for all . It then immediately follows that for some single while for all , since otherwise for some , which is a contradiction to (89). Consequently, we have that for some (see the left equation in (88)), and
[TABLE]
which implies that (see also (20)), and hence solves Problem 2.
Last, for the case of under Condition 10, we refer the reader to the derivation in Section 3.2, which provides a complete proof for this case through the derivation of the procedure for solving Problem 3, and whose results are summarized in Proposition 12.
Appendix D Proof of Proposition 6
Let us take as
[TABLE]
More specifically, we have
[TABLE]
Clearly, follows the form of (18) in Lemma 2, hence satisfies the equations
[TABLE]
Moreover, is Hermitian, and
[TABLE]
Since the eigenvalues of a square matrix depend continuously on its elements (theorem 2.4.9.2 in [20]), we also have that
[TABLE]
where stands for the smallest eigenvalue of . Because when , there exists a sufficiently small such that if then , and consequently is PSD. However, it is evident that , which concludes the proof.
Appendix E Justification of (25) and (26)
Since we want to account for an arbitrary noise variance , whereas Proposition 1 considers explicitly the noiseless case , we introduce a certain update which places us in the noiseless setting and allows us to use Proposition 1. Note that according to the definition of in (1), admits the same distribution as (since the distribution of the noise is invariant to the operation ), and consequently, from (7) admits the same distribution as . Therefore, we can absorb the noise variance into the main diagonal of . That is, with some abuse of notation, we update according to
[TABLE]
and then omit the noise vector (from (7)) entirely. This update places us in the noiseless setting of in (7) (after fixing according to (96)) where the power spectrum and the trispectrum are determined solely by according to Proposition 1. Then, taking according to (24) and applying Proposition 1 gives (25) and (26).
Appendix F The trispectrum for the real-valued case
This proof follows very closely with the proof in Appendix A. Analogously to the proof in Appendix A, we first consider the case of (i.e. for all ) so we may claim that is normally-distributed and use standard results on the moments of the normal distribution. We then extend our result to any by a continuity argument. Consider the case of . Then, , and according to Isserlisâ formula [23] (for computing the moments of the zero-mean multivariate normal distribution) we have that
[TABLE]
where we used the fact that is real-valued, hence and thus . Therefore, it follows that
[TABLE]
where we used the observation that for any , since is Hermitian. Taking according to (24), and using in (98) gives
[TABLE]
where we used the fact that is Hermitian. Last, a continuity argument (repeating the argument at the end of Appendix A) extends the above result to the case of an arbitrary .
Appendix G Proof of Theorem 7
The following lemma establishes that when and (i.e. when ), then (29) admits a unique minimizer, which is equal to (27).
Lemma 14**.**
Suppose that , , and assume that for all . If is a minimizer of (29), then for .
Proof.
Since (29) is convex, all minimizers attain the same objective value, which is zero since is a minimizer. Therefore, we have that
[TABLE]
for all indices . Considering the case , and observing that (since (29) enforces ), we have
[TABLE]
Hence, the main diagonal of is equal to the main diagonal of , for . Next, consider the case , for which we get from (100)
[TABLE]
Therefore, we conclude that the âth diagonal (with circulant wrapping) of is equal to the âth diagonal of , for . Up to this point, we established that and agree on two diagonals (their main diagonal and their âth diagonal) for every . Now, we turn to show that if , then (101) and (102) imply that (i.e. and agree on all diagonals). Let us define the matrix
[TABLE]
where is from (24), and (103) is well defined since for all from the assumptions of the lemma. Since is a minimizer of (29) then , and by (103) also (due to Sylvesterâs Inertia theorem). Then, since , and the fact that and have the same values on their main and âth diagonals, it follows that
[TABLE]
for all indices . Now, since is positive semidefinite with unit diagonal, it can take the role of a correlation matrix of a random vector. In particular, let us consider a random vector , with for , noting that
[TABLE]
Fixing , the above relations imply that and are perfectly correlated normal variables with unit variances, and hence linearly dependent (almost surely) with
[TABLE]
for (since . Therefore, it follows that (almost surely), and must be of rank one with
[TABLE]
where denotes an vector of ones. From (107) and (103) it then follows that
[TABLE]
Using the above result for together with (100) provides us with an additional equation on the diagonals of , namely
[TABLE]
and hence, using (103) again,
[TABLE]
for and . Therefore, we established that and agree on their main and first diagonals for every . We can now repeat our previous arguments of the case of (using ) for , resulting in
[TABLE]
for and , almost surely. Therefore,
[TABLE]
and consequently
[TABLE]
for all . â
The next lemma establishes that problem (29) is robust to errors in the estimation of and .
Lemma 15** (Stability of (29)).**
Suppose that for all . If and (element-wise), then
[TABLE]
for .
Proof.
For convenience, we formulate a problem equivalent to (29) in matrix-vector notation. Let , , , , , be vectors obtained from vectorizing , , , , , and , respectively. Then, the set of equations (28)
[TABLE]
for , can be written in matrix form as
[TABLE]
where , are suitable matrices (whose exact expressions are not important for this proof). Next, we define the following functions:
[TABLE]
for obtained by vectorizing from (29), hence satisfying the semidefinite constraints associated with for . Recall from (29) that is a minimizer of . Therefore,
[TABLE]
which together with (116) gives
[TABLE]
On the other hand, by the reverse triangle inequality it follows that
[TABLE]
and by combining (121) and (120) we have
[TABLE]
Therefore, if , , we have that , , and thus
[TABLE]
Last, since is a non-negative and convex function over a convex domain with a unique minimizer (Lemma 14), it follows from (123) that (see Corollary 27.2.2 in [33])
[TABLE]
or equivalently
[TABLE]
â
Since and (of (22) and (23)) are consistent estimators for and , respectively, we have that
[TABLE]
Therefore, by Lemma 15 it follows that
[TABLE]
for . Note that is of rank one, with leading eigenvector and leading eigenvalue
[TABLE]
Recall from (30) that
[TABLE]
where is the leading eigenvalue of and is its corresponding eigenvector. Classical results in matrix perturbation theorey establish that converges to almost surely. In particular, the Davis-Kahan theorem [43, 15] asserts that
[TABLE]
for some , where is the leading eigenvector of , and by Weyl [42]
[TABLE]
where is the largest eigenvalue of (corresponding to ). Hence, by (127), (130) and (131) it follows that
[TABLE]
for . Last, by the definition of in (31)
[TABLE]
where we used the fact that (see (24) and (25)), (132), and the fact that , which concludes the proof.
Appendix H Proof of Lemma 8
First, (41) follows from (37) since a circulant matrix is invariant to (for any ), hence
[TABLE]
where is a matrix of ones. Second, the fact that is Hermitian follows from
[TABLE]
where we used (41) and (40). Third, by (41), is PSD since the Hadamard product of two PSD matrices is also PSD due to the Schur product theorem (see Theorem 5.2.1 in [34]). Last, (42) is due to a well-known bound on the rank of the Hadamard product (see Theorem 5.1.7 in [34]).
Appendix I Proof of Lemma 9
By the definition of the blocks of in (43), we have that
[TABLE]
It is easy to verify from (136) that is Hermitian if is Hermitian (this follows immediately from interchanging with , and with ). Next, a key observation for this proof is that is similar to the matrix , where is the Kronecker product. This is due to the fact that
[TABLE]
and hence can be transformed into by an appropriate permutation of its rows and columns. Specifically, we can write
[TABLE]
and it follows that there exists a permutation matrix such that
[TABLE]
It is well-known that the eigenvalues of are given by the pair-wise products between the eigenvalues of and the eigenvalues of (see [34]). Hence, if is Hermitian and PSD, then is also Hermitian and PSD, and consequently so is by its similarity to .
Appendix J Proof of Theorem 13
By Theorem 7, we can write
[TABLE]
where is equal to from (37) but with angles that may depend on , and . For simplicity of presentation, we omit the superscript in and from all subsequent derivations. Let of (52) be the matrix constructed from as described in Section 3.2, and let be a matrix analogous to when using instead of . We now analyze the different quantities involved in the construction of . By (39) we have that
[TABLE]
Using the bound (see [34]), it follows that
[TABLE]
where we used the fact that . Since admits distinct and non-zero eigenvalues (see the assumptions of Theorem 13), we have from the Davis-Kahan Theorem [43, 15] that
[TABLE]
where and are matrices whose columns are the first eigenvectors (corresponding to the largest eigenvalues) of and , respectively. Next, define the matrices and the vectors , for , analogously to and of (51), by
[TABLE]
where is the Kronecker product, is the âth indicator vector (with a single value of at the âth entry), and is the operation of vectorizing a matrix by stacking its columns on top of each other. From (143), we can write
[TABLE]
where (and the angles depend on ). Therefore, we can write
[TABLE]
where we used the mixed-product property of the Kronecker product (i.e. , see [34]). By using the bound (see [34]) together with , we have that
[TABLE]
for . Next, from (142) and (144) it immediately follows that
[TABLE]
for . Recall that is formed according to the right-hand side of (52) when replacing and with and , respectively. Therefore, by (147) and (148) it follows that
[TABLE]
where is a column vector of ones. Recall that and are the right singular vectors of and corresponding to their smallest singular values, respectively. Let and be the first elements of and , respectively. Note that the matrices and agree in their singular values and in the first entries of their singular vectors. Therefore, from (149) it follows that
[TABLE]
where we used the Davis-Kahan Theorem [43, 15] together with the fact that the smallest singular value of is zero while its second-smallest singular value is strictly positive (resulting in a spectral gap for the smallest singular value). Since the elements of are bounded away from zero (they have magnitudes of according to (54)), from (150) it follows that
[TABLE]
Let be analogous to from (59) when replacing with , and fixing , i.e.
[TABLE]
Then, it follows from (152), (59) and (151) that
[TABLE]
for , which together with Proposition 12 implies that
[TABLE]
With some abuse of notation, let be as in (36) with replacing . Then, employing (140) and (37) yields
[TABLE]
Hence, by the above equation together with (154), (20), and the fact that , we have
[TABLE]
and (62) in Theorem 13 follows in a straightforward manner (using ).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Emmanuel Abbe, Tamir Bendory, William Leeb, JoĂŁo Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. ar Xiv preprint ar Xiv:1710.02793 , 2017.
- 2[2] Emmanuel Abbe, JoĂŁo M Pereira, and Amit Singer. Sample complexity of the boolean multireference alignment problem. In Proceedings. IEEE International Symposium on Information Theory , volume 2017, page 1316. NIH Public Access, 2017.
- 3[3] Emmanuel Abbe, JoĂŁo M Pereira, and Amit Singer. Estimation in the group action channel. ar Xiv preprint ar Xiv:1801.04366 , 2018.
- 4[4] Gil Aharoni, Amir Averbuch, Ronald Coifman, and Moshe Israeli. Local cosine transformâa method for the reduction of the blocking effect in jpeg. In Wavelet Theory and Application , pages 7â38. Springer, 1993.
- 5[5] Yariv Aizenbud, Boris Landa, and Yoel Shkolnisky. Rank-one multi-reference factor analysis. ar Xiv preprint ar Xiv:1905.12442 , 2019.
- 6[6] Afonso Bandeira, Philippe Rigollet, and Jonathan Weed. Optimal rates of estimation for multi-reference alignment. ar Xiv preprint ar Xiv:1702.08546 , 2017.
- 7[7] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. ar Xiv preprint ar Xiv:1712.10163 , 2017.
- 8[8] Afonso S Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science , pages 459â470. ACM, 2014.
