Multi-reference factor analysis: low-rank covariance estimation under unknown translations

Boris Landa; Yoel Shkolnisky

arXiv:1906.00211·math.ST·November 11, 2020

TL;DR

This paper introduces a method for estimating low-rank covariance matrices of signals observed through unknown cyclic shifts and noise, using shift-invariant moments like the power spectrum and trispectrum, with proven consistency and practical effectiveness.

Contribution

It develops a polynomial-time, statistically consistent procedure for low-rank covariance estimation from translated noisy observations using shift-invariant moments, extending PCA applicability.

Findings

1. Covariance can be recovered from the power spectrum and trispectrum when the rank is low.

2. The method is statistically consistent and effective even under high noise levels.

3. Full-rank covariance matrices cannot be recovered from the power spectrum and trispectrum alone.

Abstract

We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows one to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the power spectrum and the trispectrum. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<\sqrt{L}$ . Correspondingly, we provide a polynomial-time…

Equations

$$y=R_{s}\{x\}+\eta,$$

$$R_{s}\{x\}[\ell]=x[\operatorname{mod}(\ell-s,L)],$$

$$x=\sum_{i=1}^{r}a_{i}v_{i},$$

$$\Sigma_{x}:=\mathbb{E}[xx^{*}]=\sum_{i=1}^{r}\lambda_{i}v_{i}v_{i}^{*},$$

$$F[\ell,k]=\frac{1}{\sqrt{L}}f_{\ell}[k],\qquad f_{\ell}[k]=e^{-\imath 2\pi k\ell/L},\qquad \ell,k=0,\ldots,L-1.$$

$$\hat{y}=Fy,\qquad \hat{x}=Fx,\qquad \hat{\eta}=F\eta,\qquad \hat{v}_{i}=Fv_{i},\quad i=1,\ldots,r.$$

$$\hat{y}[k]=f_{s}[k]\hat{x}[k]+\hat{\eta}[k],\qquad \hat{x}[k]=\sum_{i=1}^{r}a_{i}\hat{v}_{i}[k],$$

$$\hat{\Sigma}_{x}:=\mathbb{E}[\hat{x}\hat{x}^{*}]=F\Sigma_{x}F^{*}=\sum_{i=1}^{r}\lambda_{i}\hat{v}_{i}\hat{v}_{i}^{*},$$

$$M^{(2)}_{\hat{y}}[k_{1},k_{2}]$$

$$M^{(4)}_{\hat{y}}[k_{1},k_{2},k_{3},k_{4}]$$

$$P_{y}[k]$$

$$T_{y}[k_{1},k_{2},k_{3}]$$

$$P_{y}[k_{1}]$$

$$T_{y}[k_{1},k_{2},k_{3}]$$

$$P_{y}=\mathcal{P}(X),\qquad T_{y}=\mathcal{T}(X),$$

$$\mathcal{P}(X)[k_{1}]$$

$$\mathcal{T}(X)[k_{1},k_{2},k_{3}]$$

$$\operatorname{Circulant}\{z\}[k_{1},k_{2}]=z[\operatorname{mod}(k_{2}-k_{1},L)],$$

$$X=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\},$$

$$\Omega(\hat{\Sigma}_{x})=\left\{\hat{\Sigma}_{x},\ \operatorname{diag}(f_{1})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{1}^{*}),\ \ldots,\ \operatorname{diag}(f_{L-1})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{L-1}^{*})\right\},$$

$$\operatorname{diag}(f_{\ell})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{\ell}^{*})=\hat{\Sigma}_{x}\odot f_{\ell}f_{\ell}^{*}=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\{f_{\ell}^{*}\},$$

$$\hat{y}_{i}=Fy_{i},\quad i=1,\ldots,N,$$

$$P_{y}[k_{1}]=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_{i}[k_{1}]\right|^{2},$$

$$T_{y}[k_{1},k_{2},k_{3}]=\frac{1}{N}\sum_{i=1}^{N}\hat{y}_{i}[k_{1}]\,\overline{\hat{y}_{i}[k_{2}]}\,\hat{y}_{i}[k_{3}]\,\overline{\hat{y}_{i}[k_{1}-k_{2}+k_{3}]},$$

$$d_{m}[k]=\begin{cases}\hat{\Sigma}_{x}[k,\operatorname{mod}(k+m,L)], & m\neq 0,\\ \hat{\Sigma}_{x}[k,k]+\sigma^{2}, & m=0,\end{cases}$$

$$P_{y}[k]=d_{0}[k],$$

$$T_{y}[k_{1},k_{1}+m,k_{2}+m]=d_{m}[k_{1}]\,\overline{d_{m}[k_{2}]}+d_{k_{2}-k_{1}}[k_{1}]\,\overline{d_{k_{2}-k_{1}}[k_{1}+m]},$$

$$G_{m}=d_{m}d_{m}^{*},$$

$$T_{y}[k_{1},k_{1}+m,k_{2}+m]=G_{m}[k_{1},k_{2}]+G_{k_{2}-k_{1}}[k_{1},k_{1}+m].$$

$$\{G_{m}\}_{m=1}^{L-1}=$$

$$\text{subject to}\quad \{G^{'}_{m}\succeq 0\}_{m=1}^{L-1},\qquad G^{'}_{0}=P_{y}\cdot P_{y}^{T},$$

$$d_{m}=\mu_{1}^{(m)}u_{1}^{(m)},$$

$$C_{x}[k_{1},k_{2}]=\begin{cases}d_{m}[k_{1}], & \operatorname{mod}(k_{2}-k_{1},L)=m,\\ P_{y}[k_{1}]-\sigma^{2}, & k_{2}=k_{1},\end{cases}$$

$$\min_{\varphi_{1},\ldots,\varphi_{L-1}\in[0,2\pi)}\left\|C_{x}-\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\}\right\|_{F}$$

Code & Models

Repositories

RedCrow9564/EstimationOverGroups-FinalProject

Full text

Multi-Reference Factor Analysis: low-rank covariance estimation under unknown translations

Boris Landa$^{1,*}$ and Yoel Shkolnisky$^{2}$

$^{1}$ Program in Applied Mathematics, Yale University

$^{2}$ Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University

$^{*}$ Corresponding author. Email: [email protected]

Abstract

We consider the problem of estimating the covariance matrix of a random signal observed through unknown translations (modeled by cyclic shifts) and corrupted by noise. Solving this problem allows one to discover low-rank structures masked by the existence of translations (which act as nuisance parameters), with direct application to Principal Components Analysis (PCA). We assume that the underlying signal is of length $L$ and follows a standard factor model with mean zero and $r$ normally-distributed factors. To recover the covariance matrix in this case, we propose to employ the second- and fourth-order shift-invariant moments of the signal known as the power spectrum and the trispectrum. We prove that they are sufficient for recovering the covariance matrix (under a certain technical condition) when $r<\sqrt{L}$ . Correspondingly, we provide a polynomial-time procedure for estimating the covariance matrix from many (translated and noisy) observations, where no explicit knowledge of $r$ is required, and prove the procedure’s statistical consistency. While our results establish that covariance estimation is possible from the power spectrum and the trispectrum for low-rank covariance matrices, we prove that this is not the case for full-rank covariance matrices. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate the favorable performance of our algorithms in various settings, including under high levels of noise.

1 Introduction

Principal Components Analysis (PCA) is a ubiquitous technique in science and engineering, which is used extensively for processing and analyzing large datasets. A standard approach for PCA is to estimate the covariance matrix of the dataset, compute its eigen-decomposition, and then project the data points onto the first several leading eigenvectors (i.e. with the largest corresponding eigenvalues). In various scientific applications, PCA is applied to collections of one-dimensional signals, where the underlying assumption is that these signals are low-rank, in the sense that they reside on (or near) a low-dimensional linear subspace. However, it is often the case that real-world signal measurements are prone to certain group-action deformations, where a common example is that of translations. When different translations are applied to low-rank signals, the resulting covariance matrix loses its low-rank structure (a claim which is made more precise shortly), thus rendering PCA ineffective without first aligning the signals.

This scenario, where signals are acquired through unknown translations, is encountered for example in radar target classification [24, 45, 46], chromatographic fingerprinting [14, 30, 36], machine fault diagnosis [17, 28], and ECG signal classification [21, 22]. In these applications, a typical scenario includes collecting a large dataset of signals for analysis and classification, followed by PCA for denoising and dimensionality reduction. For PCA to be effective, the dataset’s covariance matrix should be approximately low-rank. Hence, to account for the different translations it is customary to first align the signals in the dataset. Numerous methods exist for the task of signal alignment, where standard approaches include pair-wise registration, and matching with a predefined template. Yet, it is important to stress that if the signals admit significant heterogeneity (i.e. inherent variability not associated with noise) then alignment is not well-defined, as the concept of aligning two very different patterns is meaningless. Moreover, signal alignment – even between identical copies – cannot be achieved in high levels of noise [9, 41].

Motivated by the above-mentioned limitations of signal alignment, we consider the problem of accurately estimating the covariance matrix of low-rank signals from their translated and noisy observations.

1.1 The setting

We consider the following model for an observed signal $y\in\mathbb{C}^{L}$ :

$$y=R_{s}\{x\}+\eta, \tag{1}$$

where $x\in\mathbb{C}^{L}$ is the underlying signal (to be described shortly), $\eta$ is a noise vector with either $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ or $\eta\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ ( $\mathbb{C}\mathcal{N}$ stands for the circularly-symmetric complex normal distribution, $I_{L}$ is an $L\times L$ identity matrix), and $R_{s}\left\{\cdot\right\}$ is a discrete cyclic shift by $s$ , i.e.

$$R_{s}\{x\}[\ell]=x[\operatorname{mod}(\ell-s,L)], \tag{2}$$

with $s$ drawn from some unknown probability distribution over $\mathbb{Z}_{L}$ . In what follows, we consider all vectors as cyclic, and drop the modulus by $L$ from our notation in all index assignments. The underlying signal $x$ is assumed to follow a standard zero-mean factor model

$$x=\sum_{i=1}^{r}a_{i}v_{i}, \tag{3}$$

where $\{v_{i}\}_{i=1}^{r}\subset\mathbb{C}^{L}$ are orthonormal, and $\{a_{i}\}_{i=1}^{r}$ are independent with either $a_{i}\sim\mathcal{N}(0,\lambda_{i})$ (if $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ ) or $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ (if $\eta\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ ). We mention that while the theoretical analysis in this work focuses on the complex-valued case (where $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ and $\eta\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ ), for practical purposes we also consider the real-valued case ( $v_{i}\in\mathbb{R}^{L}$ , $a_{i}\sim\mathcal{N}(0,\lambda_{i})$ , and $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ ), providing appropriate modifications to our algorithms (see Section 3.1). Now, given (3), the covariance matrix of $x$ is

$$\Sigma_{x}:=\mathbb{E}[xx^{*}]=\sum_{i=1}^{r}\lambda_{i}v_{i}v_{i}^{*}, \tag{4}$$

where $(\cdot)^{*}$ stands for complex-conjugate and transpose. While the rank of $\Sigma_{x}$ is $r$ and can be considerably smaller than $L$ , the covariance matrix of $y$ , given by $\Sigma_{y}:=\mathbb{E}\left[yy^{*}\right]$ , typically admits a rank much larger than $r$ , even if no noise is present ( $\sigma=0$ ). This is because the rank of $\Sigma_{y}$ is dominated by the dimension of the set of vectors $\left\{{R}_{s}\{v_{i}\}\right\}_{i,s}$ for $i=1,\ldots,r$ and $s\in S$ (where $S\subset\mathbb{Z}_{L}$ is a set of allowed shifts), which can exceed $r$ significantly. In particular, if the probability distribution of $s$ is non-vanishing (i.e. all shifts are allowed; $S=\mathbb{Z}_{L}$ ) and $v_{i}$ is aperiodic for some $i$ (see [1]), then $\Sigma_{y}$ is full-rank.
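The loss of low-rank structure described above can be checked numerically. The following is a minimal sketch in the real-valued case, assuming uniformly distributed shifts and randomly drawn (hence generically aperiodic) factors; the helper names `sample_mrfa` and `shifted_covariance` are hypothetical and introduced only for illustration.

```python
import numpy as np

def sample_mrfa(V, lam, sigma, N, rng):
    """Draw N observations y = R_s{x} + eta with uniform shifts (real-valued case)."""
    L, r = V.shape
    a = rng.normal(size=(N, r)) * np.sqrt(lam)   # factors a_i ~ N(0, lambda_i)
    x = a @ V.T                                  # x = sum_i a_i v_i, shape (N, L)
    s = rng.integers(0, L, size=N)               # unknown shifts, uniform on Z_L
    # np.roll(x, s)[l] == x[(l - s) mod L], which matches R_s in (2)
    y = np.stack([np.roll(xi, si) for xi, si in zip(x, s)])
    return y + sigma * rng.normal(size=(N, L))

def shifted_covariance(Sigma_x):
    """Noiseless population covariance of R_s{x} for s uniform on Z_L."""
    L = Sigma_x.shape[0]
    # Shifting x by s shifts both rows and columns of its covariance by s
    return sum(np.roll(Sigma_x, (s, s), axis=(0, 1)) for s in range(L)) / L

rng = np.random.default_rng(0)
L, r = 16, 3
V, _ = np.linalg.qr(rng.normal(size=(L, r)))     # orthonormal columns v_1..v_r
lam = np.array([3.0, 2.0, 1.0])
Sigma_x = (V * lam) @ V.T                        # rank-r covariance of x
Sigma_y = shifted_covariance(Sigma_x)            # covariance of y when sigma = 0

rank_x = np.linalg.matrix_rank(Sigma_x)
rank_y = np.linalg.matrix_rank(Sigma_y)
Y = sample_mrfa(V, lam, sigma=0.1, N=5, rng=rng)
print(rank_x, rank_y, Y.shape)
```

Here `rank_x` equals $r$ while `rank_y` equals $L$, so PCA applied directly to the shifted observations would not reveal the underlying three-dimensional subspace.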

Considering the setting of (1)–(3), a fundamental problem of interest is the following one.

Problem 1 (Multi-Reference Factor Analysis).

Given $N$ i.i.d measurements $y_{1},\ldots,y_{N}$ following the model (1)–(3), estimate $\lambda_{1},\ldots,\lambda_{r}$ and $v_{1},\ldots,v_{r}$ .

Problem 1, termed Multi-Reference Factor Analysis (MRFA), can be viewed as a generalization of standard factor analysis, to the setting of unknown translations of the underlying signal vectors. In this work, instead of estimating $\lambda_{1},\ldots,\lambda_{r}$ and $v_{1},\ldots,v_{r}$ directly, we consider the closely-related problem of estimating the covariance matrix $\Sigma_{x}$ , whose eigenvalues and eigenvectors are $\{\lambda_{i}\}_{i=1}^{r}$ and $\{v_{i}\}_{i=1}^{r}$ , respectively. We exemplify our setting in Figure 1 for $r=3$ , $L=50$ , and $\sigma^{2}=0.01/L$ .

1.2 Related work

As far as we know, the MRFA problem (Problem 1) as presented here has not been treated in the literature. However, it is worthwhile to review some closely related problems, and the approaches undertaken to solve them. When the signal $x$ in (1) is deterministic and fixed, our setting becomes that of Multi-Reference Alignment (MRA) [2, 8, 9, 32, 12, 6], where an unknown vector is to be recovered from its translated and noisy observations. Since $x$ is fixed in MRA, its translated copies can be accurately aligned when the Signal-to-Noise Ratio (SNR) is sufficiently high. Therefore, the line of work in MRA mostly focuses on the low-SNR regime, where alignment is impossible. Signal estimation under group actions other than translations has recently been considered in [7, 3].

In the majority of the above-mentioned works on MRA, signal estimation is carried out through an instance of the Method of Moments [37] (see also the generalized method of moments [19]). Specifically, certain shift-invariant moments of the underlying signal are identified and estimated, and then employed for recovering the underlying signal from the emerging moment equations. Shift-invariant moments which are extensively used in this context are the signal’s mean, the signal’s power spectrum, and the signal’s bispectrum [13]. The power spectrum and the bispectrum of a signal essentially correspond to certain double and triple correlations, respectively, in the Fourier domain of the signal, which are invariant to cyclic shifts of the signal. It is important to point out that the power spectrum and the bispectrum are omnipresent in classical statistical signal processing, see for example [31], with an endless list of applications. In the context of MRA, knowing the bispectrum is sufficient to recover the signal up to fundamental ambiguities (arbitrary cyclic shifts of the underlying signal) [9]. The algorithmic use of the bispectrum (often coupled with other shift-invariant moments for improved performance) for estimating the signal in MRA was investigated in [9, 32], where different algorithms were presented for this task. Particularly, much focus was put on the required sample complexity, namely the number of samples $N$ required to achieve a prescribed estimation error for a given SNR value. We also mention [1], where it was shown that the second moment of the measurements $y$ in MRA is sufficient to recover the signal $x$ if the distribution of the shifts is aperiodic, resulting in an improved sample complexity rate. Aside from approaches leveraging shift-invariant moments, another widespread approach for solving estimation problems akin to MRA (or more generally - estimation problems involving nuisance parameters) is Expectation Maximization (EM) [16]. 
While EM is popular and intuitive, its crucial drawback is lack of theoretical convergence guarantees. On the other hand, methods based on shift-invariant moments are amenable to rigorous analysis, allowing for algorithms with provable theoretical guarantees. Furthermore, methods based on invariant moments lead to single-pass algorithms, in the sense that every measurement $y_{i}$ is only considered once, with subsequent processing involving only the estimated moments.

We also mention that several recent works extend the standard model of MRA (where $x$ is fixed) to a scenario where $x$ assumes discrete heterogeneity [10, 29, 32], namely that $x$ is chosen from a finite set of templates $x_{1},\ldots,x_{k}$ .

Another closely related line of work is that of Generalized Principal Components Analysis (GPCA) (also known as subspace clustering) [39, 38, 27], where data points are assumed to reside on a union of low-dimensional linear subspaces. The MRFA problem could be considered as a special case of GPCA, where each subspace is described by a different translation of the subspace spanned by $\{v_{i}\}_{i=1}^{r}$ . However, in GPCA no relation between the different subspaces is assumed, and consequently, the theoretical recovery guarantees for subspace recovery are typically prohibitive if the noise variance and the number of subspaces are large.

Last, we mention [5], where the authors considered the problem of MRFA in the restricted case of $r=1$ . A key observation in [5] is that if the distribution of the shifts is uniform, then the bispectrum vanishes entirely. Therefore, it was proposed to employ the fourth-order shift-invariant moment known as the trispectrum [13] to estimate the signal parameters ( $\lambda_{1}$ and $v_{1}$ ), and algorithms were presented for consistently estimating $\lambda_{1}$ and $v_{1}$ as $N\rightarrow\infty$ . The trispectrum is analogous to the power spectrum and the bispectrum, in the sense that it consists of certain quadruple correlations in the Fourier domain of the signal, which are invariant to cyclic shifts. As in MRA, much focus was put in [5] on the sample complexity of the algorithms in the low-SNR regime, since in the high-SNR regime the observations can be accurately aligned using correlations (a fact which is only true for the case $r=1$ ).

1.3 Our contributions and main results

We consider the problem of estimating the covariance matrix $\Sigma_{x}$ from the measurements $y_{1},\ldots,y_{N}$ (generated according to the model (1)–(3)) using shift-invariant moments, where we assume no explicit knowledge of the rank $r$ . We investigate certain theoretical aspects of uniqueness and identifiability, derive practical algorithms for consistently estimating $\Sigma_{x}$ from $y_{1},\ldots,y_{N}$ , and conclude with extensive simulations.

We next describe our contributions in detail. Unless otherwise stated, we refer to the complex-valued case of our setting, where $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ and $\eta\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ .

1.3.1 Characterization of uniqueness and identifiability

We begin by investigating the moments equations arising from the power spectrum and the trispectrum, and consider the question of whether these equations are sufficient to recover $\Sigma_{x}$ , and under which conditions. The results of this investigation are reported in Section 2, where we show the following. By posing an equivalent formulation to our problem in the Fourier domain, replacing $\Sigma_{x}$ with its analogue in the Fourier domain $\hat{\Sigma}_{x}$ , we show that the algebraic structure of the moments equations (arising from the power spectrum and the trispectrum) determines $\hat{\Sigma}_{x}$ completely up to multiplication (element-wise) with an unknown circulant matrix of phases (namely a circulant matrix with unit-magnitude elements). Essentially, this multiplication with a circulant matrix of phases corresponds to phase uncertainties on the diagonals of $\hat{\Sigma}_{x}$ . We then consider the problem of resolving these uncertainties, a problem which we term circulant phase retrieval (see Problem 2). We show that by leveraging the Hermitian and positive semidefinite (PSD) structure of $\hat{\Sigma}_{x}$ , it is possible to solve the circulant phase retrieval problem and to recover $\hat{\Sigma}_{x}$ (up to certain fundamental ambiguities, see Proposition 3), whenever $r<\sqrt{L}$ and a certain technical condition holds (Condition 10 in Section 3.2). We note that this technical condition was observed to hold in all conducted numerical experiments, suggesting that it is non-restrictive in practice. While our main result asserts that recovery of a low-rank $\Sigma_{x}$ is possible from the power spectrum and the trispectrum alone (see Theorem 5), we show that this is not the case for a full-rank $\Sigma_{x}$ (see Proposition 6).

1.3.2 Practical estimation procedures with theoretical guarantees

In Section 3, we describe a statistically-consistent procedure for estimating $\Sigma_{x}$ from finite-sample estimates of the power spectrum and the trispectrum. This is essentially a two-step procedure, where the first step is to estimate $\hat{\Sigma}_{x}$ up to the aforementioned “diagonal phase ambiguities”, and the second step is to resolve them while exploiting the fact that $\hat{\Sigma}_{x}$ is Hermitian and PSD. The first step is derived in Section 3.1, see Algorithm 1, and consists of solving a convex optimization problem, followed by computing several rank-one decompositions. We prove that regardless of the rank $r$ , this procedure consistently estimates $\hat{\Sigma}_{x}$ as $N\rightarrow\infty$ up to an element-wise multiplication with a circulant phase matrix, see Theorem 7. We also provide an appropriate modification of the above-mentioned procedure to handle the real-valued case ( $v_{i}\in\mathbb{R}^{L}$ , $a_{i}\sim\mathcal{N}(0,\lambda_{i})$ , and $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ ), where the difference lies only in the objective function of the convex optimization problem. We remark that only the first step of our two-step procedure needs to be modified to handle the real-valued case. The second step of the recovery process, i.e. resolving the diagonal ambiguities, is derived in Section 3.2. Our main contribution in this context is a polynomial-time procedure for solving the circulant phase retrieval problem (Problem 2 in Section 2) when $r<\sqrt{L}$ and Condition 10 holds, see Algorithm 2 in Section 3.2. Fundamentally, this procedure begins by constructing a certain matrix from the output of step one, and proceeds with evaluating its right singular vector corresponding to its smallest singular value. When combined with Algorithm 1, Algorithm 2 allows for a consistent estimate of $\Sigma_{x}$ as $N\rightarrow\infty$ , see Theorem 13.
In Figure 2 we demonstrate the results of applying Algorithms 1 and 2 to estimate $\{\lambda_{i}\}$ and $\{v_{i}\}$ of Figure 1, for $N=1000$ . Last, in Section 4 we conduct extensive numerical experiments, corroborating the statistical consistency of our estimators and moreover, demonstrating their favorable properties, such as a $1/N$ squared-error convergence rate and robustness to high levels of noise.

2 Invariant moments and identifiability of $\Sigma_{x}$

We start by introducing the shift-invariant moments used to recover $\Sigma_{x}$ , and provide certain necessary and sufficient conditions for recovery. Instead of working with the model (1) directly, we consider a more convenient and equivalent formulation in the Fourier domain, where cyclic shifts are replaced by modulations. Let $F\in\mathbb{C}^{L\times L}$ be the unitary Discrete Fourier Transform (DFT) matrix, and let $f_{\ell}\in\mathbb{C}^{L}$ be the $\ell$ ’th DFT vector, given by

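$$F[\ell,k]=\frac{1}{\sqrt{L}}f_{\ell}[k],\qquad f_{\ell}[k]=e^{-\imath 2\pi k\ell/L},\qquad \ell,k=0,\ldots,L-1. \tag{5}$$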

We denote the Fourier transforms of the quantities in (1) and (3) by

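$$\hat{y}=Fy,\qquad \hat{x}=Fx,\qquad \hat{\eta}=F\eta,\qquad \hat{v}_{i}=Fv_{i},\quad i=1,\ldots,r. \tag{6}$$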

Then, a formulation equivalent to (1)–(3) in the Fourier domain is

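$$\hat{y}[k]=f_{s}[k]\hat{x}[k]+\hat{\eta}[k],\qquad \hat{x}[k]=\sum_{i=1}^{r}a_{i}\hat{v}_{i}[k], \tag{7}$$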

for $k=0,\ldots,L-1$ , where $\hat{\eta}\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ , $\{\hat{v}_{i}\}_{i=1}^{r}$ are orthonormal (since $F$ is unitary), and $s$ is the parameter of the cyclic shift from (1). Correspondingly, the covariance matrix of $\hat{x}$ is given by

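$$\hat{\Sigma}_{x}:=\mathbb{E}[\hat{x}\hat{x}^{*}]=F\Sigma_{x}F^{*}=\sum_{i=1}^{r}\lambda_{i}\hat{v}_{i}\hat{v}_{i}^{*}, \tag{8}$$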

which is Hermitian and positive semidefinite (PSD), with rank $r$ and with eigenvalues and eigenvectors $\{\lambda_{i}\}_{i=1}^{r}$ and $\{\hat{v}_{i}\}_{i=1}^{r}$ , respectively. Clearly, knowing $\hat{\Sigma}_{x}$ is equivalent to knowing $\Sigma_{x}$ , and so from this point onward we focus on the recovery of $\hat{\Sigma}_{x}$ instead of $\Sigma_{x}$ . Throughout this section, we assume the noiseless case, i.e. $\sigma=0$ , as the existence of noise simply adds a known bias term to $\Sigma_{x}$ which is easily removed (see Section 3.1 and Appendix E), and has no influence on the issue of identifiability of the solution.
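The replacement of cyclic shifts by modulations can be verified directly. The sketch below assumes NumPy's FFT sign convention, which agrees with the DFT vectors $f_{\ell}$ of (5), and rescales by $1/\sqrt{L}$ to obtain the unitary transform; the helper `unitary_dft` is a hypothetical name used only here.

```python
import numpy as np

def unitary_dft(z):
    """Apply the unitary DFT, i.e. F[l, k] = exp(-2j*pi*k*l/L) / sqrt(L)."""
    return np.fft.fft(z) / np.sqrt(len(z))

rng = np.random.default_rng(1)
L, s = 12, 5
x = rng.normal(size=L) + 1j * rng.normal(size=L)

k = np.arange(L)
f_s = np.exp(-2j * np.pi * k * s / L)   # the s'th DFT vector f_s
lhs = unitary_dft(np.roll(x, s))        # Fourier transform of R_s{x}
rhs = f_s * unitary_dft(x)              # x_hat modulated entry-wise by f_s
max_err = np.max(np.abs(lhs - rhs))
print(max_err)
```

The two sides agree up to floating-point rounding, which is the noiseless form of the relation between $\hat{y}$ and $\hat{x}$ in the Fourier-domain model.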

Let us consider the second and fourth moments of $\hat{y}$ , denoted $M^{(2)}_{\hat{y}}\in\mathbb{C}^{L\times L}$ and $M^{(4)}_{\hat{y}}\in\mathbb{C}^{L\times L\times L\times L}$ respectively, and given by

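$$\begin{aligned}M^{(2)}_{\hat{y}}[k_{1},k_{2}]&=\mathbb{E}\left[\hat{y}[k_{1}]\,\overline{\hat{y}[k_{2}]}\right],\\ M^{(4)}_{\hat{y}}[k_{1},k_{2},k_{3},k_{4}]&=\mathbb{E}\left[\hat{y}[k_{1}]\,\overline{\hat{y}[k_{2}]}\,\hat{y}[k_{3}]\,\overline{\hat{y}[k_{4}]}\right],\end{aligned} \tag{9}$$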

for $k_{1},k_{2},k_{3},k_{4}\in\{0,\ldots,L-1\}$ , where $\overline{(\cdot)}$ denotes complex-conjugation. It is important to mention that all odd-ordered moments of $\hat{y}$ vanish, since the $a_{i}$ ’s of (3) admit a zero-centered symmetric distribution. This explains why we only consider the second and fourth moments of $\hat{y}$ , and not the first and third. Next, we define the following subsets of $M^{(2)}_{\hat{y}}$ and $M^{(4)}_{\hat{y}}$ :

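$$P_{y}[k]:=M^{(2)}_{\hat{y}}[k,k], \tag{10}$$

$$T_{y}[k_{1},k_{2},k_{3}]:=M^{(4)}_{\hat{y}}[k_{1},k_{2},k_{3},k_{1}-k_{2}+k_{3}]. \tag{11}$$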

$P_{y}$ is known as the power spectrum of $y$ , and $T_{y}$ is known as the trispectrum of $y$ [13]. A fundamental property of $P_{y}$ and $T_{y}$ is that their entries are invariant to cyclic shifts of $y$ (or equivalently, to integer modulations of $\hat{y}$ ), regardless of the distribution of the cyclic shifts (this can be easily verified by substituting $\hat{y}[k]=f_{s}[k]\hat{x}[k]$ into (10) and (11), where $f_{s}[k]=e^{-\imath 2\pi ks/L}$ from (5)). Moreover, if $y$ admits uniformly distributed cyclic shifts, then most of the entries in $M^{(2)}_{\hat{y}}$ and $M^{(4)}_{\hat{y}}$ vanish, and the only non-zero entries of $M^{(2)}_{\hat{y}}$ and $M^{(4)}_{\hat{y}}$ are given by $P_{y}$ and $T_{y}$ , respectively.
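The shift invariance can be illustrated with sample versions of the two moments. The sketch below follows the sample averages listed among the equations above (empirical means of $|\hat{y}_{i}[k_{1}]|^{2}$ and of the quadruple products with fourth index $k_{1}-k_{2}+k_{3}$); the function names `power_spectrum` and `trispectrum` are our own, and the test signals are arbitrary complex vectors rather than draws from the factor model.

```python
import numpy as np

def unitary_dft_rows(Z):
    """Unitary DFT applied to every row of Z."""
    return np.fft.fft(Z, axis=1) / np.sqrt(Z.shape[1])

def power_spectrum(Y_hat):
    """Sample power spectrum: average of |y_hat_i[k]|^2 over the observations."""
    return np.mean(np.abs(Y_hat) ** 2, axis=0)

def trispectrum(Y_hat):
    """Sample trispectrum T[k1, k2, k3]: average over the observations of
    y[k1] * conj(y[k2]) * y[k3] * conj(y[(k1 - k2 + k3) mod L])."""
    N, L = Y_hat.shape
    k = np.arange(L)
    k1, k2, k3 = np.meshgrid(k, k, k, indexing="ij")
    k4 = (k1 - k2 + k3) % L
    T = np.zeros((L, L, L), dtype=complex)
    for y in Y_hat:                      # single pass over the data
        T += y[k1] * np.conj(y[k2]) * y[k3] * np.conj(y[k4])
    return T / N

rng = np.random.default_rng(2)
N, L = 20, 8
X = rng.normal(size=(N, L)) + 1j * rng.normal(size=(N, L))
shifts = rng.integers(0, L, size=N)      # a different shift for every signal
Y = np.stack([np.roll(x, s) for x, s in zip(X, shifts)])

err_P = np.max(np.abs(power_spectrum(unitary_dft_rows(Y)) - power_spectrum(unitary_dft_rows(X))))
err_T = np.max(np.abs(trispectrum(unitary_dft_rows(Y)) - trispectrum(unitary_dft_rows(X))))
print(err_P, err_T)
```

Both discrepancies are at the level of rounding error: the modulation phases cancel because the four frequencies satisfy $k_{1}-k_{2}+k_{3}-k_{4}\equiv 0 \pmod{L}$. Note that the trispectrum costs $O(NL^{3})$ operations in this direct form.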

Since $P_{y}$ and $T_{y}$ are invariant to cyclic shifts in $y$ , and as ${y}=R_{s}\{x\}$ (see (1) with $\sigma=0$ ), $P_{y}$ and $T_{y}$ can be viewed as computed directly from $M^{(2)}_{\hat{x}}$ and $M^{(4)}_{\hat{x}}$ (defined by replacing $\hat{y}$ with $\hat{x}$ in (9)) instead of $M^{(2)}_{\hat{y}}$ and $M^{(4)}_{\hat{y}}$ , respectively. Furthermore, as $a_{i}$ is normally-distributed, all moments of $\hat{x}$ can be described in terms of its first and second moments, that is, its mean and covariance. Since the mean of $\hat{x}$ is zero, $P_{y}$ and $T_{y}$ can be described solely in terms of $\hat{\Sigma}_{x}$ . In particular, we have the following proposition providing the explicit forms of ${P}_{y}$ and ${T}_{y}$ .

Proposition 1 (Explicit form of ${P}_{y}$ and ${T}_{y}$ ).

Consider the noiseless case (i.e. $\sigma=0$ ). Then,

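$$P_{y}[k_{1}]=\hat{\Sigma}_{x}[k_{1},k_{1}], \tag{12}$$

$$T_{y}[k_{1},k_{2},k_{3}]=\hat{\Sigma}_{x}[k_{1},k_{2}]\,\overline{\hat{\Sigma}_{x}[k_{1}-k_{2}+k_{3},k_{3}]}+\hat{\Sigma}_{x}[k_{1},k_{1}-k_{2}+k_{3}]\,\overline{\hat{\Sigma}_{x}[k_{2},k_{3}]}, \tag{13}$$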

for all $k_{1},k_{2},k_{3}\in\{0,\ldots,L-1\}$ .

The proof is provided in Appendix A.
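As a numerical sanity check of Proposition 1, one can compare Monte Carlo estimates of $P_{y}$ and $T_{y}$ with the values predicted from $\hat{\Sigma}_{x}$. The sketch below assumes the fourth-moment identity for zero-mean circularly-symmetric complex Gaussian vectors, $\mathbb{E}[z_{1}\overline{z_{2}}z_{3}\overline{z_{4}}]=\mathbb{E}[z_{1}\overline{z_{2}}]\,\mathbb{E}[z_{3}\overline{z_{4}}]+\mathbb{E}[z_{1}\overline{z_{4}}]\,\mathbb{E}[z_{3}\overline{z_{2}}]$, from which the predicted trispectrum is formed; the sizes, eigenvalues, and tolerances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
L, r, N = 4, 2, 400_000
lam = np.array([1.0, 0.5])

# Orthonormal Fourier-domain factors v_hat_i (columns) and Sigma_hat_x
V_hat, _ = np.linalg.qr(rng.normal(size=(L, r)) + 1j * rng.normal(size=(L, r)))
S = (V_hat * lam) @ V_hat.conj().T

# a_i ~ CN(0, lambda_i): real and imaginary parts each have variance lambda_i / 2
a = (rng.normal(size=(N, r)) + 1j * rng.normal(size=(N, r))) * np.sqrt(lam / 2)
X_hat = a @ V_hat.T                      # noiseless samples of x_hat, one per row

# Predicted moments, using Hermitian symmetry of S to write the conjugated form
k = np.arange(L)
k1, k2, k3 = np.meshgrid(k, k, k, indexing="ij")
k4 = (k1 - k2 + k3) % L
P_pred = np.real(np.diag(S))
T_pred = S[k1, k2] * np.conj(S[k4, k3]) + S[k1, k4] * np.conj(S[k2, k3])

# Monte Carlo estimates, one trispectrum entry at a time to keep memory small
P_emp = np.mean(np.abs(X_hat) ** 2, axis=0)
T_emp = np.empty((L, L, L), dtype=complex)
for i1, i2, i3 in np.ndindex(L, L, L):
    i4 = (i1 - i2 + i3) % L
    T_emp[i1, i2, i3] = np.mean(
        X_hat[:, i1] * X_hat[:, i2].conj() * X_hat[:, i3] * X_hat[:, i4].conj()
    )

err_P = np.max(np.abs(P_emp - P_pred))
err_T = np.max(np.abs(T_emp - T_pred))
print(err_P, err_T)
```

The errors shrink as $N$ grows, at the usual Monte Carlo rate.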

Assuming we have access to the shift-invariant moments $P_{y}$ and $T_{y}$ , noting that they can be estimated in a straightforward manner from the observations of $y$ (see (22) and (23) in Section 3.1), we turn to address the question of whether $\hat{\Sigma}_{x}$ can be identified, and under which conditions, from $P_{y}$ and $T_{y}$ . To that end, we consider the set of equations

$$P_{y}=\mathcal{P}(X),\qquad T_{y}=\mathcal{T}(X), \tag{14}$$

where $X\in\mathbb{C}^{L\times L}$ represents the unknown covariance matrix, and $\mathcal{P}:\mathbb{C}^{L\times L}\rightarrow\mathbb{C}^{L}$ , $\mathcal{T}:\mathbb{C}^{L\times L}\rightarrow\mathbb{C}^{L\times L\times L}$ are maps encoding the relation between the underlying covariance $\hat{\Sigma}_{x}$ and the observed shift-invariant moments $P_{y}$ and $T_{y}$ . Specifically, and in accordance with (12) and (13), we define $\mathcal{P}$ and $\mathcal{T}$ as

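$$\mathcal{P}(X)[k_{1}]:=X[k_{1},k_{1}], \tag{15}$$

$$\mathcal{T}(X)[k_{1},k_{2},k_{3}]:=X[k_{1},k_{2}]\,\overline{X[k_{1}-k_{2}+k_{3},k_{3}]}+X[k_{1},k_{1}-k_{2}+k_{3}]\,\overline{X[k_{2},k_{3}]}, \tag{16}$$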

for all $k_{1},k_{2},k_{3}\in\{0,\ldots,L-1\}$ . We mention that the domains of $\mathcal{P}$ and $\mathcal{T}$ are arbitrary $\mathbb{C}^{L\times L}$ matrices (instead of only Hermitian and PSD matrices) to simplify the analysis in this section. This simplification is achieved by first characterizing the solutions to (14) for arbitrary $\mathbb{C}^{L\times L}$ matrices, and then restricting our attention to the subset of solutions which are Hermitian and PSD.

Given $P_{y}$ and $T_{y}$ , (14) corresponds to a set of non-linear equations which are to be solved to determine $\hat{\Sigma}_{x}$ , where $\mathcal{P}(X)$ is a linear map in $X$ , and $\mathcal{T}(X)$ is a quadratic map in $X$ . According to (12), $P_{y}$ is merely the main diagonal of $\hat{\Sigma}_{x}$ , and hence insufficient for recovering $\hat{\Sigma}_{x}$ . However, $T_{y}$ provides additional $L^{3}$ equations, which is more than the number of variables in a generic covariance matrix, and hence possibly enough for recovering $\hat{\Sigma}_{x}$ . Let us denote by $\operatorname{Circulant}\{z\}$ a circulant matrix constructed from a vector $z\in\mathbb{C}^{L}$ , namely

$$\operatorname{Circulant}\{z\}[k_{1},k_{2}]=z[\operatorname{mod}(k_{2}-k_{1},L)], \tag{17}$$

for $k_{1},k_{2}=0,\ldots,L-1$ . The following lemma characterizes the set of all $\mathbb{C}^{L\times L}$ matrices satisfying the equations (14).

Lemma 2 (Solutions of the moments equations).

Suppose that $\hat{\Sigma}_{x}[i,j]\neq 0$ for all $i,j$ . Then, a matrix $X\in\mathbb{C}^{L\times L}$ satisfies the set of equations (14) if and only if

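$$X=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\}, \tag{18}$$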

where $\varphi_{1},\ldots,\varphi_{L-1}\in[0,2\pi)$ but are otherwise arbitrary, and $\odot$ is the Hadamard product (entry-wise multiplication).

The proof of Lemma 2 is provided in Appendix B. Essentially, Lemma 2 asserts that a solution to (14) is equal to the true covariance $\hat{\Sigma}_{x}$ up to a circulant matrix of unknown phases. That is, each diagonal of $\hat{\Sigma}_{x}$ with circulant wrapping (i.e. the entries $\{\hat{\Sigma}_{x}[\ell,\operatorname{mod}(\ell+k,L)]\}_{\ell=0}^{L-1}$ for the $k$ ’th diagonal) in (18) is multiplied by an unknown phase factor of the form $e^{\imath\varphi}$ . In accordance with Lemma 2, in Section 3.1 we describe a procedure, based on convex optimization followed by a rank-one decomposition, which solves (14) and outputs a statistically consistent estimate (as $N\rightarrow\infty$ ) for a matrix $\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\}$ with unknown angles $\varphi_{1},\ldots,\varphi_{L-1}$ . See Algorithm 1 for a summary of the procedure, and Theorem 7 for its consistency guarantee.
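The "if" direction of Lemma 2 is easy to check numerically. The sketch below assumes that $\mathcal{P}(X)$ is the main diagonal of $X$ and that $\mathcal{T}(X)$ pairs entries lying on the same wrapped diagonal, one of them conjugated; this form is our reading of the moment maps and is labeled as an assumption in the code. The names `circulant`, `P_map`, and `T_map` are hypothetical.

```python
import numpy as np

def circulant(z):
    """Circulant{z}[k1, k2] = z[(k2 - k1) mod L]."""
    L = len(z)
    k = np.arange(L)
    return z[(k[None, :] - k[:, None]) % L]

def P_map(X):
    """Assumed form of the linear map P: the main diagonal of X."""
    return np.diag(X)

def T_map(X):
    """Assumed form of the quadratic map T (not confirmed by the source text):
    X[k1,k2]*conj(X[k4,k3]) + X[k1,k4]*conj(X[k2,k3]), with k4 = k1 - k2 + k3."""
    L = X.shape[0]
    k = np.arange(L)
    k1, k2, k3 = np.meshgrid(k, k, k, indexing="ij")
    k4 = (k1 - k2 + k3) % L
    return X[k1, k2] * np.conj(X[k4, k3]) + X[k1, k4] * np.conj(X[k2, k3])

rng = np.random.default_rng(4)
L, r = 9, 2
V_hat, _ = np.linalg.qr(rng.normal(size=(L, r)) + 1j * rng.normal(size=(L, r)))
S = (V_hat * np.array([2.0, 1.0])) @ V_hat.conj().T   # Sigma_hat_x

phi = rng.uniform(0, 2 * np.pi, size=L - 1)           # arbitrary angles
z = np.exp(1j * np.concatenate(([0.0], phi)))         # [1, e^{i phi_1}, ...]
X = S * circulant(z)                                  # Hadamard product

err_P = np.max(np.abs(P_map(X) - P_map(S)))
err_T = np.max(np.abs(T_map(X) - T_map(S)))
is_hermitian = np.allclose(X, X.conj().T)
print(err_P, err_T, is_hermitian)
```

Under these assumptions `X` reproduces the moments of $\hat{\Sigma}_{x}$ exactly, yet for generic angles it is not Hermitian, which suggests why the Hermitian and PSD structure can help resolve the remaining phases.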

At this point, it is important to note that the problem of recovering $\hat{\Sigma}_{x}$ under the model (7) admits an inherent ambiguity. Clearly, cyclically shifting the signal $x$ results in a covariance matrix $\Sigma_{x}$ whose rows and columns are cyclically shifted, while $P_{y}$ and $T_{y}$ remain unchanged. In the Fourier domain, where $x$ is replaced with $\hat{x}$ , this ambiguity corresponds to integer modulations of the rows and columns of $\hat{\Sigma}_{x}$ . In particular, let $\Omega(\hat{\Sigma}_{x})$ be a set of $L$ matrices given by

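$$\Omega(\hat{\Sigma}_{x})=\left\{\operatorname{diag}(f_{\ell})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{\ell}^{*})\right\}_{\ell=0}^{L-1},\tag{19}$$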

where $f_{\ell}$ is the $\ell$ ’th DFT vector defined in (5). The set $\Omega(\hat{\Sigma}_{x})$ is a set of ambiguities associated with the MRFA problem, since each $X\in\Omega(\hat{\Sigma}_{x})$ is the covariance matrix of $F\cdot(R_{s}\{x\})$ (i.e. the Fourier transform of the signal $x$ cyclically shifted by $s$ ) for some cyclic shift $s$ (see (1)–(3) and (5)–(7)). Consequently, $\Omega(\hat{\Sigma}_{x})$ is a set of ambiguities inherent in the recovery of $\hat{\Sigma}_{x}$ from the shift-invariant moments $P_{y}$ and $T_{y}$ , as established by the following proposition.

Proposition 3 (Fundamental ambiguities).

Every matrix $X\in\Omega(\hat{\Sigma}_{x})$ is Hermitian, PSD, has rank $r$ , and satisfies the equations in (14).

Proof.

The matrices $\operatorname{diag}(f_{\ell})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{\ell}^{*})$ (for every $\ell\in\{0,\ldots,L-1\}$ ) are Hermitian since $\hat{\Sigma}_{x}$ is Hermitian. They are also PSD with rank $r$ since they are similar to $\hat{\Sigma}_{x}$ (and hence share their eigenvalues with $\hat{\Sigma}_{x}$ ). Last, observe that

[TABLE]

and hence Lemma 2 establishes that the matrix $\operatorname{diag}(f_{\ell})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{\ell}^{*})$ satisfies the equations in (14). ∎
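The first two claims of Proposition 3 can be checked numerically. The sketch below is illustrative only: it assumes the DFT-vector convention $f_{\ell}[k]=e^{-2\pi\imath\ell k/L}$ (the sign and normalization in (5) may differ), and the helper names `dft_vector` and `ambiguity_set` are hypothetical. It builds the $L$ matrices $\operatorname{diag}(f_{\ell})\cdot\hat{\Sigma}_{x}\cdot\operatorname{diag}(f_{\ell}^{*})$ for a random rank-$r$ matrix and confirms that each one is Hermitian and shares its spectrum with the original.

```python
import numpy as np

rng = np.random.default_rng(0)
L, r = 8, 2

# Random rank-r Hermitian PSD matrix standing in for \hat{\Sigma}_x.
B = rng.standard_normal((L, r)) + 1j * rng.standard_normal((L, r))
Sigma_hat = B @ B.conj().T

def dft_vector(ell, L):
    # Assumed convention: f_ell[k] = exp(-2*pi*i*ell*k/L), unit-modulus entries.
    return np.exp(-2j * np.pi * ell * np.arange(L) / L)

def ambiguity_set(Sigma_hat):
    # diag(f) @ Sigma @ diag(conj(f)) equals Sigma * outer(f, conj(f)) entry-wise.
    L = Sigma_hat.shape[0]
    return [Sigma_hat * np.outer(dft_vector(ell, L), dft_vector(ell, L).conj())
            for ell in range(L)]

Omega = ambiguity_set(Sigma_hat)
ref_eigs = np.linalg.eigvalsh(Sigma_hat)
for X in Omega:
    assert np.allclose(X, X.conj().T)                    # Hermitian
    assert np.allclose(np.linalg.eigvalsh(X), ref_eigs)  # same spectrum: PSD, rank r
```

The entry-wise mask `outer(f, conj(f))` depends only on the index difference modulo $L$, which is why each element of the set has the circulant-phase form covered by Lemma 2.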

Henceforth, whenever we refer to the recovery of $\hat{\Sigma}_{x}$ , we essentially mean the recovery of any (arbitrary) element from the set $\Omega(\hat{\Sigma}_{x})$ .

Evidently, the set of equations (14) goes a long way in narrowing down the set of feasible covariance matrices, as solving (14) leaves us with only $L-1$ unknown parameters $\varphi_{1},\ldots,\varphi_{L-1}\in[0,2\pi)$ , which are to be determined in order to recover $\hat{\Sigma}_{x}$ . This leads us to consider the following problem.

Problem 2 (Circulant phase retrieval).

Given $X=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\}$ with unknown angles $\varphi_{1},\ldots,\varphi_{L-1}\in[0,2\pi)$ , determine $\hat{\Sigma}_{x}$ (or any arbitrary element from $\Omega(\hat{\Sigma}_{x})$ of (19)).

In a way, Problem 2 can be viewed as a certain phase retrieval problem, where the phases multiplying each diagonal of $\hat{\Sigma}_{x}$ (with circulant wrapping) are to be retrieved, hence the name “circulant phase retrieval”. In this regard, note that Lemma 2 considers a general matrix $X\in\mathbb{C}^{L\times L}$ , and ignores the fact that we actually seek a matrix which is Hermitian and PSD, which are properties satisfied by the true covariance matrix $\hat{\Sigma}_{x}$ . Without any further prior knowledge on $\hat{\Sigma}_{x}$ (not even its rank), a natural way to go about solving Problem 2 is to try to solve the following surrogate problem.

Problem 3.

Given $X=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\}$ with unknown angles $\varphi_{1},\ldots,\varphi_{L-1}\in[0,2\pi)$ , find angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}\in[0,2\pi)$ such that $\widetilde{X}:=X\odot\operatorname{Circulant}\{[1,e^{-\imath\widetilde{\varphi}_{1}},\ldots,e^{-\imath\widetilde{\varphi}_{L-1}}]\}$ is Hermitian and PSD.
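To make the setting of Problem 3 concrete, the following sketch constructs a circulant phase mask and applies it to a toy covariance. It assumes that the $k$'th wrapped diagonal consists of the entries $(i,\operatorname{mod}(i+k,L))$; the function name `circulant_phase_mask` is hypothetical and not part of the paper. The masked matrix keeps all entry magnitudes but is generally no longer Hermitian, and applying the conjugate mask with the true angles restores the original matrix.

```python
import numpy as np

def circulant_phase_mask(phis):
    """L x L matrix whose (i, j) entry is exp(1j * phis[(j - i) % L]).

    Assumption: the k-th wrapped diagonal is the set of entries (i, (i + k) % L).
    """
    phis = np.asarray(phis, dtype=float)
    L = len(phis)
    idx = (np.arange(L)[None, :] - np.arange(L)[:, None]) % L
    return np.exp(1j * phis[idx])

rng = np.random.default_rng(1)
L = 6
v = rng.standard_normal(L) + 1j * rng.standard_normal(L)
Sigma_hat = np.outer(v, v.conj())            # rank-one Hermitian PSD stand-in

# phi_0 = 0 keeps the main diagonal untouched, as in Problem 3.
phi = np.concatenate(([0.0], rng.uniform(0, 2 * np.pi, L - 1)))
X = Sigma_hat * circulant_phase_mask(phi)    # the matrix Problem 3 is given

# Correcting with the true angles undoes the mask exactly.
X_tilde = X * circulant_phase_mask(-phi)
```

In the actual problem the angles are unknown, so the correction has to be found from the Hermitian and PSD requirements alone.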

Suppose that we are able to solve Problem 3. A fundamental question is then whether any $\widetilde{X}$ solving Problem 3 is also a solution to Problem 2, i.e. whether $\widetilde{X}$ is in the set of feasible solutions $\Omega(\hat{\Sigma}_{x})$ . It turns out that for certain $\hat{\Sigma}_{x}$ which are sufficiently low-rank, any $\widetilde{X}$ solving Problem 3 is indeed also a solution to Problem 2. In particular, we establish that this is true if $r=1$ under mild conditions on $\hat{\Sigma}_{x}$ , or if $1<r<\sqrt{L}$ and $\hat{\Sigma}_{x}$ satisfies a certain technical condition (Condition 10 in Section 3.2). These results are summarized by the next lemma.

Lemma 4.

Suppose that $\hat{\Sigma}_{x}[i,j]\neq 0$ for all $i,j$ , and either $r=1$ , or $1<r<\sqrt{L}$ and Condition 10 holds. Then, if $\widetilde{X}$ is a solution to Problem 3 (i.e. $\widetilde{X}=X\odot\operatorname{Circulant}\{[1,e^{-\imath\widetilde{\varphi}_{1}},\ldots,e^{-\imath\widetilde{\varphi}_{L-1}}]\}$ is Hermitian and PSD, where $X$ is as described in Problem 3) it is also a solution to Problem 2 (i.e. $\widetilde{X}\in\Omega(\hat{\Sigma}_{x})$ ).

The proof of Lemma 4 is provided in Appendix C. In Section 3.2 we outline a polynomial-time procedure for solving Problem 3, which is guaranteed to succeed if $r<\sqrt{L}$ and Condition 10 holds; see Algorithm 2. We note that even though Condition 10 is not required for the claim of Lemma 4 in the case of $r=1$ , we do require it for our guarantees on the success of Algorithm 2. Condition 10 arises naturally from the procedure described in Section 3.2. While this condition is very technical and somewhat opaque, it can be easily tested for any $\Sigma_{x}$ using the singular values of a certain matrix whose construction is detailed in Section 3.2. Moreover, Condition 10 was observed to hold in all numerical experiments conducted in Section 4.

The following theorem is the main result concerning the recovery of low-rank covariance matrices from $P_{y}$ and $T_{y}$ .

Theorem 5 (Low-rank recovery of $\hat{\Sigma}_{x}$ ).

Suppose that $\hat{\Sigma}_{x}[i,j]\neq 0$ for all $i,j$ . If $r=1$ , or if $1<r<\sqrt{L}$ and Condition 10 holds, then $\hat{\Sigma}_{x}$ (or any arbitrary element from $\Omega(\hat{\Sigma}_{x})$ ) can be recovered from $P_{y}$ and $T_{y}$ . Specifically, if $X$ is Hermitian, PSD, and satisfies equations (14), then $X\in\Omega(\hat{\Sigma}_{x})$ .

The proof of Theorem 5 follows immediately from combining Lemma 2 with Lemma 4. Evidently, Theorem 5 together with Proposition 3 assert that a matrix $X$ is Hermitian, PSD, and satisfies equations (14) if and only if $X\in\Omega(\hat{\Sigma}_{x})$ .

Coupling the procedure for estimating $\hat{\Sigma}_{x}$ up to diagonal phase ambiguities (Algorithm 1) with the procedure for resolving them (Algorithm 2), we obtain a statistically consistent procedure for recovering $\hat{\Sigma}_{x}$ (see Theorem 13), which has polynomial-time complexity.

Now, while low-rank covariance matrices can be successfully recovered from $P_{y}$ and $T_{y}$ , this is not the case for full-rank $\hat{\Sigma}_{x}$ , as shown by the following proposition.

Proposition 6 (Full-rank $\hat{\Sigma}_{x}$ ).

Suppose that $\hat{\Sigma}_{x}[i,j]\neq 0$ for all $i,j$ . If $r=L$ , then $\hat{\Sigma}_{x}$ cannot be recovered from only $P_{y}$ and $T_{y}$ . That is, there exists a Hermitian and positive definite matrix $X$ , with $X\notin\Omega(\hat{\Sigma}_{x})$ , which satisfies equations (14).

The proof of Proposition 6 is provided in Appendix D.

3 Recovering the covariance matrix $\Sigma_{x}$

In this section we describe our algorithms for estimating a low-rank $\hat{\Sigma}_{x}$ (and consequently the covariance matrix of $x$ , i.e. $\Sigma_{x}$ ) using $N$ observations $y_{1},\ldots,y_{N}$ from the model (1)–(3), with an arbitrary noise variance $\sigma^{2}$ . We also provide appropriate statistical consistency guarantees for these algorithms.

3.1 Step 1: Recovering $\hat{\Sigma}_{x}$ up to diagonal phase ambiguities

Given $N$ observations $y_{1},\ldots,y_{N}$ drawn from the model (1)–(3), we first compute their Fourier transforms

$$\hat{y}_{i}=F\,y_{i},\qquad i=1,\ldots,N,\tag{21}$$

where $F$ is the DFT matrix from (5). Next, we estimate $P_{y}$ and $T_{y}$ via

[TABLE]

for $k_{1},k_{2},k_{3}=0,\ldots,L-1$ . Evidently, $\widetilde{P}_{y}$ and $\widetilde{T}_{y}$ are unbiased and consistent estimators for $P_{y}$ and $T_{y}$ , respectively [40]. We then proceed by constructing estimators for the diagonals of $\hat{\Sigma}_{x}$ using $\widetilde{P}_{y}$ and $\widetilde{T}_{y}$ (as shown below), and prove that they are statistically consistent as $N\rightarrow\infty$ up to arbitrary phase factors (i.e., multiplicative constants with unit magnitude).
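The reason these moments can be estimated without knowing the shifts is that they are invariant to them. The sketch below checks this for a single signal; it is an illustration and not the estimator of (22) and (23). It uses `np.fft.fft` as the Fourier transform (an assumption about normalization) and a hypothetical helper `fourth_order_product`. A cyclic shift by $s$ multiplies the $k$'th Fourier coefficient by $e^{-2\pi\imath ks/L}$, so $|\hat{x}[k]|^{2}$ and any product $\hat{x}[k_{1}]\overline{\hat{x}[k_{2}]}\hat{x}[k_{3}]\overline{\hat{x}[k_{4}]}$ with $k_{1}-k_{2}+k_{3}-k_{4}\equiv 0 \pmod{L}$ are unchanged.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
L = 7
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x_hat = np.fft.fft(x)

def fourth_order_product(z_hat, k1, k2, k3, k4):
    return z_hat[k1] * np.conj(z_hat[k2]) * z_hat[k3] * np.conj(z_hat[k4])

dev_power, dev_fourth = 0.0, 0.0
for s in range(L):
    # np.roll(x, s)[l] == x[(l - s) % L], i.e. the cyclic shift R_s{x}.
    xs_hat = np.fft.fft(np.roll(x, s))
    dev_power = max(dev_power,
                    float(np.max(np.abs(np.abs(xs_hat) ** 2 - np.abs(x_hat) ** 2))))
    for k1, k2, k3 in itertools.product(range(L), repeat=3):
        k4 = (k1 - k2 + k3) % L   # enforces k1 - k2 + k3 - k4 = 0 (mod L)
        diff = (fourth_order_product(xs_hat, k1, k2, k3, k4)
                - fourth_order_product(x_hat, k1, k2, k3, k4))
        dev_fourth = max(dev_fourth, abs(diff))
```

Both deviations stay at the level of floating-point round-off, whereas index tuples that violate the constraint pick up a shift-dependent phase.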

We now describe our estimation procedure in detail. Let us denote by $d_{m}\in\mathbb{C}^{L}$ a column vector given by the $m$ ’th diagonal (with circulant wrapping) of the matrix $\hat{\Sigma}_{x}+\sigma^{2}I_{L}$ , i.e.

[TABLE]

for $m,k\in\{0,\ldots,L-1\}$ . In Appendix E we show that $P_{y}$ and $T_{y}$ can be expressed in terms of $d_{0},\ldots,d_{L-1}$ as

[TABLE]

for every $k,k_{1},k_{2},m\in\{0,\ldots,L-1\}$ . Next, we define the matrices $G_{m}\in\mathbb{C}^{L\times L}$ , $m=0,\ldots,L-1$ , by

[TABLE]

noting that $G_{m}\succeq 0$ with $\operatorname{rank}\{G_{m}\}=1$ , and rewrite (26) using (27) as

[TABLE]

We next estimate the matrices $G_{m}$ , for $m\geq 1$ , by solving an optimization problem that fits $\widetilde{T}_{y}$ from (23) to the form (28) (replacing $T_{y}$ with its estimate $\widetilde{T}_{y}$ ), while removing the rank constraint on $G_{m}$ . Specifically, we solve

[TABLE]

which is a linear least-squares problem with semidefinite constraints, hence a convex optimization problem readily solved by a variety of algorithms [18, 11]. Even though we omitted the rank constraint on $G_{m}$ when solving (29), the resulting estimates $\widetilde{G}_{m}$ approximate the matrices $G_{m}$ (as established in the proof of Theorem 7 below) and are close to being rank-one, as exemplified in Figure 3 for $L=10$ , $r=3$ , $N=10^{4}$ , and $\sigma^{2}=0.05$ .

Then, each diagonal $d_{m}$ for $m\geq 1$ is estimated from the best rank-one approximation to $\widetilde{G}_{m}$ . In particular, if $\widetilde{\mu}_{1}^{(m)}$ is the largest eigenvalue of $\widetilde{G}_{m}$ and $\widetilde{u}_{1}^{(m)}$ is its corresponding eigenvector, then we estimate $d_{m}$ via

[TABLE]

noting that $\widetilde{d}_{m}$ is unique up to a phase factor, i.e. a constant $e^{\imath\varphi_{m}}$ multiplying $d_{m}$ , where $\varphi_{m}\in[0,2\pi)$ is an (unknown) angle. Note that according to (25), the main diagonal of $\hat{\Sigma}_{x}$ , namely $d_{0}$ , can be estimated directly from $\widetilde{P}_{y}$ and hence does not suffer from this phase ambiguity. We therefore proceed by forming the matrix $\widetilde{C}_{x}\in\mathbb{C}^{L\times L}$ , given by

[TABLE]

where the subtraction of $\sigma^{2}$ from the main diagonal of $\widetilde{C}_{x}$ corrects for the bias due to noise. The following theorem establishes that $\widetilde{C}_{x}$ is a statistically-consistent estimate of $\hat{\Sigma}_{x}$ as $N\rightarrow\infty$ , up to an unknown circulant phase matrix.

Theorem 7 (Consistency of (31)).

Suppose that $\hat{\Sigma}_{x}[i,j]\neq 0$ for all $i,j$ . Then,

[TABLE]

The proof of Theorem 7 is provided in Appendix G.
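The rank-one step used to extract each diagonal can be sketched as follows. The example assumes that $G_{m}$ has the outer-product form $d_{m}d_{m}^{*}$, which is consistent with $G_{m}\succeq 0$ and $\operatorname{rank}\{G_{m}\}=1$ but is not spelled out in this excerpt; the function name `diagonal_from_gram` and the synthetic perturbation are hypothetical. The leading eigenpair of a near-rank-one Hermitian matrix returns the generating vector up to a unit-modulus constant, which is the phase ambiguity discussed above.

```python
import numpy as np

def diagonal_from_gram(G):
    """Return sqrt(mu_1) * u_1 for the leading eigenpair of a Hermitian matrix G."""
    G = (G + G.conj().T) / 2               # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    mu1 = max(eigvals[-1], 0.0)
    return np.sqrt(mu1) * eigvecs[:, -1]

rng = np.random.default_rng(3)
L = 10
d = rng.standard_normal(L) + 1j * rng.standard_normal(L)
E = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
G_noisy = np.outer(d, d.conj()) + 1e-3 * (E + E.conj().T)   # near rank-one

d_est = diagonal_from_gram(G_noisy)

# Align the unknown global phase before comparing with the ground truth.
inner = np.vdot(d_est, d)
phase = inner / abs(inner)
rel_err = np.linalg.norm(d_est * phase - d) / np.linalg.norm(d)
```

The alignment step is only possible here because the ground truth is known; in the estimation procedure the phase of each diagonal remains undetermined until Step 2.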

Now, from a practical standpoint, it is worthwhile to briefly consider the real-valued case, where $v_{i}\in\mathbb{R}^{L}$ , $a_{i}\sim\mathcal{N}(0,\lambda_{i})$ , and $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ . The only difference between the real-valued case and the complex-valued case lies in the expression for the trispectrum $T_{y}$ , which now admits an additional additive term. In particular, we show in Appendix F that instead of (28) we have

[TABLE]

Therefore, analogously to (29), we propose to solve

[TABLE]

and proceed with the estimation of $\widetilde{d}_{m}$ and the construction of $\widetilde{C}_{x}$ as in the complex-valued case. Due to the additional term in the expression for the trispectrum in (33), the proof of the analogue of Theorem 7 in the real-valued case is somewhat more complicated, and is left for future work. Nonetheless, we demonstrate by numerical experiments in Section 4 that the proposed approach for the real-valued case provides results that are very similar to the complex-valued case.

The algorithm for recovering $\hat{\Sigma}_{x}$ up to unknown diagonal phase ambiguities, for both the complex-valued and the real-valued cases, is described in Algorithm 1.

Remark 1.

It is worthwhile to point out that problem (29) is ill-posed without the semidefinite constraints. Removing the semidefinite constraints in (29) results in a linear least-squares system with $L^{3}$ equations, and a smaller number of $L^{3}-L^{2}$ variables (due to the constraint $G^{{}^{\prime}}_{0}=\widetilde{P}_{y}\cdot\widetilde{P}_{y}^{T}$ ), which is not underdetermined. Yet, we observe that for every triplet of indices $(k_{1},k_{2},m)$ there exists another triplet $(k_{1},k_{1}+m,k_{2}-k_{1})$ which results in exactly the same equation as for the first triplet (since the terms $G^{{}^{\prime}}_{k_{2}-k_{1}}[k_{1},k_{1}+m]$ and $G^{{}^{\prime}}_{m}[k_{1},k_{2}]$ in (29) interchange). Therefore, the number of independent equations is actually smaller than the number of variables and the problem is ill-posed. Yet, it turns out that the semidefinite constraints resolve this ill-posedness, as established in the proof of Theorem 7.

Remark 2.

We mention that the trispectrum $T_{y}$ admits several symmetries which can be exploited to reduce the computational burden of Algorithm 1. Notice from (11) that swapping the first and third, or second and fourth indices of $M^{(4)}_{\hat{y}}$ does not change the value of $M^{(4)}_{\hat{y}}$ . Therefore, it is clear that

[TABLE]

hence it is sufficient to estimate only about a quarter of the elements of $T_{y}$ .

3.2 Step 2: Resolving the diagonal phase ambiguities

We consider an estimator for $\hat{\Sigma}_{x}$ of the form

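$$\widetilde{\hat{\Sigma}}_{x}=\widetilde{C}_{x}\odot\operatorname{Circulant}\left\{[1,e^{-\imath\widetilde{\varphi}_{1}},\ldots,e^{-\imath\widetilde{\varphi}_{L-1}}]\right\},\tag{36}$$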

where $\widetilde{C}_{x}$ is from (31). In this section, we derive a procedure to find the angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}\in[0,2\pi)$ such that $\widetilde{\hat{\Sigma}}_{x}$ is close to being Hermitian and PSD. For simplicity of presentation, we derive the procedure in the limiting case of $N\rightarrow\infty$ . Specifically, we consider the setting of Problem 3, where we assume that we have access to the matrix $X\in\mathbb{C}^{L\times L}$ , given by

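$$X=\hat{\Sigma}_{x}\odot\operatorname{Circulant}\left\{[1,e^{\imath\varphi_{1}},\ldots,e^{\imath\varphi_{L-1}}]\right\},\tag{37}$$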

with unknown angles $\varphi_{1},\ldots,\varphi_{L-1}$ , and seek angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ such that the matrix

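$$\widetilde{X}=X\odot\operatorname{Circulant}\left\{[1,e^{-\imath\widetilde{\varphi}_{1}},\ldots,e^{-\imath\widetilde{\varphi}_{L-1}}]\right\}\tag{38}$$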

is Hermitian and PSD.

Let us define the matrices ${H}_{i,j}\in\mathbb{C}^{L\times L}$ , for $i,j=0,\ldots,L-1$ , by

[TABLE]

where $R_{i,j}\{X\}$ is the operation of cyclically shifting the rows and columns of $X$ by $i$ and $j$ , respectively, namely

[TABLE]

The following lemma summarizes several properties of $H_{i,i}$ required for our derivation.

Lemma 8.

The matrix $H_{i,i}$ (taking $j=i$ in (39)) is given explicitly by

[TABLE]

and is Hermitian, PSD, and satisfies

[TABLE]

for every $i=0,\ldots,L-1$ .

The proof of Lemma 8 is provided in Appendix H. Next, using (38), we define the matrix ${S}\in\mathbb{C}^{L^{2}\times L^{2}}$ via its $L\times L$ blocks $S^{(i,j)}$ as

[TABLE]

where $S^{(i,j)}\in\mathbb{C}^{L\times L}$ denotes the $(i,j)$ ’th $L\times L$ block of ${S}$ , and

[TABLE]

We have the following lemma regarding the matrix ${S}$ of (43).

Lemma 9.

If $\widetilde{X}$ of (38) is Hermitian and PSD, then ${S}$ of (43) is also Hermitian and PSD.

The proof of Lemma 9 is provided in Appendix I. From Lemma 9 it follows that for appropriate angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ such that $\widetilde{X}$ is Hermitian and PSD, ${S}$ is also Hermitian and PSD, and we can write

[TABLE]

for some $K\in\mathbb{C}^{L^{2}\times L^{2}}$ , where ${K}_{i}$ denotes the $i$ ’th $L\times L^{2}$ block of $K$ (the $L$ consecutive rows of ${K}$ starting from row number $(i-1)L+1$ ). From (45) and (43), we have that $H_{i,i}=K_{i}K_{i}^{*}$ , which implies that the columns of ${K}_{i}$ are spanned by the eigenvectors of ${H}_{i,i}$ . Following (42), we define ${V}^{(i)}\in\mathbb{C}^{L\times r^{2}}$ to be the matrix whose columns are the $r^{2}$ eigenvectors of ${H}_{i,i}$ which correspond to its largest eigenvalues. Then, we can write

$$K_{i}=V^{(i)}A_{i},\tag{46}$$

where $A_{i}\in\mathbb{C}^{r^{2}\times L^{2}}$ is a matrix of unknown coefficients. Now, using (46), (45), and (43) we have that

[TABLE]

where we defined $B_{i,j}:=A_{i}A_{j}^{*}\in\mathbb{C}^{r^{2}\times r^{2}}$ to be a matrix of $r^{4}$ unknown coefficients. Importantly, fixing $i$ and $j$ , (47) describes a system of linear equations in the $r^{4}$ entries of $B_{i,j}$ and the $L$ variables $\beta_{0}^{(j-i)},\beta_{1}^{(j-i)},\ldots,\beta_{L-1}^{(j-i)}\in\mathbb{C}$ , where we relaxed the requirement that $\beta_{m}^{(j-i)}$ have unit norm (we will see that this relaxation still enables us to obtain the correct angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ ). Recall that the matrices $H_{i,j}$ and $V^{(i)}$ are computed from the matrix $X$ , which is provided to us (or estimated from the data, e.g. $\widetilde{C}_{x}$ from Section 3.1). Hence, in total, (47) describes a linear system with $L^{2}$ equations in $r^{4}+L$ variables, among which the $L$ variables $\{\beta_{m}^{(j-i)}\}_{m=0}^{L-1}$ encode the required correcting angles $\widetilde{\varphi}_{m}$ from (38). Now, even though it is possible to exploit (47) directly to solve the problem at hand (identifying the phases $\varphi_{1},\ldots,\varphi_{L-1}$ ), we proceed by forming an augmented linear system with more equations relative to the number of variables, which ultimately allows us to recover $\hat{\Sigma}_{x}$ for larger ranks $r$ . To this end, we couple together all systems of equations from (47) for all $i,j$ such that $j-i=1$ , noting that $\beta_{m}^{(j-i)}=\beta_{m}^{(1)}$ are shared by all such systems. We then obtain the set of equations

[TABLE]

which is a system of $L^{3}$ equations in $L(r^{4}+1)$ variables. Continuing, we can write the linear system of (48) in standard matrix notation as

$$W\,b=\mathbf{0},\tag{49}$$

where $\mathbf{0}$ is a column vector of $L^{3}$ zeros, ${b}\in\mathbb{C}^{L+r^{4}L}$ is a column vector of variables formed by stacking $[\beta_{0}^{(1)},\beta_{1}^{(1)},\ldots,\beta_{L-1}^{(1)}]^{T}$ on top of all of the elements in $\{B_{i,i+1}\}_{i=0}^{L-1}$ , and the matrix ${W}\in\mathbb{C}^{L^{3}\times(L+r^{4}L)}$ is constructed from (48) as follows. Let ${Z}^{(i)}\in\mathbb{C}^{L^{2}\times r^{4}}$ and ${M}^{(i)}_{m}\in\mathbb{C}^{L^{2}}$ , for $i,m\in\{0,\ldots,L-1\}$ , be given by

[TABLE]

where $\otimes$ is the Kronecker product, $\mathbf{e}_{m}$ is the $m$ ’th indicator vector (with a single value of $1$ at the $m$ ’th entry), $\operatorname{vec}\{\cdot\}$ is the operation of vectorizing a matrix by stacking its columns on top of one another (with the leftmost column being at the top of the resulting vector), and recall that $V^{(i)}$ is the $L\times r^{2}$ matrix whose columns are the first $r^{2}$ eigenvectors of $H_{i,i}$ (corresponding to its largest eigenvalues). Then, ${W}$ is given by

[TABLE]

where $\operatorname{BlockDiag}\{{{Z}}^{(0)},\ldots,{{Z}}^{(L-1)}\}$ stands for a block-diagonal matrix constructed from the matrices ${{Z}}^{(0)},\ldots,{{Z}}^{(L-1)}$ , namely a matrix of size $L^{3}\times r^{4}L$ with $L$ non-zero blocks along its main diagonal, each of size $L^{2}\times r^{4}$ . Figure 4a depicts the structure of a typical matrix ${W}$ .

Next, note that $b=\mathbf{0}$ is a possible solution to (49), where $\mathbf{0}$ is the column vector of $L+r^{4}L$ zeros. However, we know that there must exist at least one additional nonzero solution corresponding to the true phase ambiguities ( $\beta_{m}^{(1)}=e^{\imath(\varphi_{m}-\varphi_{m-1})}$ is one such solution). Therefore, the linear system of (49) must admit an infinite number of solutions. This implies that if the system in (49) is not underdetermined (as we enforce next), then the smallest singular value of ${W}$ must be zero. Note that the system in (49) is not underdetermined if we require that $L+r^{4}L\leq L^{3}$ , which is equivalent to requiring $r<\sqrt{L}$ (since $r$ and $L$ are integers). In order to proceed, we need the following condition.

Condition 10.

The second-smallest singular value of $W$ is strictly positive.

Condition 10 can be easily tested by computing the singular-value decomposition (SVD) of $W$ constructed from $\hat{\Sigma}_{x}$ . Figure 5a depicts the six smallest singular values of ${W}$ when constructed from a covariance matrix $\hat{\Sigma}_{x}$ with $L=10$ , eigenvalues $[1,0.7,0.5]$ , and eigenvectors randomly sampled from the unit sphere (with uniform distribution).
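A minimal sketch of such a test is given below. It does not construct the matrix $W$ of (52); instead it uses a synthetic tall matrix with a planted one-dimensional null space as a stand-in, and the function name, the relative tolerance `tol`, and the toy dimensions are all assumptions made for illustration. The check mirrors Condition 10: the smallest singular value should be numerically zero while the second-smallest stays bounded away from zero, in which case the last right singular vector spans the solution set.

```python
import numpy as np

def check_condition_and_null_vector(W, tol=1e-8):
    """Return (holds, null_vec) for a tall matrix W.

    holds is True when the smallest singular value is numerically zero and the
    second-smallest exceeds tol, both measured relative to the largest one.
    """
    _, s, Vh = np.linalg.svd(W, full_matrices=False)   # s is in descending order
    scale = s[0]
    holds = bool(s[-1] <= tol * scale and s[-2] > tol * scale)
    null_vec = Vh[-1].conj()    # right singular vector of the smallest singular value
    return holds, null_vec

rng = np.random.default_rng(6)
m_rows, n_cols = 40, 12
b_true = rng.standard_normal(n_cols) + 1j * rng.standard_normal(n_cols)
A = rng.standard_normal((m_rows, n_cols)) + 1j * rng.standard_normal((m_rows, n_cols))
P = np.eye(n_cols) - np.outer(b_true, b_true.conj()) / np.vdot(b_true, b_true).real
W_toy = A @ P                   # toy matrix whose null space is spanned by b_true

holds, null_vec = check_condition_and_null_vector(W_toy)
overlap = abs(np.vdot(null_vec, b_true)) / np.linalg.norm(b_true)
```

An `overlap` of one means that the recovered vector equals `b_true` up to a complex constant, which is all that a null-space computation can determine.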

Now, if $r<\sqrt{L}$ and Condition 10 holds, then the solution set of (49) is the span of the right singular vector of ${W}$ corresponding to its smallest singular value. Denoting this singular vector by ${\mathcal{V}}\in\mathbb{C}^{L+r^{4}L}$ , we have that

$$b=c\cdot\mathcal{V},\tag{53}$$

for any complex constant $c$ . At this point, we briefly mention that a naive evaluation of $\mathcal{V}$ can be computationally challenging. In this regard, Remark 3 below outlines an efficient approach, which utilizes $W^{*}W$ instead of $W$ to evaluate $\mathcal{V}$ . Continuing with our derivation, from (44), (49), and (53) it follows that

[TABLE]

Hence, the magnitudes of the first $L$ elements of ${\mathcal{V}}$ should be constant, and their phases should satisfy

[TABLE]

where $\operatorname{arg}\{\cdot\}$ is the argument of a complex number ( $\varphi=\operatorname{arg}\{e^{\imath\varphi}\}$ ), and $\alpha\in[0,2\pi)$ is an unknown angle ( $\alpha=-\operatorname{arg}\{c\}$ ). Figure 6a illustrates the magnitudes of the first $30$ elements of ${\mathcal{V}}$ , for the same matrix $\hat{\Sigma}_{x}$ as used in Figure 5, exemplifying the agreement with (54).

Next, note that the set of phase differences $\widetilde{\varphi}_{m}-\widetilde{\varphi}_{m-1}$ for $m=0,\ldots,L-1$ (recalling that $\widetilde{\varphi}_{-1}=\widetilde{\varphi}_{L-1}$ ), satisfies

[TABLE]

Therefore, taking the sum over $m=0,\ldots,L-1$ on both sides in (55) yields

[TABLE]

asserting that $\alpha$ must satisfy

[TABLE]

for some $k\in\{0,\ldots,L-1\}$ . From (55), (58), and taking $\widetilde{\varphi}_{0}=0$ (in accordance with (38)), we arrive at

[TABLE]

for $m=1,\ldots,L-1$ and some $k\in\{0,\ldots,L-1\}$ (where $k$ is fixed for all values of $m$ ), which determines every angle $\widetilde{\varphi}_{m}$ completely up to an additive ambiguity of $2\pi km/L$ for some $k\in\{0,\ldots,L-1\}$ . Reviewing the derivation thus far, (59) is a necessary condition for $\widetilde{X}$ to be Hermitian and PSD (since we derived it from the assumption that $\widetilde{X}$ is Hermitian and PSD). We summarize the derivation up to this point in the following proposition.

Proposition 11.

Suppose that $r<\sqrt{L}$ and Condition 10 holds. If $\widetilde{X}$ of (38) is Hermitian and PSD, then the angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ must follow (59) for some fixed $k\in\{0,\ldots,L-1\}$ .
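The derivation leading to Proposition 11 can be sketched in code. The example assumes that the first $L$ entries of the null vector are proportional to $e^{\imath(\varphi_{m}-\varphi_{m-1})}$ with cyclic indexing, as suggested by the solution noted after (49); the function name `angles_from_null_vector` and the test values are hypothetical. Because the phase differences sum to zero around the cycle, the product of the unit-modulus entries determines the unknown global angle up to a multiple of $2\pi/L$, and accumulating the corrected differences gives each angle up to the additive term $2\pi km/L$.

```python
import numpy as np

def angles_from_null_vector(v_head):
    """Recover phi_0 = 0, phi_1, ..., phi_{L-1} from the first L entries of the
    null vector, assumed proportional to exp(1j * (phi_m - phi_{m-1}))."""
    v_head = np.asarray(v_head, dtype=complex)
    L = len(v_head)
    u = v_head / np.abs(v_head)             # keep the phases only
    alpha = np.angle(np.prod(u)) / L        # one of the L admissible global angles
    steps = u * np.exp(-1j * alpha)         # differences, up to exp(2*pi*1j*k/L)
    phi = np.angle(np.cumprod(steps[1:]))   # accumulate, starting from phi_0 = 0
    return np.concatenate(([0.0], phi))

rng = np.random.default_rng(4)
L = 8
phi_true = np.concatenate(([0.0], rng.uniform(0, 2 * np.pi, L - 1)))
# np.roll supplies phi_{m-1}, with phi_{-1} = phi_{L-1}.
beta = np.exp(1j * (phi_true - np.roll(phi_true, 1)))
c = 0.7 * np.exp(1j * 1.234)                # arbitrary nonzero scaling of the null vector
phi_est = angles_from_null_vector(c * beta)

m = np.arange(L)
residual = np.exp(1j * (phi_est - phi_true))
k_matches = [k for k in range(L)
             if np.allclose(residual, np.exp(2j * np.pi * k * m / L))]
```

Exactly one value of $k$ explains the residual, reflecting the $L$-fold ambiguity that corresponds to the set $\Omega(\hat{\Sigma}_{x})$.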

Although Proposition 11 alone does not guarantee that choosing $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ according to (59) for any particular $k$ leads to $\widetilde{X}$ which is Hermitian and PSD, nor that $\widetilde{X}\in\Omega(\hat{\Sigma}_{x})$ , those two facts actually follow when combining Proposition 11 with Proposition 3. We then get the following result.

Proposition 12.

Suppose that $r<\sqrt{L}$ and Condition 10 holds. Then, $\widetilde{X}$ of the form of (38) is Hermitian and PSD if and only if $\widetilde{X}\in\Omega(\hat{\Sigma}_{x})$ . Moreover, if the angles $\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}$ follow (59) for any fixed $k\in\{0,\ldots,L-1\}$ then

[TABLE]

for some $k^{{}^{\prime}}\in\{0,\ldots,L-1\}$ , where $\varphi_{1},\ldots,\varphi_{L-1}$ are from (37), and consequently $\widetilde{X}\in\Omega(\hat{\Sigma}_{x})$ .

Proof.

Proposition 11 asserts that for $\widetilde{X}$ to be Hermitian and PSD, the vector of correcting angles $[\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}]$ must be chosen from the $L$ different options corresponding to different $k\in\{0,\ldots,L-1\}$ in (59) (the options are different since $\widetilde{\varphi}_{1}$ is clearly different for every $k\in\{0,\ldots,L-1\}$ ). On the other hand, taking $[\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}]$ according to (60) gives

[TABLE]

asserting (via Proposition 3) that $\widetilde{X}$ is Hermitian and PSD, for every $k^{{}^{\prime}}\in\{0,\ldots,L-1\}$ . Therefore, there are $L$ different choices for $[\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}]$ , given explicitly by (60), that result in $\widetilde{X}$ which is Hermitian and PSD. Consequently, choosing $[\widetilde{\varphi}_{1},\ldots,\widetilde{\varphi}_{L-1}]$ according to (59) must coincide with (60), with some one-to-one mapping between the values of the indices $k$ and $k^{{}^{\prime}}$ , thus establishing all claims of Proposition 12. ∎

Considering the finite-sample case, it is clear that we do not have access to $X$ (of (37)) nor $\widetilde{X}$ (of (38)). Therefore, we replace $X$ and $\widetilde{X}$ with $\widetilde{C}_{x}$ (of (31)) and $\widetilde{\hat{\Sigma}}_{x}$ (of (36)), respectively, and denote by $\widetilde{(\cdot)}$ the corresponding finite-sample analogues of all quantities defined in this section. Figure 5b depicts the behavior of the smallest singular values of $\widetilde{W}$ (the finite-sample analogue of $W$ ), and Figure 6b illustrates the magnitudes of $\widetilde{\mathcal{V}}$ (the right singular vector of $\widetilde{W}$ corresponding to its smallest singular value). The resulting procedure for resolving the diagonal phase ambiguities in the finite-sample case is detailed in Algorithm 2. We mention that if $r$ is unknown, we take it to be the maximal rank allowed by Algorithm 2, which is the largest $r$ such that $r<\sqrt{L}$ (for the system (49) not to be underdetermined).
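The rank bound can be checked with integer arithmetic. The short sketch below uses hypothetical helper names; it computes the largest integer $r$ with $r<\sqrt{L}$ and verifies, over a range of $L$, that this is the same as requiring the system (49) to have no more variables ($L+r^{4}L$) than equations ($L^{3}$).

```python
import math

def max_admissible_rank(L):
    """Largest integer r with r < sqrt(L), i.e. with r * r < L (requires L >= 1)."""
    return math.isqrt(L - 1)

def system_not_underdetermined(L, r):
    # Number of variables L + r^4 * L versus number of equations L^3 in (49).
    return L + r**4 * L <= L**3

# The two criteria agree for every integer pair in the tested range.
for L in range(1, 200):
    for r in range(1, L + 1):
        assert system_not_underdetermined(L, r) == (r * r < L)
```

For instance, `max_admissible_rank(10)` is 3 and `max_admissible_rank(26)` is 5, matching the signal lengths and ranks used in the figures and experiments.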

The following theorem establishes the consistency of the estimator $\widetilde{\Sigma}_{x}$ computed by Algorithm 2.

Theorem 13 (Consistency of Algorithm 2).

Let $\widetilde{C}_{x}$ be the input to Algorithm 2 (replacing $X$ ). Suppose that $r<\sqrt{L}$ , Condition 10 holds, and that $H_{i,i}$ from (41) has $r^{2}$ distinct non-zero eigenvalues for every $i\in\{0,\ldots,L-1\}$ . Then,

[TABLE]

where $\widetilde{{\Sigma}}_{x}$ is the output of Algorithm 2.

The proof of Theorem 13 is provided in Appendix J.

Remark 3 (Efficient evaluation of $\mathcal{V}$ and $\widetilde{\mathcal{V}}$ ).

In general, ${W}$ is of size $L^{3}\times\mathcal{O}(L^{3})$ (assuming $r$ is unknown), and hence the singular-value decomposition (SVD) of ${W}$ becomes computationally prohibitive even for moderate values of $L$ . Therefore, it is essential to compute ${\mathcal{V}}$ without the full SVD of $W$ , while exploiting the special structure of ${W}$ . In particular, note that since the columns of $V^{(i)}$ (the $r^{2}$ eigenvectors of $H_{i,i}$ ) are orthonormal, then also the columns of ${Z}^{(i)}=\overline{V^{(i+1)}}\otimes V^{(i)}$ (see (50)) are orthonormal due to the definition of the Kronecker product. Consequently, the $r^{4}L$ rightmost columns of ${W}$ are orthonormal due to the block-diagonal structure in (52). It then follows that the matrix ${W}^{*}{W}$ is much sparser than ${W}$ , see Figure 4b, as it includes only $\mathcal{O}(L^{4})$ nonzero elements (compared to $\mathcal{O}(L^{5})$ in ${W}$ ). As ${\mathcal{V}}$ is also the eigenvector of ${W}^{*}{W}$ corresponding to its smallest eigenvalue, ${\mathcal{V}}$ can be computed efficiently by the inverse power method using the conjugate-gradients algorithm for inverting ${W}^{*}{W}$ at each iteration. The above discussion applies equivalently to the evaluation of $\widetilde{\mathcal{V}}$ from $\widetilde{W}^{*}\widetilde{W}$ .
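The orthonormality claim that underlies the sparsity of ${W}^{*}{W}$ is easy to verify numerically. In the sketch below, two random matrices with orthonormal columns stand in for $V^{(i)}$ and $V^{(i+1)}$ (they are not computed from any $H_{i,i}$), and the helper name `orthonormal_columns` is hypothetical. Since $(A\otimes B)^{*}(A\otimes B)=(A^{*}A)\otimes(B^{*}B)$, the Kronecker product of two matrices with orthonormal columns again has orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(5)
L, r = 6, 2

def orthonormal_columns(L, k, rng):
    A = rng.standard_normal((L, k)) + 1j * rng.standard_normal((L, k))
    Q, _ = np.linalg.qr(A)      # reduced QR: Q is L x k with orthonormal columns
    return Q

V_i = orthonormal_columns(L, r * r, rng)      # stand-in for V^{(i)}
V_next = orthonormal_columns(L, r * r, rng)   # stand-in for V^{(i+1)}

Z = np.kron(V_next.conj(), V_i)               # shape (L^2, r^4), as in (50)
gram = Z.conj().T @ Z                         # should be the r^4 x r^4 identity
```

With identity Gram blocks on the diagonal, the only dense parts of ${W}^{*}{W}$ are those involving the first $L$ columns of ${W}$.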

4 Numerical experiments

Next, we report our experimental findings on the recovery of $\Sigma_{x}$ from the measurements $y_{1},\ldots,y_{N}$ , using Algorithm 1 followed by Algorithm 2. We use the following measure of discrepancy to evaluate the estimation error:

[TABLE]

where $\widetilde{\Sigma}_{x}$ is the estimator of $\Sigma_{x}$ obtained from the output of Algorithm 2 while treating the rank $r$ as unknown (setting $r=\lceil\sqrt{L}-1\rceil$ per step 2 in Algorithm 2). In all our experiments, we generate the covariance matrix $\Sigma_{x}$ randomly as follows. We sample the eigenvalues $\lambda_{1},\ldots,\lambda_{r}$ of $\Sigma_{x}$ uniformly from $[0,1]$ , and normalize them such that $\sum_{i=1}^{r}\lambda_{i}=1$ , essentially enforcing a fixed signal power of $1$ (i.e. $\mathbb{E}\|x\|^{2}=1$ ). The eigenvectors $v_{1},\ldots,v_{r}$ of $\Sigma_{x}$ are sampled uniformly from the $(L-1)$ -sphere. After generating the covariance matrix $\Sigma_{x}$ , the observations $y_{1},\ldots,y_{N}$ are drawn according to (1) and (3) using a uniform distribution for the cyclic shifts $s$ .
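A data-generation sketch along these lines, for the complex-valued case, is given below. It is an illustration under stated assumptions rather than the code used for the experiments: the function names are hypothetical, the eigenvectors are obtained from a QR factorization of a complex Gaussian matrix (one way to draw an orthonormal set of random directions), and $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ is taken to mean that the real and imaginary parts each have variance $\lambda_{i}/2$.

```python
import numpy as np

def sample_covariance(L, r, rng):
    lam = rng.uniform(0.0, 1.0, r)
    lam /= lam.sum()                           # fixes the signal power to 1
    A = rng.standard_normal((L, r)) + 1j * rng.standard_normal((L, r))
    V, _ = np.linalg.qr(A)                     # orthonormal eigenvectors (assumption)
    return (V * lam) @ V.conj().T, V, lam

def sample_observations(V, lam, sigma2, N, rng):
    L, r = V.shape
    # a_i ~ CN(0, lam_i): real and imaginary parts each have variance lam_i / 2.
    a = (rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r))) * np.sqrt(lam / 2)
    x = a @ V.T                                # each row is x = sum_i a_i v_i
    shifts = rng.integers(0, L, N)             # uniform cyclic shifts
    idx = (np.arange(L)[None, :] - shifts[:, None]) % L   # R_s{x}[l] = x[(l - s) % L]
    x_shifted = np.take_along_axis(x, idx, axis=1)
    noise = (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))) * np.sqrt(sigma2 / 2)
    return x_shifted + noise

rng = np.random.default_rng(7)
Sigma_x, V, lam = sample_covariance(L=10, r=3, rng=rng)
Y = sample_observations(V, lam, sigma2=0.05, N=20000, rng=rng)
mean_power = np.mean(np.sum(np.abs(Y) ** 2, axis=1))   # close to 1 + sigma2 * L
```

The empirical power of the observations is close to the signal power plus $\sigma^{2}L$, which is the noise magnitude referred to in the experiments.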

We begin with several experiments testing the performance of our algorithms in the complex-valued case ( $v_{i}\in\mathbb{C}^{L}$ , $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ and $\eta\sim\mathbb{C}\mathcal{N}(0,\sigma^{2}I_{L})$ ). We first explore the ability of our algorithms to recover $\Sigma_{x}$ for different ranks $r$ and signal lengths $L$ . For these experiments, we use $N=10^{5}$ , and consider the maximal error among $200$ trials. The results are shown in Figure 7. As supported by Theorem 13, small estimation errors are always achieved when $r<\sqrt{L}$ . However, since the algorithm cannot handle the case of $r\geq\sqrt{L}$ (the linear system in (49) becomes underdetermined), the worst-case estimation errors rapidly increase with $r$ for $r\geq\sqrt{L}$ .

Continuing, we investigate the behavior of the estimation error as a function of the number of observations $N$ . In this experiment, the error is averaged over $200$ trials. Figure 8 displays the estimation error of Algorithm 2 as a function of $N$ , for $L=26$ , $r\in\{2,5\}$ and $\sigma^{2}\in\{0,0.01,0.05\}$ . As expected from Theorem 13, the error decreases with $N$ for all fixed values of $r$ and $\sigma^{2}$ . In this regard, the empirical results suggest that the error is proportional to $1/N$ . Furthermore, it is evident that the existence of noise simply shifts the error curves to the right, such that more observations (by a constant factor) are required to achieve the same estimation error as without noise. Note that in this example, $\sigma^{2}=0.05$ corresponds to a noise magnitude (given by $\sigma^{2}L$ ) approximately equal to the signal’s strength (normalized to be $1$ ). Even in this challenging regime, an accurate estimation of $\Sigma_{x}$ is achieved when $N\approx 10^{5}$ , implying that our method can successfully cope with high levels of noise. Also, as evident from Figure 8, the estimation error grows with $r$ . This is due to the fact that the fourth moment of $\hat{y}$ (and the trispectrum in particular) is harder to estimate for larger ranks, since the variability in the quantities $\hat{y}[k_{1}]\overline{\hat{y}[k_{2}]}\hat{y}[k_{3}]\overline{\hat{y}[k_{4}]}$ increases with $r$ .

Last, we demonstrate the performance of our algorithms for the real-valued case ( $v_{i}\in\mathbb{R}^{L}$ , $a_{i}\sim\mathcal{N}(0,\lambda_{i})$ and $\eta\sim\mathcal{N}(0,\sigma^{2}I_{L})$ ), and we note that the difference in our algorithms between the real-valued and the complex-valued cases lies only in step 4 of Algorithm 1 (as solving (29) in the complex-valued case is replaced by solving (34) in the real-valued case). The results are shown in Figures 9 and 10, which are analogous to Figures 7 and 8 in the complex-valued case, respectively. It is evident that the performance of our algorithms in the real-valued case is very similar to that of the complex-valued case, with almost identical behavior in all aspects, albeit slightly larger estimation errors.

5 Summary and discussion

In this work, we considered the problem of recovering the covariance matrix of a random signal $x$ observed through unknown translations and corrupted by noise, where the signal $x$ is low-rank (i.e. it follows the factor model (3) with $r$ much smaller than $L$ ). We have shown that unique recovery of the covariance matrix is possible (up to a set of fundamental ambiguities, see Proposition 3) when $r<\sqrt{L}$ and Condition 10 holds. We provided statistically-consistent polynomial-time estimation procedures, and concluded with numerical simulations corroborating our theoretical findings.

There are many open questions emerging from this work, giving rise to several possible future research directions. First, we discuss future research directions associated with the model (1)–(3). While we have shown that recovery of the covariance matrix from the power spectrum and the trispectrum is possible when $r<\sqrt{L}$ and impossible when $r=L$ (i.e. the covariance is full-rank), it is of interest to determine tighter upper and lower bounds on the rank $r$ characterizing when the recovery can be attained (both theoretically and using polynomial-time algorithms), possibly determining the exact phase transition, namely the set of ranks above which recovery is no longer possible. Moreover, even when covariance estimation from the power spectrum and the trispectrum alone is impossible, it is of interest to investigate the advantages of adding higher-order moments. Last, while we established statistical consistency for our estimators, it is favorable to investigate their estimation errors in terms of the quantities governing our model ( $N$ , $L$ , $r$ , and $\sigma^{2}$ ).

Aside from the above-mentioned research directions associated with the model (1)–(3), it is worthwhile to consider various extensions of this model. First, one could replace the normal distribution with a broader family of distributions, allowing for more general factor models. Second, the one-dimensional setting (implicitly assumed in (1)–(3)) could be extended to higher dimensions, with other group actions replacing the cyclic shift $R_{s}$ in (2). For example, one-dimensional signals could be replaced with two-dimensional images, where cyclic shifts are replaced with in-plane rotations. This extension could have important applications in rotation-invariant processing of datasets of two-dimensional images, see for example [44, 25, 26, 29].

Acknowledgements

We would like to thank Nicolas Boumal for several enlightening conversations, and Boaz Nadler for useful comments and suggestions. This research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH).

Appendix A Proof of Proposition 1

The expression for $P_{y}$ follows directly from its definition (10). We now prove (13). Towards this end, we use existing results on the moments of the complex normal distribution (see [35]). However, since $\hat{\Sigma}_{x}$ is not invertible when $r<L$ , we cannot claim that $\hat{x}\sim\mathbb{C}\mathcal{N}(0,\hat{\Sigma}_{x})$ . Therefore, we first treat the case of $r=L$ , and then extend our result to the case of $r<L$ by a continuity argument.

Suppose that $r=L$ , i.e. $\lambda_{i}>0$ for all $i$ . Then $\hat{x}\sim\mathbb{C}\mathcal{N}(0,\hat{\Sigma}_{x})$ , and according to Theorem 5 in [35] we have

[TABLE]

where $\otimes$ is the Kronecker product, and $I_{L,L}$ is the $L^{2}\times L^{2}$ commutation matrix, given by

[TABLE]

for $k_{1},k_{3}\in\{0,\ldots,L-1\}$ , and $m\in\{0,\ldots,L^{2}-1\}$ . Note that we can write $M^{(4)}_{\hat{x}}$ (defined by replacing $\hat{y}$ in (9) with $\hat{x}$ ) as

[TABLE]

and it follows from (64) and (65) that

[TABLE]

where we used the fact that $\hat{\Sigma}_{x}$ is Hermitian. Therefore, by substituting $k_{4}=k_{1}-k_{2}+k_{3}$ we get

[TABLE]

Last, we extend the result above to the case of $r<L$ . It is easy to verify that $T_{y}$ is continuous in $\lambda_{1},\ldots,\lambda_{L}$ , since $T_{y}$ is a subset of

[TABLE]

which is a polynomial in $\lambda_{1},\ldots,\lambda_{L}$ (since $a_{i}\sim\mathbb{C}\mathcal{N}(0,\lambda_{i})$ ) for all values of $\lambda_{1},\ldots,\lambda_{L}$ (including zero). Therefore, fixing $r$ and taking $\lambda_{i}\rightarrow 0$ for $i>r$ on both sides of (68), we get due to the continuity of $T_{y}$ that (68) also holds for any $r<L$ (where $\lambda_{i}=0$ for $i>r$ ).
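The fourth-moment identity underlying this proof can be checked numerically. The sketch below is a Monte Carlo sanity check, under assumptions of our own choosing (a random rank-2 covariance with $L=4$ and hypothetical variances `lam`), of the standard identity $\mathbb{E}[x_{a}\overline{x_{b}}x_{c}\overline{x_{d}}]=\Sigma[a,b]\Sigma[c,d]+\Sigma[a,d]\Sigma[c,b]$ for a zero-mean circularly-symmetric complex normal vector. The covariance is deliberately rank-deficient, which is the case covered by the continuity argument.

```python
import numpy as np

rng = np.random.default_rng(1)
L, r, N = 4, 2, 400_000

# Hypothetical rank-r covariance: orthonormal factors and variances lambda_i.
V, _ = np.linalg.qr(rng.standard_normal((L, r)) + 1j * rng.standard_normal((L, r)))
lam = np.array([1.0, 0.5])
Sigma = (V * lam) @ V.conj().T        # Sigma[i, j] = E[x_i conj(x_j)], rank r < L

# Samples x = sum_i a_i v_i with a_i ~ CN(0, lambda_i), one sample per row.
a = np.sqrt(lam / 2) * (rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r)))
x = a @ V.T

k1, k2, k3, k4 = 0, 1, 2, 3
empirical = np.mean(x[:, k1] * x[:, k2].conj() * x[:, k3] * x[:, k4].conj())
predicted = Sigma[k1, k2] * Sigma[k3, k4] + Sigma[k1, k4] * Sigma[k3, k2]
gap = abs(empirical - predicted)      # shrinks as N grows
```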

Appendix B Proof of Lemma 2

For $X\in\mathbb{C}^{L\times L}$ it is convenient to define $D_{m}\in\mathbb{C}^{L}$ as

$$D_{m}[k]=X[k,\operatorname{mod}(k+m,L)],\tag{70}$$

for $m,k\in\{0,\ldots,L-1\}$ . In other words, $D_{m}$ is the $m$ ’th diagonal of $X$ with circulant wrapping, and is analogous to $d_{m}$ of (24) for $\sigma=0$ (which is the $m$ ’th diagonal of $\hat{\Sigma}_{x}$ with circulant wrapping). By (70), (15), and (16), the set of equations (14) can be written as

[TABLE]

By taking (with some abuse of notation) $k_{2}=k_{1}+m$ , $k_{3}=k_{2}+m$ , we can rewrite (72) more conveniently (and analogously to (26)) as

[TABLE]

for $k_{1},k_{2},m\in\{0,\ldots,L-1\}$ (noting that $k_{1}$ and $k_{2}$ in (73) are not equivalent to $k_{1}$ and $k_{2}$ in (72)). Now, for the “if” part of the “if and only if” statement of Lemma 2, it is straightforward to verify that taking $X$ according to (18), namely $D_{m}[k]=\hat{\Sigma}_{x}[k,k+m]e^{\imath\varphi_{m}}$ with $\varphi_{0}=0$ , satisfies both (71) and (73) (substituting (12) and (13) into (71) and (73)) since the terms $e^{\imath\varphi_{m}}$ cancel out. We now consider the other direction, namely the “only if” part of the statement. Suppose that the set of equations in (14) hold. We now prove the required result by the following three steps. First, taking $m=0$ in (73) and substituting (71) gives

[TABLE]

which establishes (by substituting (12) and (13)) that

[TABLE]

for every $k_{1},k_{2}\in\{0,\ldots,L-1\}$ . Second, taking $m=1$ and $k_{2}=k_{1}+1$ in (73) leads to

[TABLE]

for $k_{1}\in\{0,\ldots,L-1\}$ . Substituting (13) in the above equation, we have

[TABLE]

Now, taking $k_{1}=0$ and $k_{2}=1$ in (75) establishes that $D_{1}[0]=\hat{\Sigma}_{x}[0,1]e^{\imath\varphi_{1}}$ with some $\varphi_{1}\in[0,2\pi)$ . Then, (77) determines $D_{1}[k]$ completely for all $k$ by an iterative procedure, as $D_{1}[1]$ is obtained from $D_{1}[0]$ , $D_{1}[2]$ is obtained from $D_{1}[1]$ , and so on, where each element is obtained by dividing both sides of (77) by $D_{1}[k]$ (we never divide by zero from the assumption in Lemma 2 that $\hat{\Sigma}_{x}[k_{1},k_{2}]\neq 0$ ). Consequently, we have that

[TABLE]

Last, taking $k_{2}-k_{1}=1$ in (73) gives

[TABLE]

and substituting (13) together with (78) establishes that

[TABLE]

Repeating the argument used for $m=1$ , now for every $m\in\{2,\ldots,L-1\}$ , we take $k_{1}=0$ and $k_{2}=m$ in (75), which establishes that $D_{m}[0]=\hat{\Sigma}_{x}[0,m]e^{\imath\varphi_{m}}$ with some $\varphi_{m}\in[0,2\pi)$ , and then (80) determines $D_{m}[k]$ for every $k=1,\ldots,L-1$ , by the previously mentioned iterative process. Therefore, we have that

[TABLE]

for all $m=1,\ldots,L-1$ and $k=0,\ldots,L-1$ , which concludes the proof.
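The role of the circulant diagonals $D_{m}$ in this proof can be illustrated with a short sketch. The helpers `circulant_diagonal` and `apply_diagonal_phases` and the random stand-in for $\hat{\Sigma}_{x}$ are hypothetical names introduced here. The sketch shows that multiplying the $m$ ’th circulant diagonal by $e^{\imath\varphi_{m}}$ leaves every product $D_{m}[k_{1}]\overline{D_{m}[k_{2}]}$ unchanged, which is the cancellation used in the “if” direction.

```python
import numpy as np

def circulant_diagonal(X, m):
    """Return D_m with D_m[k] = X[k, (k + m) mod L]."""
    L = X.shape[0]
    k = np.arange(L)
    return X[k, (k + m) % L]

def apply_diagonal_phases(X, phi):
    """Multiply the m'th circulant diagonal of X by exp(i * phi[m])."""
    L = X.shape[0]
    k1, k2 = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return X * np.exp(1j * phi[(k2 - k1) % L])

rng = np.random.default_rng(2)
L = 6
A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Sigma_hat = A @ A.conj().T                         # stand-in Hermitian PSD matrix
phi = np.concatenate(([0.0], rng.uniform(0, 2 * np.pi, L - 1)))   # phi_0 = 0
X = apply_diagonal_phases(Sigma_hat, phi)

m = 2
D_m = circulant_diagonal(X, m)
d_m = circulant_diagonal(Sigma_hat, m)
ratio = D_m / d_m                                  # constant, equal to exp(i * phi[m])
products_match = np.allclose(np.outer(D_m, D_m.conj()), np.outer(d_m, d_m.conj()))
```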

Appendix C Proof of Lemma 4

We begin with the case of $r=1$ . Note that

[TABLE]

Let us define

[TABLE]

for $m=1,\ldots,L-1$ , and it follows that we can write $\widetilde{X}$ as

[TABLE]

Suppose that $\widetilde{X}$ solves Problem 3, namely $\widetilde{X}$ is Hermitian and PSD. Recall that the inertia of a Hermitian matrix $A$ is the triplet $\{n_{0}\{A\},n_{+}\{A\},n_{-}\{A\}\}$ describing the number of zero, positive, and negative eigenvalues, respectively, of $A$ . Since $\widetilde{X}$ is Hermitian and PSD, all of its eigenvalues are non-negative, hence $n_{-}\{\widetilde{X}\}=0$ , and by Sylvester’s law of inertia, the matrix

[TABLE]

is also Hermitian and PSD, since it preserves the inertia of $\widetilde{X}$ (where $\left(\operatorname{diag}\{\hat{v}_{1}\}\right)^{-1}$ is well-defined since $\hat{v}_{1}[k]\neq 0$ from the assumptions of Lemma 4). Now, it is well-known that a circulant matrix can be diagonalized by the DFT matrix (5), and in particular, we can write

[TABLE]

where $f_{i}$ is the $i$ ’th DFT vector defined in (5), and $\mu_{1},\ldots,\mu_{L}$ are the eigenvalues of $\operatorname{Circulant}\{1,e^{\imath\varphi_{1}^{\prime}},\ldots,e^{\imath\varphi_{L-1}^{\prime}}\}$ , which are non-negative as shown in (85). We now prove that $\mu_{1},\ldots,\mu_{L}$ are all non-negative only if they are all zero except for one of them. Note that

[TABLE]

and therefore

[TABLE]

When combining both of the above equations (in particular, squaring both sides of the left equation and subtracting the right equation), we have that

[TABLE]

with $\mu_{i}\geq 0$ for all $i$ . It then immediately follows that $\mu_{\ell}>0$ for exactly one $\ell\in\{0,\ldots,L-1\}$ while $\mu_{k}=0$ for all $k\neq\ell$ , since otherwise $\mu_{\ell}\mu_{k}>0$ for some $\ell\neq k$ , which contradicts (89). Consequently, we have that $\mu_{\ell}=L$ for some $\ell\in\{0,\ldots,L-1\}$ (see the left equation in (88)), and

[TABLE]

which implies that $\widetilde{X}\in\Omega(\hat{\Sigma}_{x})$ (see also (20)), and hence $\widetilde{X}$ solves Problem 2.
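The eigenvalue argument for $r=1$ can be observed numerically. The sketch below is illustrative only: `hermitian_unit_circulant`, the choice $L=7$ , and the index `ell` are assumptions made here. It builds a Hermitian circulant matrix with unit-modulus entries, once with phases linear in $m$ and once with generic phases. In both cases the eigenvalues sum to $L$ (the trace) and their squares sum to $L^{2}$ (the squared Frobenius norm), so they can all be non-negative only if a single one is non-zero.

```python
import numpy as np

def hermitian_unit_circulant(phases):
    """Build Circulant{1, e^{i phi_1}, ..., e^{i phi_{L-1}}} with C[j, k] = c[(k - j) mod L]."""
    c = np.concatenate(([1.0], np.exp(1j * np.asarray(phases))))
    L = c.size
    j, k = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return c[(k - j) % L]

L, ell = 7, 3
m = np.arange(1, L)

# Phases linear in m: the matrix is rank one with a single eigenvalue equal to L.
mu_dft = np.linalg.eigvalsh(hermitian_unit_circulant(2 * np.pi * ell * m / L))

# Generic phases with phi_{L-m} = -phi_m, which keeps the matrix Hermitian.
rng = np.random.default_rng(3)
half = rng.uniform(0, 2 * np.pi, (L - 1) // 2)
phases = np.concatenate((half, -half[::-1]))
mu_rand = np.linalg.eigvalsh(hermitian_unit_circulant(phases))

sum_mu = mu_rand.sum()             # equals the trace, L
sum_mu_sq = (mu_rand ** 2).sum()   # equals the squared Frobenius norm, L^2
```

For the generic phases the smallest eigenvalue is negative, so the matrix is not PSD, in agreement with the proof.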

Last, for the case of $1<r<\sqrt{L}$ under Condition 10, we refer the reader to the derivation in Section 3.2, which provides a complete proof for this case through the derivation of the procedure for solving Problem 3, and whose results are summarized in Proposition 12.

Appendix D Proof of Proposition 6

Let us take $X\in\mathbb{C}^{L\times L}$ as

[TABLE]

More specifically, we have

[TABLE]

Clearly, $X$ follows the form of (18) in Lemma 2, hence $X$ satisfies the equations

[TABLE]

Moreover, $X$ is Hermitian, and

[TABLE]

Since the eigenvalues of a square matrix depend continuously on its elements (Theorem 2.4.9.2 in [20]), we also have that

[TABLE]

where $\lambda_{\min}\{X\}$ stands for the smallest eigenvalue of $X$ . Because $\lambda_{L}>0$ when $r=L$ , there exists a sufficiently small $\epsilon>0$ such that if $0<|\varphi_{1}|\leq\epsilon$ then $\lambda_{\min}\{X\}>0$ , and consequently $X$ is PSD. However, it is evident that $X\notin\Omega(\hat{\Sigma}_{x})$ , which concludes the proof.
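The continuity argument can be illustrated numerically. The sketch below assumes, as one possible instance of the form (18), that only the first circulant diagonal is rotated by a small angle $\varphi_{1}$ (and the $(L-1)$ ’th diagonal by $-\varphi_{1}$ , to keep the matrix Hermitian). The random full-rank stand-in for $\hat{\Sigma}_{x}$ and the helper `perturb_first_diagonal` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
L = 6
A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Sigma_hat = A @ A.conj().T + np.eye(L)     # full rank, smallest eigenvalue >= 1

def perturb_first_diagonal(S, phi1):
    """Rotate the first circulant diagonal by phi1 and the (L-1)'th by -phi1."""
    L = S.shape[0]
    phi = np.zeros(L)
    phi[1], phi[L - 1] = phi1, -phi1
    k1, k2 = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return S * np.exp(1j * phi[(k2 - k1) % L])

X = perturb_first_diagonal(Sigma_hat, 1e-3)
is_hermitian = np.allclose(X, X.conj().T)
lam_min = np.linalg.eigvalsh(X).min()      # stays positive for a small enough angle
max_change = np.abs(X - Sigma_hat).max()   # non-zero: X differs from Sigma_hat
```

In this sketch $X$ remains Hermitian and positive definite while differing from the stand-in covariance, which mirrors the obstruction to unique recovery in the full-rank case.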

Appendix E Justification of (25) and (26)

Since we want to account for an arbitrary noise variance $\sigma^{2}$ , whereas Proposition 1 considers explicitly the noiseless case $\sigma=0$ , we introduce a certain update which places us in the noiseless setting and allows us to use Proposition 1. Note that according to the definition of $y$ in (1), $y$ admits the same distribution as $R_{s}\{x+\eta\}$ (since the distribution of the noise $\eta$ is invariant to the operation $R_{s}$ ), and consequently, $\hat{y}$ from (7) admits the same distribution as $\operatorname{diag}(f_{s})(\hat{x}+\hat{\eta})$ . Therefore, we can absorb the noise variance $\sigma^{2}$ into the main diagonal of $\hat{\Sigma}_{x}$ . That is, with some abuse of notation, we update $\hat{\Sigma}_{x}$ according to

[TABLE]

and then omit the noise vector $\hat{\eta}$ (from (7)) entirely. This update places us in the noiseless setting of $\sigma=0$ in (7) (after fixing $\hat{\Sigma}_{x}$ according to (96)) where the power spectrum and the trispectrum are determined solely by $\hat{\Sigma}_{x}$ according to Proposition 1. Then, taking $d_{m}$ according to (24) and applying Proposition 1 gives (25) and (26).

Appendix F The trispectrum for the real-valued case

The derivation closely follows the proof in Appendix A. As there, we first consider the case of $r=L$ (i.e. $\lambda_{i}>0$ for all $i$ ) so we may claim that $x$ is normally-distributed and use standard results on the moments of the normal distribution. We then extend our result to any $r<L$ by a continuity argument. Consider the case of $r=L$ . Then, $x\sim\mathcal{N}(0,\Sigma_{x})$ , and according to Isserlis’ formula [23] (for computing the moments of the zero-mean multivariate normal distribution) we have that

[TABLE]

where we used the fact that $x$ is real-valued, hence $\hat{x}[k]=\overline{\hat{x}[-k]}$ and thus $\mathbb{E}\left[\hat{x}[k_{1}]\hat{x}[k_{2}]\right]=\mathbb{E}[\hat{x}[k_{1}]\overline{\hat{x}[-k_{2}]}]=\hat{\Sigma}_{x}[k_{1},-k_{2}]$ . Therefore, it follows that

[TABLE]

where we used the observation that $\hat{\Sigma}_{x}[k_{1},k_{2}]=\overline{\hat{\Sigma}_{x}[-k_{1},-k_{2}]}=\hat{\Sigma}_{x}[-k_{2},-k_{1}]$ for any $k_{1},k_{2}$ , since $\hat{\Sigma}_{x}$ is Hermitian. Taking $d_{m}$ according to (24), and using $G_{m}=d_{m}d_{m}^{*}$ in (98) gives

[TABLE]

where we used the fact that $\hat{\Sigma}_{x}$ is Hermitian. Last, a continuity argument (repeating the argument at the end of Appendix A) extends the above result to the case of an arbitrary $r<L$ .
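Two ingredients of this derivation are easy to check numerically: the conjugate symmetry $\hat{x}[k]=\overline{\hat{x}[-k]}$ of the DFT of a real signal, and Isserlis’ formula for the fourth moment of a zero-mean real normal vector, $\mathbb{E}[x_{1}x_{2}x_{3}x_{4}]=\Sigma_{12}\Sigma_{34}+\Sigma_{13}\Sigma_{24}+\Sigma_{14}\Sigma_{23}$ . The sketch below is a Monte Carlo sanity check with a randomly generated covariance. The sizes and the scaling of `Sigma` are arbitrary choices made here, and the unnormalized FFT is used since the symmetry does not depend on the DFT scaling.

```python
import numpy as np

rng = np.random.default_rng(5)

# Conjugate symmetry of the DFT of a real-valued signal.
L = 9
x_real = rng.standard_normal(L)
x_hat = np.fft.fft(x_real)
k = np.arange(L)
symmetric = np.allclose(x_hat[k], np.conj(x_hat[(-k) % L]))

# Isserlis' formula for a zero-mean real normal vector of length 4.
N = 400_000
B = rng.standard_normal((4, 4))
Sigma = B @ B.T
Sigma *= 0.5 / Sigma.diagonal().max()       # keep the variances at most 0.5
samples = rng.multivariate_normal(np.zeros(4), Sigma, size=N)

empirical = np.mean(samples[:, 0] * samples[:, 1] * samples[:, 2] * samples[:, 3])
predicted = (Sigma[0, 1] * Sigma[2, 3]
             + Sigma[0, 2] * Sigma[1, 3]
             + Sigma[0, 3] * Sigma[1, 2])
gap = abs(empirical - predicted)            # shrinks as N grows
```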

Appendix G Proof of Theorem 7

The following lemma establishes that when $\widetilde{T}_{y}=T_{y}$ and $\widetilde{P}_{y}=P_{y}$ (i.e. when $N\rightarrow\infty$ ), then (29) admits a unique minimizer, which is equal to (27).

Lemma 14.

Suppose that $\widetilde{T}_{y}=T_{y}$ , $\widetilde{P}_{y}=P_{y}$ , and assume that $|\hat{\Sigma}_{x}[i,j]|>0$ for all $i,j$ . If $\{G_{m}^{\star}\}_{m=1}^{L-1}$ is a minimizer of (29), then $G_{m}^{\star}=G_{m}=d_{m}\cdot d_{m}^{*}$ for $m=1,\ldots,L-1$ .

Proof.

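Since (29) is convex, all minimizers attain the same objective value. This value is zero, since the objective is non-negative and $\{G_{m}\}_{m=1}^{L-1}$ is feasible and attains zero when $\widetilde{T}_{y}=T_{y}$ and $\widetilde{P}_{y}=P_{y}$ . Therefore, we have that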

[TABLE]

for all indices $m,k_{1},k_{2}$ . Considering the case $m=0$ , and observing that $G_{0}^{\star}=G_{0}$ (since (29) enforces ${G}^{\star}_{0}=\widetilde{P}_{y}\widetilde{P}_{y}^{T}=P_{y}P_{y}^{T}=G_{0}$ ), we have

[TABLE]

Hence, the main diagonal of $G_{m}^{\star}$ is equal to the main diagonal of $G_{m}$ , for $m=1,\ldots,L-1$ . Next, consider the case $k_{2}-k_{1}=m$ , for which we get from (100)

[TABLE]

Therefore, we conclude that the $m$ ’th diagonal (with circulant wrapping) of $G_{m}^{\star}$ is equal to the $m$ ’th diagonal of $G_{m}$ , for $m=1,\ldots,L-1$ . Up to this point, we established that $G_{m}^{\star}$ and $G_{m}$ agree on two diagonals (their main diagonal and their $m$ ’th diagonal) for every $m$ . Now, we turn to show that if $G_{m}^{\star}\succeq 0$ , then (101) and (102) imply that $G_{m}^{\star}=G_{m}$ (i.e. $G_{m}^{\star}$ and $G_{m}$ agree on all diagonals). Let us define the matrix

[TABLE]

where $d_{m}$ is from (24), and (103) is well defined since $|d_{m}[k]|>0$ for all $m,k$ from the assumptions of the lemma. Since $\{G_{m}^{\star}\}_{m=1}^{L-1}$ is a minimizer of (29) then $G_{m}^{\star}\succeq 0$ , and by (103) also $\hat{G}_{m}^{\star}\succeq 0$ (due to Sylvester’s Inertia theorem). Then, since $G_{m}=d_{m}d_{m}^{*}$ , and the fact that $G_{m}^{\star}$ and $G_{m}$ have the same values on their main and $m$ ’th diagonals, it follows that

[TABLE]

for all indices $m,k$ . Now, since $\hat{G}_{m}^{\star}$ is positive semidefinite with unit diagonal, it can take the role of a correlation matrix of a random vector. In particular, let us consider a random vector $z_{m}\in\mathbb{R}^{L}$ , with $z_{m}[k]\sim\mathcal{N}(0,\hat{G}_{m}^{\star}[k,k])$ for $k=0,\ldots,L-1$ , noting that

[TABLE]

Fixing $m=1$ , the above relations imply that $z_{1}[k]$ and $z_{1}[k+1]$ are perfectly correlated normal variables with unit variances, and hence linearly dependent (almost surely) with

$$z_{1}[k]=z_{1}[k+1],\tag{106}$$

for $k=0,\ldots,L-1$ (since $\mathbb{E}|z_{1}[k]-z_{1}[k+1]|^{2}=0$ ). Therefore, it follows that $z_{1}[0]=z_{1}[1]=\ldots=z_{1}[L-1]$ (almost surely), and $\hat{G}_{1}^{\star}=\mathbb{E}[z_{1}z_{1}^{*}]$ must be of rank one with

$$\hat{G}_{1}^{\star}=\mathbf{1}\mathbf{1}^{T},\tag{107}$$

where $\mathbf{1}$ denotes an $L\times 1$ vector of ones. From (107) and (103) it then follows that

[TABLE]

Using the above result for $\hat{G}_{1}^{\star}$ together with (100) provides us with an additional equation on the diagonals of $G_{m}^{\star}$ , namely

[TABLE]

and hence, using (103) again,

[TABLE]

for $k=0,\ldots,L-1$ and $m=2,\ldots,L-1$ . Therefore, we have established that $G_{m}^{\star}$ and $G_{m}$ agree on their main and first diagonals for every $m$ . We can now repeat the arguments used for the case $m=1$ (using $\hat{G}^{\star}_{1}=\mathbb{E}[z_{1}z_{1}^{*}]$ ) for $m=2,\ldots,L-1$ , resulting in

[TABLE]

for $k=0,\ldots,L-1$ and $m=2,\ldots,L-1$ , almost surely. Therefore,

[TABLE]

and consequently

[TABLE]

for all $m=2,\ldots,L-1$ . ∎
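The key step of the proof above is that a positive semidefinite matrix with unit main diagonal and ones on its first circulant diagonal must be the all-ones matrix. The following small sketch illustrates this for $L=5$ : changing a single pair of entries away from those two diagonals (the value $0.9$ is an arbitrary choice made here) destroys positive semidefiniteness.

```python
import numpy as np

L = 5
G_ones = np.ones((L, L))                    # the rank-one matrix 1 1^T
lam_min_ones = np.linalg.eigvalsh(G_ones).min()

# Keep the main and first circulant diagonals equal to one, but alter another entry.
G_bad = G_ones.copy()
G_bad[0, 2] = G_bad[2, 0] = 0.9
lam_min_bad = np.linalg.eigvalsh(G_bad).min()

# A direction certifying the negative eigenvalue: v sums to zero, so 1 1^T contributes nothing.
v = np.array([1.0, -2.0, 1.0, 0.0, 0.0])
quad_form = v @ G_bad @ v                   # negative, so G_bad is not PSD
```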

The next lemma establishes that problem (29) is robust to errors in the estimation of $P_{y}$ and $T_{y}$ .

Lemma 15 (Stability of (29)).

Suppose that $|\hat{\Sigma}_{x}[i,j]|>0$ for all $i,j\in\{0,\ldots,L-1\}$ . If $\widetilde{T}_{y}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}T_{y}$ and $\widetilde{P}_{y}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}P_{y}$ (element-wise), then

[TABLE]

for $m=1,\ldots,L-1$ .

Proof.

For convenience, we formulate a problem equivalent to (29) in matrix-vector notation. Let ${t}\in\mathbb{C}^{L^{3}}$ , $\widetilde{t}\in\mathbb{C}^{L^{3}}$ , $g_{0}\in\mathbb{C}^{L^{2}}$ , $\widetilde{g}_{0}\in\mathbb{C}^{L^{2}}$ , $g\in\mathbb{C}^{L^{3}-L^{2}}$ , $\widetilde{g}\in\mathbb{C}^{L^{3}-L^{2}}$ be vectors obtained from vectorizing $T_{y}$ , $\widetilde{T}_{y}$ , $G_{0}$ , $\widetilde{G}_{0}:=\widetilde{P}_{y}\widetilde{P}_{y}^{T}$ , $\{G_{m}\}_{m=1}^{L-1}$ , and $\{\widetilde{G}_{m}\}_{m=1}^{L-1}$ , respectively. Then, the set of equations (28)

[TABLE]

for $k_{1},k_{2},m=0,\ldots,L-1$ , can be written in matrix form as

[TABLE]

where $A_{0}\in\mathbb{R}^{L^{3}\times L^{2}}$ , $A\in\mathbb{R}^{L^{3}\times(L^{3}-L^{2})}$ are suitable matrices (whose exact expressions are not important for this proof). Next, we define the following functions:

[TABLE]

for $g^{\prime}\in\mathbb{C}^{L^{3}-L^{2}}$ obtained by vectorizing $\{G^{\prime}_{m}\}_{m=1}^{L-1}$ from (29), hence satisfying the semidefinite constraints associated with $G^{\prime}_{m}\succeq 0$ for $m=1,\ldots,L-1$ . Recall from (29) that $\widetilde{g}$ is a minimizer of $\widetilde{J}$ . Therefore,

[TABLE]

which together with (116) gives

[TABLE]

On the other hand, by the reverse triangle inequality it follows that

[TABLE]

and by combining (121) and (120) we have

[TABLE]

Therefore, if $\widetilde{T}_{y}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}T_{y}$ , $\widetilde{P}_{y}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}P_{y}$ , we have that $\left\|\widetilde{t}-t\right\|\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ , $\left\|\widetilde{g}_{0}-g_{0}\right\|\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ , and thus

[TABLE]

Last, since $J$ is a non-negative and convex function over a convex domain with a unique minimizer (Lemma 14), it follows from (123) that (see Corollary 27.2.2 in [33])

[TABLE]

or equivalently

[TABLE]

∎

Since $\widetilde{P}_{y}$ and $\widetilde{T}_{y}$ (of (22) and (23)) are consistent estimators for $P_{y}$ and $T_{y}$ , respectively, we have that

[TABLE]

Therefore, by Lemma 15 it follows that

[TABLE]

for $m=1,\ldots,L-1$ . Note that $G_{m}$ is of rank one, with leading eigenvector $d_{m}/\|d_{m}\|$ and leading eigenvalue

[TABLE]

Recall from (30) that

[TABLE]

where $\widetilde{\mu}_{1}^{(m)}$ is the leading eigenvalue of $\widetilde{G}_{m}$ and $\widetilde{u}_{1}^{(m)}$ is its corresponding eigenvector. Classical results in matrix perturbation theory establish that $\widetilde{d}_{m}$ converges to $d_{m}$ almost surely. In particular, the Davis-Kahan theorem [43, 15] asserts that

[TABLE]

for some $\varphi_{m}\in[0,2\pi)$ , where ${u}_{1}^{(m)}$ is the leading eigenvector of $G_{m}$ , and by Weyl [42]

[TABLE]

where ${\mu}_{1}^{(m)}$ is the largest eigenvalue of $G_{m}$ (corresponding to ${u}_{1}^{(m)}$ ). Hence, by (127), (130) and (131) it follows that

[TABLE]

for $m=1,\ldots,L-1$ . Last, by the definition of $\widetilde{C}_{x}$ in (31)

[TABLE]

where we used the fact that $\hat{\Sigma}_{x}[k,k]=d_{0}[k]-\sigma^{2}=P_{y}[k]-\sigma^{2}$ (see (24) and (25)), (132), and the fact that $\widetilde{P}_{y}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}P_{y}$ , which concludes the proof.
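The perturbation step of this proof can be sketched numerically. In the sketch below, a rank-one matrix $G=dd^{*}$ is perturbed by a small Hermitian error, and $d$ is recovered (up to a global phase) from the leading eigenpair as $\sqrt{\widetilde{\mu}_{1}}\,\widetilde{u}_{1}$ . This scaling is our reading of the estimator described above, and the vector `d`, the error `E`, and its magnitude $10^{-3}$ are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
L = 8
d = rng.standard_normal(L) + 1j * rng.standard_normal(L)
G = np.outer(d, d.conj())                        # rank one: G = d d^*

E = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
G_tilde = G + 1e-3 * (E + E.conj().T) / 2        # small Hermitian perturbation

mu, U = np.linalg.eigh(G_tilde)                  # eigenvalues in ascending order
d_tilde = np.sqrt(mu[-1]) * U[:, -1]             # built from the leading eigenpair

# The eigenvector is defined only up to a unit-modulus factor, so align before comparing.
phase = np.vdot(d_tilde, d)
d_aligned = d_tilde * phase / abs(phase)
err = np.linalg.norm(d_aligned - d)
eig_gap = abs(mu[-1] - np.linalg.norm(d) ** 2)   # controlled by the size of the perturbation
```

The residual unit-modulus factor removed by the alignment step corresponds to the angle $\varphi_{m}$ appearing in the proof.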

Appendix H Proof of Lemma 8

First, (41) follows from (37) since a circulant matrix is invariant to $R_{i,i}$ (for any $i$ ), hence

[TABLE]

where $\mathbf{1}_{L\times L}$ is an $L\times L$ matrix of ones. Second, the fact that $H_{i,i}$ is Hermitian follows from

[TABLE]

where we used (41) and (40). Third, by (41), $H_{i,i}$ is PSD since the Hadamard product of two PSD matrices is also PSD due to the Schur product theorem (see Theorem 5.2.1 in [34]). Last, (42) is due to a well-known bound on the rank of the Hadamard product (see Theorem 5.1.7 in [34]).
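The two facts about Hadamard products invoked above (the Schur product theorem and the rank bound $\operatorname{rank}(A\odot B)\leq\operatorname{rank}(A)\operatorname{rank}(B)$ ) can be verified on random examples. The matrices below are arbitrary rank- $r$ PSD matrices generated by a hypothetical helper `random_psd`, and are not the specific matrices appearing in Lemma 8.

```python
import numpy as np

rng = np.random.default_rng(8)
L, r = 10, 2

def random_psd(L, r):
    """Random L x L Hermitian PSD matrix of rank r."""
    B = rng.standard_normal((L, r)) + 1j * rng.standard_normal((L, r))
    return B @ B.conj().T

A, B = random_psd(L, r), random_psd(L, r)
H = A * B                                        # Hadamard (entrywise) product

lam_min = np.linalg.eigvalsh(H).min()            # non-negative up to round-off
rank_H = np.linalg.matrix_rank(H, tol=1e-8)      # at most r * r
```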

Appendix I Proof of Lemma 9

By the definition of the $L\times L$ blocks of $S$ in (43), we have that

[TABLE]

It is easy to verify from (136) that $S$ is Hermitian if $\widetilde{X}$ is Hermitian (this follows immediately from interchanging $i$ with $j$ , and $k_{1}$ with $k_{2}$ ). Next, a key observation for this proof is that $S$ is similar to the matrix $\overline{\widetilde{X}}\otimes\widetilde{X}$ , where $\otimes$ is the Kronecker product. This is due to the fact that

[TABLE]

and hence $S$ can be transformed into $\overline{\widetilde{X}}\otimes\widetilde{X}$ by an appropriate permutation of its rows and columns. Specifically, we can write

[TABLE]

and it follows that there exists a permutation matrix $P\in\mathbb{R}^{L^{2}\times L^{2}}$ such that

[TABLE]

It is well-known that the eigenvalues of $\overline{\widetilde{X}}\otimes\widetilde{X}$ are given by the pair-wise products between the eigenvalues of $\widetilde{X}$ and the eigenvalues of $\overline{\widetilde{X}}$ (see [34]). Hence, if $\widetilde{X}$ is Hermitian and PSD, then $\overline{\widetilde{X}}\otimes\widetilde{X}$ is also Hermitian and PSD, and consequently so is $S$ by its similarity to $\overline{\widetilde{X}}\otimes\widetilde{X}$ .
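The eigenvalue property of the Kronecker product used above can be checked directly. The sketch below uses a random Hermitian PSD matrix as a stand-in for $\widetilde{X}$ (an assumption made for illustration) and compares the eigenvalues of $\overline{\widetilde{X}}\otimes\widetilde{X}$ with all pairwise products of the eigenvalues of $\widetilde{X}$ .

```python
import numpy as np

rng = np.random.default_rng(9)
L = 4
B = rng.standard_normal((L, 2)) + 1j * rng.standard_normal((L, 2))
X = B @ B.conj().T                               # Hermitian PSD stand-in (rank 2)

K = np.kron(X.conj(), X)                         # conj(X) Kronecker X, size L^2 x L^2
eig_K = np.linalg.eigvalsh(K)                    # ascending order
eig_X = np.linalg.eigvalsh(X)                    # conj(X) has the same (real) eigenvalues
products = np.sort(np.outer(eig_X, eig_X).ravel())
```

Since a permutation similarity does not change eigenvalues, the same spectrum is shared by any matrix obtained from the Kronecker product by simultaneously permuting its rows and columns.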

Appendix J Proof of Theorem 13

By Theorem 7, we can write

[TABLE]

where $X^{(N)}$ is equal to $X$ from (37) but with angles $\varphi_{1},\ldots,\varphi_{L-1}$ that may depend on $N$ , and $\|E^{(N)}\|_{F}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ . For simplicity of presentation, we omit the superscript $(\cdot)^{(N)}$ in $X^{(N)}$ and $E^{(N)}$ from all subsequent derivations. Let $W$ of (52) be the matrix constructed from $X$ as described in Section 3.2, and let $\widetilde{W}$ be a matrix analogous to $W$ when using $\widetilde{C}_{x}$ instead of $X$ . We now analyze the different quantities involved in the construction of $\widetilde{W}$ . By (39) we have that

[TABLE]

Using the bound $\|A\odot B\|_{F}\leq\|A\|_{F}\|B\|_{F}$ (see [34]), it follows that

[TABLE]

where we used the fact that $\left\|X\right\|_{F}=\left\|\hat{\Sigma}_{x}\right\|_{F}$ . Since $H_{i,i}$ admits $r^{2}$ distinct and non-zero eigenvalues (see the assumptions of Theorem 13), we have from the Davis-Kahan Theorem [43, 15] that

[TABLE]

where $V^{(i)}$ and $\widetilde{V}^{(i)}$ are $L\times r^{2}$ matrices whose columns are the first $r^{2}$ eigenvectors (corresponding to the largest eigenvalues) of $H_{i,i}$ and $\widetilde{H}_{i,i}$ , respectively. Next, define the matrices $\widetilde{Z}^{(i)}\in\mathbb{C}^{L^{2}\times r^{4}}$ and the vectors $\widetilde{M}^{(i)}_{m}\in\mathbb{C}^{L^{2}}$ , for $i,m\in\{0,\ldots,L-1\}$ , analogously to ${Z}^{(i)}$ and ${M}^{(i)}_{m}$ of (51), by

[TABLE]

where $\otimes$ is the Kronecker product, $\mathbf{e}_{m}$ is the $m$ ’th indicator vector (with a single value of $1$ at the $m$ ’th entry), and $\operatorname{vec}\{\cdot\}$ is the operation of vectorizing a matrix by stacking its columns on top of each other. From (143), we can write

[TABLE]

where $\|E_{V}^{(i)}\|_{F}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ (and the angles $\vartheta^{(i)}_{1},\ldots,\vartheta^{(i)}_{r^{2}}$ depend on $N$ ). Therefore, we can write

[TABLE]

where we used the mixed-product property of the Kronecker product (i.e. $(A\cdot B)\otimes(C\cdot D)=(A\otimes C)\cdot(B\otimes D)$ , see [34]). By using the bound $\|A\otimes B\|_{F}\leq\|A\|_{F}\|B\|_{F}$ (see [34]) together with $\|E_{V}^{(i)}\|_{F}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ , we have that

[TABLE]

for $i\in\{0,\ldots,L-1\}$ . Next, from (142) and (144) it immediately follows that

[TABLE]

for $i,m\in\{0,\ldots,L-1\}$ . Recall that $\widetilde{W}$ is formed according to the right-hand side of (52) when replacing ${{Z}^{(i)}}$ and ${{M}^{(i)}_{m}}$ with $\widetilde{Z}^{(i)}$ and $\widetilde{M}^{(i)}_{m}$ , respectively. Therefore, by (147) and (148) it follows that

[TABLE]

where $\mathbf{1}_{L}$ is a column vector of $L$ ones. Recall that $\mathcal{V}$ and $\widetilde{\mathcal{V}}$ are the right singular vectors of $W$ and $\widetilde{W}$ corresponding to their smallest singular values, respectively. Let $\mathcal{V}_{L}\in\mathbb{C}^{L}$ and $\widetilde{\mathcal{V}}_{L}\in\mathbb{C}^{L}$ be the first $L$ elements of $\mathcal{V}$ and $\widetilde{\mathcal{V}}$ , respectively. Note that the matrices $W$ and $W\cdot\operatorname{diag}\{\mathbf{1}_{L}^{T},e^{\imath\gamma^{(0)}_{1}},\ldots,e^{\imath\gamma^{(0)}_{r^{4}}},\ldots,e^{\imath\gamma^{(L-1)}_{1}},\ldots,e^{\imath\gamma^{(L-1)}_{r^{4}}}\}$ agree in their singular values and in the first $L$ entries of their singular vectors. Therefore, from (149) it follows that

[TABLE]

where we used the Davis-Kahan Theorem [43, 15] together with the fact that the smallest singular value of $W$ is zero while its second-smallest singular value is strictly positive (resulting in a spectral gap for the smallest singular value). Since the elements of $\mathcal{V}_{L}$ are bounded away from zero (they have magnitudes of $1$ according to (54)), from (150) it follows that

[TABLE]

Let $\widetilde{\widetilde{\varphi}}_{m}$ be analogous to $\widetilde{\varphi}_{m}$ from (59) when replacing ${\mathcal{V}}$ with $\widetilde{\mathcal{V}}$ , and fixing $\widetilde{\widetilde{\varphi}}_{0}=0$ , i.e.

[TABLE]

Then, it follows from (152), (59) and (151) that

[TABLE]

for $m=1,\ldots,L-1$ , which together with Proposition 12 implies that

[TABLE]

With some abuse of notation, let $\widetilde{\hat{\Sigma}}_{x}$ be as in (36) with $\widetilde{\widetilde{\varphi}}_{m}$ replacing $\widetilde{\varphi}_{m}$ . Then, employing (140) and (37) yields

[TABLE]

Hence, by the above equation together with (154), (20), and the fact that $\|E\|_{F}\underset{N\rightarrow\infty,\;\text{a.s.}}{\longrightarrow}0$ , we have

[TABLE]

and (62) in Theorem 13 follows in a straightforward manner (using $\widetilde{\Sigma}_{x}=F^{*}\widetilde{\hat{\Sigma}}_{x}F$ ).
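The matrix inequalities and identities used repeatedly in this proof can be sanity-checked on random matrices. The sketch below (with arbitrary $5\times 5$ complex matrices and a hypothetical helper `random_complex`) verifies the Hadamard bound $\|A\odot B\|_{F}\leq\|A\|_{F}\|B\|_{F}$ , the Kronecker bound (which in fact holds with equality), and the mixed-product property.

```python
import numpy as np

rng = np.random.default_rng(10)

def random_complex(n):
    """Random n x n complex matrix."""
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

A, B, C, D = (random_complex(5) for _ in range(4))
fro = np.linalg.norm                             # Frobenius norm by default for 2-D arrays

hadamard_ok = fro(A * B) <= fro(A) * fro(B)
kron_gap = abs(fro(np.kron(A, B)) - fro(A) * fro(B))

# Mixed-product property: (A B) kron (C D) = (A kron C)(B kron D).
mixed_gap = fro(np.kron(A @ B, C @ D) - np.kron(A, C) @ np.kron(B, D))
```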

Bibliography

[1] Emmanuel Abbe, Tamir Bendory, William Leeb, João Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. arXiv preprint arXiv:1710.02793, 2017.
[2] Emmanuel Abbe, João M Pereira, and Amit Singer. Sample complexity of the boolean multireference alignment problem. In Proceedings. IEEE International Symposium on Information Theory, volume 2017, page 1316. NIH Public Access, 2017.
[3] Emmanuel Abbe, João M Pereira, and Amit Singer. Estimation in the group action channel. arXiv preprint arXiv:1801.04366, 2018.
[4] Gil Aharoni, Amir Averbuch, Ronald Coifman, and Moshe Israeli. Local cosine transform—a method for the reduction of the blocking effect in jpeg. In Wavelet Theory and Application, pages 7–38. Springer, 1993.
[5] Yariv Aizenbud, Boris Landa, and Yoel Shkolnisky. Rank-one multi-reference factor analysis. arXiv preprint arXiv:1905.12442, 2019.
[6] Afonso Bandeira, Philippe Rigollet, and Jonathan Weed. Optimal rates of estimation for multi-reference alignment. arXiv preprint arXiv:1702.08546, 2017.
[7] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. arXiv preprint arXiv:1712.10163, 2017.
[8] Afonso S Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 459–470. ACM, 2014.
