Rank-one Multi-Reference Factor Analysis

Yariv Aizenbud; Boris Landa; Yoel Shkolnisky

arXiv:1905.12442·math.ST·June 5, 2019


TL;DR

This paper develops methods for accurately estimating signals from noisy, cyclically-shifted observations in low SNR regimes, with proven consistency and improved sample complexity, supported by numerical experiments.

Contribution

It introduces a novel consistent estimation procedure for low SNR, cyclically-shifted data, and demonstrates improved sample complexity through a new algorithm.

Findings

01

Estimation is possible in low SNR regimes with a sample complexity of 1/SNR^4.

02

Proposed procedure improves sample complexity by a factor equal to the signal length.

03

Numerical experiments validate theoretical results and algorithm performance.

Abstract

In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/ SNR^{4}$ . Additionally, we propose a procedure which…

Equations

$$y=\mathcal{R}\{x\}+\eta,\qquad x=\sum_{i=1}^{r}a_{i}\theta_{i}, \tag{1}$$

$$\mathcal{R}_{s}\{\theta\}[\ell]=\theta[\operatorname{mod}(\ell-s,L)], \tag{2}$$

$$y=R_{s}\{x\}+\eta,\qquad x=a\theta. \tag{3}$$

$$\operatorname{SNR}=\frac{\mathbb{E}\|x\|^{2}}{\mathbb{E}\|\eta\|^{2}}=\frac{\lambda}{L\sigma^{2}}. \tag{4}$$

$$F[\ell,k]=\frac{1}{\sqrt{L}}\,\omega^{\ell k},\qquad \omega=e^{-\imath 2\pi/L},\qquad \ell,k=0,\ldots,L-1, \tag{5}$$

$$\hat{y}=Fy,\qquad \hat{x}=Fx,\qquad \hat{\theta}=F\theta,\qquad \hat{\eta}=F\eta. \tag{6}$$

$$\hat{y}[k]=\omega^{sk}\hat{x}[k]+\hat{\eta}[k],\qquad \hat{x}[k]=a\hat{\theta}[k], \tag{7}$$

$$\theta=F^{-1}\hat{\theta}=F^{*}\hat{\theta}. \tag{8}$$

$$p_{x}[k]=\mathbb{E}|\hat{x}[k]|^{2}=\lambda|\hat{\theta}[k]|^{2},\qquad k=0,\ldots,L-1, \tag{9}$$

$$\lambda=\sum_{k=0}^{L-1}p_{x}[k],\qquad |\hat{\theta}[k]|=\sqrt{p_{x}[k]/\lambda}. \tag{10}$$

$$\widetilde{p}_{x}[k]=\frac{1}{N}\sum_{i=1}^{N}|\hat{y}_{i}[k]|^{2}-\sigma^{2},\qquad \widetilde{\lambda}=\sum_{k=0}^{L-1}|\widetilde{p}_{x}[k]|, \tag{11}$$

$$\widetilde{p}_{x}[k]=p_{x}[k]+\mathcal{O}\left(\frac{\sigma^{2}}{\sqrt{N}}\right), \tag{12}$$

$$\widetilde{\lambda}=\lambda+\mathcal{O}\left(\frac{L\sigma^{2}}{\sqrt{N}}\right). \tag{13}$$

$$u^{(m)}[k]=\hat{\theta}[k]\hat{\theta}^{*}[k+m],\qquad k=0,\ldots,L-1, \tag{14}$$

$$\arg\{u^{(m)}[k]\}=\arg\{\hat{\theta}[k]\}-\arg\{\hat{\theta}[k+m]\}, \tag{15}$$

$$z^{(m)}[k]=\hat{y}[k]\hat{y}^{*}[k+m], \tag{16}$$

$$z^{(m)}[k]=|a|^{2}\omega^{-sm}u^{(m)}[k]+\epsilon^{(m)}[k], \tag{17}$$

$$\epsilon^{(m)}[k]=a\omega^{sk}\hat{\theta}[k]\hat{\eta}^{*}[k+m]+a^{*}\omega^{-s(k+m)}\hat{\theta}^{*}[k+m]\hat{\eta}[k]+\hat{\eta}[k]\hat{\eta}^{*}[k+m]. \tag{18}$$

$$C_{z}^{(m)}=\mathbb{E}\left[z^{(m)}\left(z^{(m)}\right)^{*}\right], \tag{19}$$

$$\widetilde{C}_{z}^{(m)}=\frac{1}{N}\sum_{i=1}^{N}z_{i}^{(m)}\left(z_{i}^{(m)}\right)^{*},\qquad m=1,\ldots,L-1, \tag{20}$$

$$\widetilde{C}_{z}^{(m)}[k,k]\leftarrow\widetilde{C}_{z}^{(m)}[k,k]-\sigma^{2}\left(\widetilde{p}_{x}[k]+\widetilde{p}_{x}[k+m]\right)-\sigma^{4}, \tag{21}$$

$$\sum_{k=0}^{L-1}\arg\{u^{(m)}[k]\}=0, \tag{22}$$

$$\widetilde{u}^{(m)}\leftarrow\widetilde{u}^{(m)}\cdot\exp\left\{-\frac{\imath}{L}\sum_{k=0}^{L-1}\arg\{\widetilde{u}^{(m)}[k]\}\right\}. \tag{23}$$

$$\widetilde{u}^{(m)}\leftarrow\widetilde{u}^{(m)}\cdot e^{\imath 2\pi j_{min}/L}. \tag{24}$$

$$\left\|\widetilde{u}^{(1)}-e^{-\imath 2\pi s_{1}/L}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|\leq\|A_{0}\|+\sigma^{2}\|A_{2}\|+\sigma^{3}\|A_{3}\|+\sigma^{4}\|A_{4}\|, \tag{25}$$

$$\|A_{4}\|\leq\frac{C_{3}}{\gamma_{1}\delta_{1}^{2}}\sqrt{\frac{L^{3}}{\lambda^{4}N}},$$

$$1-C_{1}Le^{-c_{1}L^{1/2}}-C_{2}Ne^{-c_{2}N^{1/4}}. \tag{26}$$

$$\left\|\widetilde{u}^{(m)}-e^{-\imath 2\pi s_{1}/L}\frac{u^{(m)}}{\left\|u^{(m)}\right\|}\right\|_{2}\propto\frac{1}{\sqrt{NL\cdot\operatorname{SNR}^{4}}}, \tag{27}$$

$$N=\mathcal{O}\left(\frac{1}{L\cdot\operatorname{SNR}^{4}}\right). \tag{28}$$

$$\arg\{\hat{\theta}[k+1]\}=\arg\{\hat{\theta}[k]\}-\arg\{u^{(1)}[k]\}. \tag{29}$$


Full text


Rank-one Multi-Reference Factor Analysis

Yariv Aizenbud$^{1,\#}$, Boris Landa$^{1,\#}$, Yoel Shkolnisky$^{1}$

$^{1}$ School of Mathematical Sciences, Tel Aviv University, Israel

#These authors contributed equally

Abstract

In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^{4}$ . Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal’s length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.

1 Introduction

Due to recent improvements in acquisition, storage, and processing capabilities, there is a growing need for techniques aimed at extracting useful information from large datasets. It is commonplace to encounter large datasets of scientific observations that are corrupted by noise and by some deformation (e.g., translations or rotations) [10, 18, 26]. In many cases, these large datasets have a hidden low-dimensional structure that is masked by the deformations and noise.

More formally, we present the following model for the observation $y\in\mathbb{C}^{L}$ :

$$y=\mathcal{R}\{x\}+\eta,\qquad x=\sum_{i=1}^{r}a_{i}\theta_{i}, \tag{1}$$

where $\mathcal{R}$ is some random deformation operator, $\theta_{i}\in\mathbb{C}^{L},\|\theta_{i}\|=1$ for $i=1,\ldots,r$ are unknown deterministic orthonormal signals, $a_{i}\sim\mathcal{CN}(0,\lambda_{i})$ is a complex-valued random scale factor with variance $\lambda_{i}$ , and $\eta\sim\mathcal{CN}(0,\sigma^{2}I_{L})$ is a complex-valued noise vector, with $\mathcal{CN}$ being the circularly-symmetric complex normal distribution [20] (intuitively, $a\sim\mathcal{CN}(0,\lambda)$ means that $\operatorname{Re}(a)$ and $\operatorname{Im}(a)$ are independent, each distributed as $\mathcal{N}(0,\lambda/2)$ ). We assume that the noise variance $\sigma^{2}$ is known. Given observations $y_{i}$ from the model (1), our goal is to estimate the signals $\theta_{i}$ and their strengths $\lambda_{i}$ .

In this work, we consider a special prototype of the model (1). First, we take the random deformation $\mathcal{R}$ to be the cyclic shift operator $R_{s}\left\{\cdot\right\}$ , that is, for any signal $\theta\in\mathbb{C}^{L}$ and any $s\in\left\{0,1,\ldots,L-1\right\}$ , we define $\mathcal{R}_{s}$ by

$$\mathcal{R}_{s}\{\theta\}[\ell]=\theta[\operatorname{mod}(\ell-s,L)], \tag{2}$$

where $s$ is drawn from the uniform distribution over $\mathbb{Z}_{L}$ (i.e., $s$ is a realization of a random variable $S$ satisfying $\Pr(S=s)=1/L,\;s\in\left\{0,1,\ldots,L-1\right\}$ ). We name this estimation problem Multi-Reference Factor Analysis (MRFA), since it can be considered as factor analysis [13] under the cyclic shift $\mathcal{R}_{s}$ . In what follows, we omit the reduction modulo $L$ from all vector indices, as all vectors are considered periodic. Second, we consider $x$ in (1) to be a rank-one signal ( $r=1$ ). Formally, in the above notation, we consider the model:

$$y=R_{s}\{x\}+\eta,\qquad x=a\theta. \tag{3}$$

We name the problem of estimating $\theta$ and $\lambda$ from observations generated from the model (3) rank-one MRFA. Specifically, given $N$ independent observations $y_{1},y_{2},\ldots,y_{N}$ from the model (3), where $y_{i}=R_{s_{i}}\left\{a_{i}\theta\right\}+\eta_{i}$ , our goal is to recover $\theta$ and $\lambda$ . We note that the random shift $s$ in (3) is a nuisance parameter, and estimating its realizations $\left\{s_{i}\right\}_{i=1}^{N}$ is of no interest in our model. Note that in the model (3), the signal $\theta$ may be estimated only up to an arbitrary cyclic shift and a product with a complex number of modulus one (global phase). Even though our analysis is focused on the case of $a\sim\mathcal{CN}(0,\lambda)$ , all of our statements can be easily adapted to the more general setting where $a$ admits an arbitrary distribution (either real or complex) with $\mathbb{E}|a|^{2}=\lambda$ and a bounded fourth moment. The algorithms derived in this work are also suitable for solving the rank-one MRFA problem in this more general setting.
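To make the sampling model (3) concrete, the following NumPy sketch draws observations from it. The function name `simulate_rank_one_mrfa` and all parameter values are illustrative choices rather than anything prescribed by the paper; `np.roll` is used because `np.roll(theta, s)[l] == theta[(l - s) % L]`, which is the cyclic shift of (2).

```python
import numpy as np

def simulate_rank_one_mrfa(theta, lam, sigma, n_obs, rng):
    """Draw n_obs observations y_i = R_{s_i}{a_i * theta} + eta_i (one per row)."""
    L = theta.shape[0]
    # a_i ~ CN(0, lam): independent real and imaginary parts, each N(0, lam / 2)
    a = np.sqrt(lam / 2) * (rng.standard_normal(n_obs) + 1j * rng.standard_normal(n_obs))
    # s_i uniform on {0, ..., L - 1}
    s = rng.integers(0, L, size=n_obs)
    # eta_i ~ CN(0, sigma^2 I_L)
    eta = sigma * np.sqrt(0.5) * (
        rng.standard_normal((n_obs, L)) + 1j * rng.standard_normal((n_obs, L))
    )
    shifted = np.stack([np.roll(theta, si) for si in s])
    return a[:, None] * shifted + eta

rng = np.random.default_rng(0)
L = 8
theta = rng.standard_normal(L) + 1j * rng.standard_normal(L)
theta /= np.linalg.norm(theta)          # the model assumes ||theta|| = 1
y = simulate_rank_one_mrfa(theta, lam=2.0, sigma=1.0, n_obs=1000, rng=rng)
```

With `sigma=0.0`, every row is a scalar multiple of some cyclic shift of `theta`, which is a convenient sanity check of the generator.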

Note that if the random factor $a\sim\mathcal{CN}(0,\lambda)$ in (3) is replaced by a constant, then the model (3) reduces to Multi-Reference Alignment (MRA) [2, 8, 9, 24], which has recently drawn much interest. Both the MRA model and the model (3) provide a simplified model for various problems in science and engineering, particularly in areas such as communications, radar, image processing, and structural biology [10, 18, 25, 26, 28, 35].

Having mentioned that our problem is a generalization of MRA, it is worthwhile to discuss the latter’s possible solutions for different levels of noise. When the noise variance $\sigma^{2}$ is small, it is known that MRA can be solved by first estimating the relative shift between any two observations (taking the shift that maximizes the correlation between the two observations), then, aligning all of the observations, and finally averaging the aligned observations [9]. Such a procedure is known to result in a sample complexity (the number of samples required for a prescribed error in estimating $\theta$ ) of $N\propto\sigma^{2}$ . However, this approach fails when the noise variance $\sigma^{2}$ is large, as pairwise correlations become meaningless, and thus an alignment of the observations cannot be achieved. In the high noise regime, several algorithms were proposed in [1, 9, 12, 24], and were demonstrated to achieve the optimal asymptotic sample complexity of $N\propto\sigma^{6}$ [24] in the case of uniformly-distributed cyclic shifts, and of $N\propto\sigma^{4}$ [1] in the case of an arbitrary aperiodic distribution of the shifts. Indeed, such methods do not attempt to estimate the shifts of the observations (a task doomed to fail when the noise variance is large [24]), but rather use all observations to estimate a sufficient number of shift-invariant statistics, from which the underlying signal can be recovered. It is shown [9] that for the general case of the MRA problem, it is possible to recover the underlying signal using only the first three shift-invariant statistics: the mean (first order), the power spectrum (second order), and the bispectrum (third order) [23], where the latter governs the achieved sample complexity. The distinction between the two noise level regimes, high and low, applies analogously for samples generated by the model (3). 
In the low noise regime, $\theta$ in (3) can be estimated by aligning the observations using correlations, followed by calculating the rank-one factorization of the aligned samples. Therefore, when studying the model (3), the regime of interest is the one of large noise variance.

A different approach for MRA, or more generally, for estimating model parameters in the presence of nuisance parameters is Maximum-Likelihood Estimation (MLE) via Expectation-Maximization (EM) [17], which marginalizes over the nuisance parameters.

Typically, and in particular in the context of MRA, EM suffers from two major shortcomings. The first is the lack of theoretical convergence guarantees (except in some special cases [34]), and the second is the phenomenon of extremely slow convergence – resulting in very long running time, particularly when the noise variance is large [9]. In contrast, estimators based on invariant statistics typically enjoy rigorous error bounds on one hand, and faster running times on the other (as they are essentially single-pass algorithms – going over every observation only once).

Going back to our model of rank-one MRFA (3), and motivated by the above discussion, we consider the following question: Can we accurately estimate $\lambda$ and $\theta$ of (3) in the regime of large $\sigma$ and large $N$ ? We answer this question affirmatively, and moreover, propose an algorithm which, under mild conditions, is guaranteed to recover $\lambda$ and $\theta$ (up to the ambiguities of global phase and cyclic shift) when $N\rightarrow\infty$ . The asymptotic sample complexity of our proposed algorithm is $N\propto\sigma^{8}$ (for large $\sigma$ ). As a by-product, we develop new concentration results for non-i.i.d. sub-exponential random vectors (and their sample covariance matrices), when the vectors admit a certain underlying structure (see Appendix B).

While the optimal asymptotic sample complexity of MRA is $N\propto\sigma^{6}$ , the asymptotic complexity of our algorithm for MRFA is $N\propto\sigma^{8}$ . The reason for the different rates is that the rank-one MRFA is fundamentally different from MRA, as the third-order invariant statistic, the bispectrum (given by certain triplet correlations in the Fourier domain – see Section 3), which is used to solve MRA, vanishes under the setting of (3). Therefore, we resort to a fourth-order shift-invariant statistic, known as the trispectrum (given by certain quadruplet correlations in the Fourier domain), and hence the dependence on $\sigma^{8}$ instead of $\sigma^{6}$ . Since the bispectrum vanishes entirely, we believe it is unlikely that a rate better than $\sigma^{8}$ can be achieved (see [3] for this type of argument in the case of MRA).

Note that the previously-mentioned sample complexities are oblivious to the signal’s length $L$ , as they assume $L$ is a fixed constant. In practice, the length of the signal affects the sample complexity, and thus, we do not neglect it in our analysis. We analyze the sample complexity of our algorithms in terms of the signal’s length $L$ , and in terms of the signal-to-noise ratio (SNR), defined as

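$$\operatorname{SNR}=\frac{\mathbb{E}\|x\|^{2}}{\mathbb{E}\|\eta\|^{2}}=\frac{\lambda}{L\sigma^{2}}. \tag{4}$$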
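As a quick check of the closed form of the SNR in (4), the two expectations can be evaluated directly. This worked step uses only the assumptions $\|\theta\|=1$, $\mathbb{E}|a|^{2}=\lambda$ and $\eta\sim\mathcal{CN}(0,\sigma^{2}I_{L})$ of the model (3):

$$\mathbb{E}\|x\|^{2}=\mathbb{E}|a|^{2}\,\|\theta\|^{2}=\lambda,\qquad \mathbb{E}\|\eta\|^{2}=\sum_{\ell=0}^{L-1}\mathbb{E}|\eta[\ell]|^{2}=L\sigma^{2},\qquad\text{so}\qquad \operatorname{SNR}=\frac{\lambda}{L\sigma^{2}},\quad \frac{1}{\operatorname{SNR}^{4}}=\frac{L^{4}\sigma^{8}}{\lambda^{4}}.$$

In particular, for fixed $L$ and $\lambda$, a sample complexity proportional to $1/\operatorname{SNR}^{4}$ is the same statement as $N\propto\sigma^{8}$.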

We present in this work two algorithms that solve the rank-one MRFA problem. We show that the sample complexity of the first algorithm satisfies $N(L,\operatorname{SNR})=\mathcal{O}(L/\operatorname{SNR}^{4})$ (in the regime of large $\sigma$ ), where $N$ is the number of required measurements for estimating $\theta$ and $\lambda$ to a given accuracy. We observed numerically that the achieved sample complexity of this algorithm is actually better by a factor of $L$ , satisfying $N(L,\operatorname{SNR})\propto 1/\operatorname{SNR}^{4}$ . The second algorithm is a variant of the first one, which works better in practice at the cost of weaker theoretical bounds. It is observed to provide a further improvement by a factor of $L$ , resulting in a sample complexity of $N\propto 1/(L\cdot\operatorname{SNR}^{4})$ .

We demonstrate all our algorithms numerically, and show the agreement between the theory and the experimental results. We also compare our algorithms with the EM algorithm (designed for the rank-one MRFA problem) and show that it provides the same sample complexity as our methods (as a function of $\sigma$ ) with a marginal gain in the estimation error, while suffering from extremely slow running times and lack of theoretical guarantees.

The paper is organized as follows. Section 2 describes our algorithms for recovering $\lambda$ and $\theta$ of (3), and provides the main theoretical guarantees for their performance. Section 3 connects our approach with that of shift-invariant statistics. Section 4 presents the numerical experiments supporting our theoretical derivations. Finally, Section 5 provides some concluding remarks and some possible future research directions.

2 Method description and main results

In this section, we describe our methods for estimating the model parameters $\lambda$ and $\theta$ of (3), and provide their theoretical error bounds and sample complexities.

Instead of working with the model (3) directly, we consider an equivalent, more convenient formulation in the Fourier domain, where cyclic shifts are replaced by modulations. Let $F\in\mathbb{C}^{L\times L}$ be the unitary Discrete Fourier Transform (DFT) matrix

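$$F[\ell,k]=\frac{1}{\sqrt{L}}\,\omega^{\ell k},\qquad \omega=e^{-\imath 2\pi/L},\qquad \ell,k=0,\ldots,L-1, \tag{5}$$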

and denote the Fourier transforms of the quantities in (3) by

$$\hat{y}=Fy,\qquad \hat{x}=Fx,\qquad \hat{\theta}=F\theta,\qquad \hat{\eta}=F\eta. \tag{6}$$

Then, a formulation equivalent to (3) in the Fourier domain is

$$\hat{y}[k]=\omega^{sk}\hat{x}[k]+\hat{\eta}[k],\qquad \hat{x}[k]=a\hat{\theta}[k], \tag{7}$$

for $k=0,\ldots,L-1$ , where $\hat{\eta}\sim\mathcal{CN}(0,\sigma^{2}I_{L})$ and $\|\hat{\theta}\|=1$ (since $F$ is unitary). In what follows, we consider the problem of estimating $\lambda$ and $\hat{\theta}$ from the observations $\hat{y}_{1},\ldots,\hat{y}_{N}$ ( $\hat{y}_{i}=Fy_{i}$ ), recalling that $\theta$ can always be obtained from $\hat{\theta}$ by the inverse Fourier transform, i.e.

$$\theta=F^{-1}\hat{\theta}=F^{*}\hat{\theta}. \tag{8}$$
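The replacement of cyclic shifts by modulations can be checked numerically. The sketch below assumes NumPy's `fft` with `norm="ortho"` as a stand-in for the unitary DFT matrix $F$ (its sign convention corresponds to $\omega=e^{-\imath 2\pi/L}$); the variable names and sizes are illustrative.

```python
import numpy as np

L, s = 16, 5
rng = np.random.default_rng(0)
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)

k = np.arange(L)
omega = np.exp(-2j * np.pi / L)

x_hat = np.fft.fft(x, norm="ortho")                      # F x
shifted_hat = np.fft.fft(np.roll(x, s), norm="ortho")    # F R_s{x}

# A cyclic shift by s becomes multiplication by omega^(s k), as in (7)
assert np.allclose(shifted_hat, omega ** (s * k) * x_hat)
# F is unitary: the inverse transform recovers x and norms are preserved
assert np.allclose(np.fft.ifft(x_hat, norm="ortho"), x)
assert np.isclose(np.linalg.norm(x_hat), np.linalg.norm(x))
```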

In the rest of this section, we describe our methods for estimating the signal parameters $\hat{\theta}$ and $\lambda$ . We start by describing how to estimate the signal’s strength $\lambda$ together with the magnitudes of $\hat{\theta}$ (i.e. $|\hat{\theta}|$ ) from the power spectrum of the observations. Then, we detail two methods for estimating the phases of $\hat{\theta}$ (i.e. $\operatorname{arg}\{\hat{\theta}\}$ ). The first method is simpler to implement and enjoys faster running times. The second method is able to exploit more information from the statistics calculated from the observations and results in lower estimation errors. We show that both methods are statistically consistent in estimating $\lambda$ and $\theta$ (or, equivalently, $\hat{\theta}$ ) as $N\rightarrow\infty$ , up to the inherent ambiguities for $\theta$ – global phase and cyclic shift. We also show that in the low SNR regime (large $\sigma$ ), the first method admits a sample complexity of $N=\mathcal{O}(L/\operatorname{SNR}^{4})$ . Later, in Section 4, we demonstrate numerically that the first method actually achieves a better sample complexity of $N=\mathcal{O}(1/\operatorname{SNR}^{4})$ . The second method further improves upon this rate by a factor of $L$ , with $N=\mathcal{O}(1/(L\cdot\operatorname{SNR}^{4}))$ .

2.1 Estimating $\lambda$ and the magnitudes of $\hat{\theta}$

Consider the power spectrum of the random signal $x$ from (3), given by

$$p_{x}[k]=\mathbb{E}|\hat{x}[k]|^{2}=\lambda|\hat{\theta}[k]|^{2},\qquad k=0,\ldots,L-1, \tag{9}$$

and note that since $\|\hat{\theta}\|=1$ , $p_{x}$ encodes both $\lambda$ and the magnitudes of $\hat{\theta}$ (i.e. $|\hat{\theta}|$ ) via

$$\lambda=\sum_{k=0}^{L-1}p_{x}[k],\qquad |\hat{\theta}[k]|=\sqrt{p_{x}[k]/\lambda}. \tag{10}$$

The power spectrum $p_{x}$ , and correspondingly $\lambda$ , can be estimated from $\left\{\hat{y}_{i}\right\}_{i=1}^{N}$ by

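$$\widetilde{p}_{x}[k]=\frac{1}{N}\sum_{i=1}^{N}|\hat{y}_{i}[k]|^{2}-\sigma^{2},\qquad \widetilde{\lambda}=\sum_{k=0}^{L-1}|\widetilde{p}_{x}[k]|, \tag{11}$$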

which are consistent estimators (as $N\rightarrow\infty$ ) for $p_{x}$ and $\lambda$ , respectively. Furthermore, $\widetilde{p}_{x}$ satisfies [33]

$$\widetilde{p}_{x}[k]=p_{x}[k]+\mathcal{O}\left(\frac{\sigma^{2}}{\sqrt{N}}\right), \tag{12}$$

where we discarded lower-order terms of $\sigma$ (since we are interested in the regime of $\sigma\to\infty$ ), and therefore

$$\widetilde{\lambda}=\lambda+\mathcal{O}\left(\frac{L\sigma^{2}}{\sqrt{N}}\right). \tag{13}$$

Hence, the sample complexity of estimating $\lambda$ and the magnitudes of $\hat{\theta}$ from $\widetilde{\lambda}$ and $\widetilde{p}_{x}$ is $N=\mathcal{O}(1/\operatorname{SNR}^{2})$ in the regime of large noise variance.
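The estimators of this subsection are short enough to sketch in code. The following is a minimal illustration rather than the authors' implementation: the function name `estimate_power_spectrum`, the synthetic test signal, and the use of `np.abs` inside the square root (to guard against negative power estimates at low SNR) are assumptions made here.

```python
import numpy as np

def estimate_power_spectrum(y_hat, sigma):
    """y_hat: (N, L) array whose rows are Fourier-domain observations."""
    p_tilde = np.mean(np.abs(y_hat) ** 2, axis=0) - sigma ** 2   # debiased power spectrum, (11)
    lam_tilde = np.sum(np.abs(p_tilde))                          # signal strength, (11)
    mag_tilde = np.sqrt(np.abs(p_tilde) / lam_tilde)             # magnitudes of theta_hat, (10)
    return p_tilde, lam_tilde, mag_tilde

# Synthetic data from the Fourier-domain model (7); all values are illustrative.
rng = np.random.default_rng(0)
L, n_obs, lam, sigma = 8, 100_000, 2.0, 1.0
theta_hat = rng.uniform(0.5, 1.5, L) * np.exp(2j * np.pi * rng.random(L))
theta_hat /= np.linalg.norm(theta_hat)
a = np.sqrt(lam / 2) * (rng.standard_normal(n_obs) + 1j * rng.standard_normal(n_obs))
s = rng.integers(0, L, n_obs)
eta_hat = sigma * np.sqrt(0.5) * (
    rng.standard_normal((n_obs, L)) + 1j * rng.standard_normal((n_obs, L))
)
modulation = np.exp(-2j * np.pi * np.outer(s, np.arange(L)) / L)  # omega^(s k)
y_hat = a[:, None] * modulation * theta_hat + eta_hat

p_tilde, lam_tilde, mag_tilde = estimate_power_spectrum(y_hat, sigma)
```

Because the power spectrum is invariant to the modulation $\omega^{sk}$, the random shifts play no role in this step.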

2.2 Estimating the phases of $\hat{\theta}$

Next, we proceed to estimate the phases of the signal $\hat{\theta}$ , that is, the vector $\operatorname{arg}\left\{\hat{\theta}\right\}$ . Consider the vectors $u^{(m)}\in\mathbb{C}^{L}$ , for $m=0,\ldots,L-1$ , defined by

$$u^{(m)}[k]=\hat{\theta}[k]\hat{\theta}^{*}[k+m],\qquad k=0,\ldots,L-1, \tag{14}$$

where $\hat{\theta}^{*}$ is the complex conjugate of $\hat{\theta}$ . Essentially, each vector $u^{(m)}$ consists of the products between the elements of $\hat{\theta}$ with stride $m$ , and thus encodes the phases of $\hat{\theta}$ through the relation

$$\arg\{u^{(m)}[k]\}=\arg\{\hat{\theta}[k]\}-\arg\{\hat{\theta}[k+m]\}, \tag{15}$$

up to an integer multiple of $2\pi$ . That is, the vectors $u^{(m)}$ , $m=0,\ldots,L-1$ , describe all pairwise differences between the phases of the elements of $\hat{\theta}$ . We mention that $u^{(0)}$ does not provide any useful phase information (as it is equivalent to the power spectrum), and will play no role in what follows.

Before we describe how to extract the phases of $\hat{\theta}$ from the vectors $u^{(m)}$ , we present a method for estimating $u^{(m)}$ from the observations $\hat{y}_{1},\ldots,\hat{y}_{N}$ . We define the stride- $m$ products of the elements of $\hat{y}$ of (7) by

$$z^{(m)}[k]=\hat{y}[k]\hat{y}^{*}[k+m], \tag{16}$$

and observe that by (7) we have

$$z^{(m)}[k]=|a|^{2}\omega^{-sm}u^{(m)}[k]+\epsilon^{(m)}[k], \tag{17}$$

where $\epsilon^{(m)}$ is a noise term given by

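$$\epsilon^{(m)}[k]=a\omega^{sk}\hat{\theta}[k]\hat{\eta}^{*}[k+m]+a^{*}\omega^{-s(k+m)}\hat{\theta}^{*}[k+m]\hat{\eta}[k]+\hat{\eta}[k]\hat{\eta}^{*}[k+m]. \tag{18}$$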
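For completeness, (17) and (18) follow by substituting (7) into (16) and using $|\omega|=1$, so that $(\omega^{s(k+m)})^{*}=\omega^{-s(k+m)}$. The routine expansion, written out here as a reading aid, is

$$\begin{aligned} z^{(m)}[k] &= \left(a\omega^{sk}\hat{\theta}[k]+\hat{\eta}[k]\right)\left(a\omega^{s(k+m)}\hat{\theta}[k+m]+\hat{\eta}[k+m]\right)^{*} \\ &= |a|^{2}\omega^{-sm}\,\hat{\theta}[k]\hat{\theta}^{*}[k+m]+a\omega^{sk}\hat{\theta}[k]\hat{\eta}^{*}[k+m]+a^{*}\omega^{-s(k+m)}\hat{\theta}^{*}[k+m]\hat{\eta}[k]+\hat{\eta}[k]\hat{\eta}^{*}[k+m]. \end{aligned}$$

The first term equals $|a|^{2}\omega^{-sm}u^{(m)}[k]$ by (14), and the remaining three terms make up $\epsilon^{(m)}[k]$.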

Note that if we had no noise, i.e. $\sigma=0$ (and hence $\epsilon^{(m)}=0$ ), then different realizations of $z^{(m)}$ would be equal to $u^{(m)}$ up to constant factors (as $|a|^{2}\omega^{-sm}$ is independent of the frequency index $k$ ). We define the covariance matrix of $z^{(m)}$ as

$$C_{z}^{(m)}=\mathbb{E}\left[z^{(m)}\left(z^{(m)}\right)^{*}\right], \tag{19}$$

and observe that in the case of no noise ( $\sigma=0$ ), $C_{z}^{(m)}$ would be of rank one, with its leading eigenvector equal to $u^{(m)}$ (again – up to a constant factor of known magnitude). We therefore proceed by computing the sample covariance matrices of $z^{(m)}$

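$$\widetilde{C}_{z}^{(m)}=\frac{1}{N}\sum_{i=1}^{N}z_{i}^{(m)}\left(z_{i}^{(m)}\right)^{*},\qquad m=1,\ldots,L-1, \tag{20}$$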

and correct for the effect of the noise (see Appendix A) by modifying the main diagonal of each $\widetilde{C}_{z}^{(m)}$ according to

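$$\widetilde{C}_{z}^{(m)}[k,k]\leftarrow\widetilde{C}_{z}^{(m)}[k,k]-\sigma^{2}\left(\widetilde{p}_{x}[k]+\widetilde{p}_{x}[k+m]\right)-\sigma^{4}, \tag{21}$$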

where $\widetilde{p}_{x}$ is the estimate of the power spectrum of $x$ from (11). Essentially, (21) corrects for the bias in $\widetilde{C}_{z}^{(m)}$ caused by the noise term $\epsilon^{(m)}$ . In Section 3 below, we relate the sequence of matrices $\{C_{z}^{(m)}\}$ , $m=0,\ldots,L-1$ , of (19) with the trispectrum of $y$ (for the definition of the trispectrum see also Section 3).

Once $\widetilde{C}_{z}^{(m)}$ has been computed (for any $m$ ), we take its leading normalized eigenvector $\widetilde{u}^{(m)}$ (corresponding to the largest eigenvalue of $\widetilde{C}_{z}^{(m)}$ ) as an estimate for $u^{(m)}$ . Note that the subtraction of $\sigma^{4}$ in (21) has no effect on the eigenvectors of $\widetilde{C}_{z}^{(m)}$ , and therefore has no effect on our estimate for $u^{(m)}$ .
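The estimation of $u^{(m)}$ described above (stride-$m$ products, sample covariance, diagonal correction, leading eigenvector) can be sketched as follows. This is an illustrative NumPy rendering under the Fourier-domain model (7), not the authors' code; the function name `estimate_u` and the synthetic parameters are assumptions.

```python
import numpy as np

def estimate_u(y_hat, sigma, m):
    """Estimate u^(m), up to a unit-modulus factor, from (N, L) Fourier-domain observations."""
    n_obs, L = y_hat.shape
    p_tilde = np.mean(np.abs(y_hat) ** 2, axis=0) - sigma ** 2    # power spectrum estimate (11)
    z = y_hat * np.conj(np.roll(y_hat, -m, axis=1))               # z_i[k] = y_i[k] * conj(y_i[k+m]), (16)
    C = z.T @ np.conj(z) / n_obs                                  # sample covariance (20)
    diag = np.arange(L)
    C[diag, diag] -= sigma ** 2 * (p_tilde + np.roll(p_tilde, -m)) + sigma ** 4   # correction (21)
    _, vecs = np.linalg.eigh(C)                                   # C is Hermitian; eigenvalues ascending
    return vecs[:, -1]                                            # leading normalized eigenvector

# Synthetic check; all values below are illustrative.
rng = np.random.default_rng(0)
L, n_obs, lam, sigma, m = 8, 20_000, 2.0, 0.3, 1
theta_hat = rng.uniform(0.5, 1.5, L) * np.exp(2j * np.pi * rng.random(L))
theta_hat /= np.linalg.norm(theta_hat)
a = np.sqrt(lam / 2) * (rng.standard_normal(n_obs) + 1j * rng.standard_normal(n_obs))
s = rng.integers(0, L, n_obs)
eta_hat = sigma * np.sqrt(0.5) * (
    rng.standard_normal((n_obs, L)) + 1j * rng.standard_normal((n_obs, L))
)
y_hat = a[:, None] * np.exp(-2j * np.pi * np.outer(s, np.arange(L)) / L) * theta_hat + eta_hat

u_true = theta_hat * np.conj(np.roll(theta_hat, -m))              # (14)
u_est = estimate_u(y_hat, sigma, m)
# 1.0 would mean a perfect match up to a unit-modulus factor
alignment = abs(np.vdot(u_est, u_true)) / np.linalg.norm(u_true)
```

The eigenvector returned by `np.linalg.eigh` carries an arbitrary phase, which is why the comparison uses the modulus of the inner product.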

Note that by the definition of $u^{(m)}$ (see also (15)), it satisfies

$$\sum_{k=0}^{L-1}\arg\{u^{(m)}[k]\}=0, \tag{22}$$

whereas $\widetilde{u}^{(m)}$ does not necessarily satisfy (22), since $\widetilde{u}^{(m)}$ estimates $u^{(m)}$ up to an arbitrary phase. We can easily update $\widetilde{u}^{(m)}$ to satisfy $\sum_{k=0}^{L-1}\arg\{\widetilde{u}^{(m)}[k]\}=0$ by

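$$\widetilde{u}^{(m)}\leftarrow\widetilde{u}^{(m)}\cdot\exp\left\{-\frac{\imath}{L}\sum_{k=0}^{L-1}\arg\{\widetilde{u}^{(m)}[k]\}\right\}. \tag{23}$$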

Note that after the update (23), $\widetilde{u}^{(m)}$ is unique up to a multiplication by $e^{\imath 2\pi j/L}$ for $j\in\{0,\ldots,L-1\}$ . One way to resolve this ambiguity would be to multiply $\widetilde{u}^{(m)}$ by a phase such that $\arg\{\widetilde{u}^{(m)}[0]\}=0$ . Unfortunately, this would violate the requirement $\sum_{k=0}^{L-1}\arg\{\widetilde{u}^{(m)}[k]\}=0$ . We fix this ambiguity while satisfying (22) by choosing $j_{min}$ such that $\operatorname{arg}\{\widetilde{u}^{(m)}[0]e^{\imath 2\pi j_{min}/L}\}$ is minimal, and then we update $\widetilde{u}^{(m)}$ again by

$$\widetilde{u}^{(m)}\leftarrow\widetilde{u}^{(m)}\cdot e^{\imath 2\pi j_{min}/L}. \tag{24}$$
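The two phase updates (23) and (24) can be sketched as a single helper. The name `normalize_phase` is hypothetical, and reading "minimal" as "smallest absolute value of the argument" is an assumption made here, since the text does not spell out the criterion.

```python
import numpy as np

def normalize_phase(u):
    """Apply the updates (23) and (24) to an estimate of u^(m)."""
    L = u.shape[0]
    # (23): rotate so that the phases of the entries sum to zero (mod 2*pi)
    u = u * np.exp(-1j * np.sum(np.angle(u)) / L)
    # (24): among the L admissible rotations e^{i 2 pi j / L}, pick the one that
    # brings arg(u[0]) closest to zero (assumed reading of "minimal")
    j = np.arange(L)
    candidates = np.angle(u[0] * np.exp(2j * np.pi * j / L))
    j_min = int(np.argmin(np.abs(candidates)))
    return u * np.exp(2j * np.pi * j_min / L)

rng = np.random.default_rng(0)
L = 8
u = rng.standard_normal(L) + 1j * rng.standard_normal(L)
v = normalize_phase(u)
w = normalize_phase(np.exp(1.234j) * u)   # same vector with a different global phase
```

Under this reading, the output does not depend on the arbitrary global phase of the input (barring exact ties between candidates), so `v` and `w` coincide.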

Next, we characterize the accuracy of the estimate $\widetilde{u}^{(m)}$ in the low SNR regime. For the sake of clarity and simplicity of notation in the proof, the result is stated in Lemma 2.1 and Theorem 2.2 for the case of $m=1$ .

Lemma 2.1.

Let $u^{(1)}$ be given by (14), let $\widetilde{u}^{(1)}$ be the leading eigenvector of $\widetilde{C}_{z}^{(1)}$ of (21), and define $\gamma_{1}=L\left\|u^{(1)}\right\|^{2}$ and $\delta_{1}=\min_{k\in\{0,\ldots,L-1\}}|u^{(1)}[k]|$ . If $\gamma_{1},\delta_{1}>0$ and $N>2L$ is large enough, then the error in the estimate $\widetilde{u}^{(1)}$ of $\frac{u^{(1)}}{\left\|u^{(1)}\right\|}$ can be bounded by

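$$\left\|\widetilde{u}^{(1)}-e^{-\imath 2\pi s_{1}/L}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|\leq\|A_{0}\|+\sigma^{2}\|A_{2}\|+\sigma^{3}\|A_{3}\|+\sigma^{4}\|A_{4}\|, \tag{25}$$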

for some $s_{1}\in\{0,\ldots,L-1\}$ , where $A_{0},A_{2},A_{3}$ and $A_{4}$ are independent of $\sigma$ .

The proof is provided in Appendix B.3.

Theorem 2.2.

Let $u^{(1)}$ be given by (14), let $\widetilde{u}^{(1)}$ be the leading eigenvector of $\widetilde{C}_{z}^{(1)}$ of (21), and define $\gamma_{1}=L\left\|u^{(1)}\right\|^{2}$ and $\delta_{1}=\min_{k\in\{0,\ldots,L-1\}}|u^{(1)}[k]|$ . Then, if $\gamma_{1},\delta_{1}>0$ and $N>2L$ is large enough, there exist constants $C_{1},C_{2},C_{3},c_{1},c_{2}>0$ (independent of $u^{(1)},N,L$ and $\sigma$ ), such that for $A_{4}$ from Lemma 2.1

$$\|A_{4}\|\leq\frac{C_{3}}{\gamma_{1}\delta_{1}^{2}}\sqrt{\frac{L^{3}}{\lambda^{4}N}},$$

with probability at least

$$1-C_{1}Le^{-c_{1}L^{1/2}}-C_{2}Ne^{-c_{2}N^{1/4}}. \tag{26}$$

The proof is provided in Appendix B.4. Note that the probability in (26) tends to $1$ rapidly as $N,L\rightarrow\infty$ (dominated by an exponential rate).

In essence, Lemma 2.1 states that the error of the estimate $\widetilde{u}^{(m)}$ can be bounded by a degree-4 polynomial in $\sigma$ . In the large noise regime ( $\sigma\gg 1$ ), this bound is dominated by the term $\sigma^{4}\|A_{4}\|$ , which is why Theorem 2.2 bounds $\|A_{4}\|$ with high probability. We argue in Appendix B.5 that with high probability, in the regime of large $N$ and large $\sigma$

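$$\left\|\widetilde{u}^{(m)}-e^{-\imath 2\pi s_{1}/L}\frac{u^{(m)}}{\left\|u^{(m)}\right\|}\right\|_{2}\propto\frac{1}{\sqrt{NL\cdot\operatorname{SNR}^{4}}}, \tag{27}$$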

where SNR is defined in (4) and $\gamma_{m}$ is defined analogously to $\gamma_{1}$ by $\gamma_{m}=L\left\|u^{(m)}\right\|^{2}$ . In other words, in the regime of low SNR, the number of samples $N$ required to estimate $u^{(m)}$ by $\widetilde{u}^{(m)}$ to a constant prescribed error is

$$N=\mathcal{O}\left(\frac{1}{L\cdot\operatorname{SNR}^{4}}\right). \tag{28}$$

Next, we show how to estimate the phases of $\hat{\theta}$ using the estimates $\widetilde{u}^{(m)}$ .

2.2.1 Phase estimation by frequency marching

Note that $u^{(1)}$ of (14) is sufficient to recover the phases of $\hat{\theta}$ , since (up to an integer multiple of $2\pi$ )

$$\arg\{\hat{\theta}[k+1]\}=\arg\{\hat{\theta}[k]\}-\arg\{u^{(1)}[k]\}. \tag{29}$$

Therefore, the phases of $\hat{\theta}$ can be obtained by first initializing the phase of $\hat{\theta}[0]$ to zero (recalling that we have an ambiguity of a global phase and a cyclic shift, so we can initialize the phase of $\hat{\theta}[0]$ arbitrarily), followed by applying (29) repeatedly for $k=0,\ldots,L-2$ . We refer to this procedure as frequency marching (FM) (see [9] for a similar technique bearing the same name). Using $\widetilde{u}^{(1)}$ (the leading eigenvector of $\widetilde{C}^{(1)}_{z}$ of (21)) as an estimate of $u^{(1)}$ , the explicit formula for the estimated signal $\widetilde{\theta}$ (combining the amplitude given by (10) with the phases given by (29)) is given by

[TABLE]

where $\odot$ denotes element-wise multiplication (the Hadamard product), $\widetilde{\theta}_{m}$ is the magnitude part (given by (10))

[TABLE]

and $\widetilde{\theta}_{p}$ is the phase part (given by (29))

[TABLE]

The algorithm for estimating the signal parameters $\lambda$ and $\theta$ using frequency marching is summarized in Algorithm 1.
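The marching recursion (29) itself is only a few lines. The sketch below applies it to the exact $u^{(1)}$ of a synthetic signal (no noise) to show the two ambiguities at work; the function name `frequency_marching` and the test signal are illustrative, and this is not a reproduction of Algorithm 1.

```python
import numpy as np

def frequency_marching(u1):
    """Recover unit-modulus phases of theta_hat from u^(1) via the recursion (29)."""
    L = u1.shape[0]
    phase = np.zeros(L)                        # arg(theta_hat[0]) is initialized to zero
    for k in range(L - 1):
        phase[k + 1] = phase[k] - np.angle(u1[k])
    return np.exp(1j * phase)

rng = np.random.default_rng(0)
L = 8
theta = rng.standard_normal(L) + 1j * rng.standard_normal(L)
theta /= np.linalg.norm(theta)
theta_hat = np.fft.fft(theta, norm="ortho")

u1 = theta_hat * np.conj(np.roll(theta_hat, -1))           # exact u^(1), as in (14)
est_hat = np.abs(theta_hat) * frequency_marching(u1)       # magnitudes combined with marched phases

# An extra factor e^{i 2 pi j / L} on u^(1) changes the result only by a cyclic shift by j
j = 3
est_shifted = np.fft.ifft(
    np.abs(theta_hat) * frequency_marching(u1 * np.exp(2j * np.pi * j / L)),
    norm="ortho",
)
```

In this noise-free setting, `est_hat` equals `theta_hat` up to the global phase fixed by the initialization, and `est_shifted` equals `np.roll(theta, j)` up to the same phase.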

Algorithm 1 clearly does not exploit all the available information. While the method presented in the next section exploits more complicated statistics of the observations and provides lower estimation errors, Algorithm 1 is simpler to implement and enjoys faster running times (than the algorithm of the next section). For Algorithm 1 (unlike the method of the next section) we show that in the low SNR regime it admits a sample complexity of $N=\mathcal{O}(L/\operatorname{SNR}^{4})$ .

The next theorem establishes that Algorithm 1 results in a consistent estimate for $\theta$ as $N\rightarrow\infty$ , up to the ambiguities of a phase factor and a cyclic shift.

Theorem 2.3 (Consistency of Algorithm 1).

Let $\theta$ be from the model (3) and suppose that $|\hat{\theta}[k]|>0$ for all $k=0,\ldots,L-1$ . Let $\widetilde{\theta}$ be from Algorithm 1. Then, there exists an integer ${s}\in\mathbb{Z}_{L}$ and a constant $\alpha\in\mathbb{C}$ satisfying $\left|\alpha\right|=1$ , such that

[TABLE]

where $\mathcal{R}_{{s}}\left\{\cdot\right\}$ is a cyclic shift by ${s}$ .

The proof is provided in Appendix C.

Additionally, we have the following theorem, which characterizes the asymptotic estimation error of Algorithm 1 in the low SNR regime.

Theorem 2.4 (Asymptotic error of Algorithm 1).

Let $\theta$ be from the model (3) and $\widetilde{\theta}$ be from Algorithm 1. Define $\gamma_{1}=L\left\|u^{(1)}\right\|^{2}$ and $\delta_{1}=\min_{k\in\{0,\ldots,L-1\}}|u^{(1)}[k]|$ where $u^{(1)}$ is given by (14). If $\delta_{1}>0$ and $N>2L$ is large enough, there exists an integer $s\in\mathbb{Z}_{L}$ , and a constant $\alpha\in\mathbb{C}$ satisfying $\left|\alpha\right|=1$ , such that

[TABLE]

where $b_{0},b_{2},b_{3},b_{4}$ are independent of $\sigma$ . Additionally, there exist constants $C_{1},C_{2},C_{4},c_{1},c_{2}>0$ (independent of $\theta,N,L$ and $\sigma$ ), such that

[TABLE]

with probability at least

[TABLE]

The proof is provided in Appendix D. By bounding $b_{0},b_{2}$ , and $b_{3}$ in (34) along the lines of Appendix B.5, we get from Theorem 2.4 that for large $N$ and $\sigma$ , with high probability we have

[TABLE]

or, that in the regime of low SNR, the number of samples $N$ required by Algorithm 1 to estimate $\theta$ to a constant prescribed error (the sample complexity) is

[TABLE]

In Section 4, we demonstrate numerically that the sample complexity achieved by Algorithm 1 is actually better than (38) by a factor of $L$ , i.e. $N\propto 1/\operatorname{SNR}^{4}$ .

2.2.2 Phase estimation by alternating minimization

Next, we propose a method for estimating the phases of $\hat{\theta}$ using all vectors $\widetilde{u}^{(m)}$ , $m=1,\ldots,L-1$ , simultaneously. To that end, the main observation is that the vector $u^{(m)}$ of (14) is the $m$ ’th diagonal of $\hat{\theta}\hat{\theta}^{*}$ . Therefore, we can use all vectors $u^{(m)}$ to construct $\hat{\theta}\hat{\theta}^{*}$ , followed by estimating $\hat{\theta}$ from the leading eigenvector of $\hat{\theta}\hat{\theta}^{*}$ . In practice, as we only have access to the estimated vectors $\widetilde{u}^{(m)}$ , which estimate ${u}^{(m)}$ up to unknown phase factors, we can only approximate the matrix $\hat{\theta}\hat{\theta}^{*}$ up to an element-wise product with a circulant matrix (of the unknown phases). Therefore, we propose to estimate $\hat{\theta}$ and the unknown phase factors in $\widetilde{u}^{(m)}$ simultaneously. As we try to obtain the phases of $\hat{\theta}$ , we will use only the phases of $\widetilde{u}^{(m)}$ throughout this procedure.
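The observation that $u^{(m)}$ is the $m$'th diagonal (with circulant wrapping) of the rank-one matrix $\hat{\theta}\hat{\theta}^{*}$ can be checked in the idealized case where the exact vectors $u^{(m)}$ are available. The sketch below covers only that case: it has no noise and no unknown phase factors, so it is not the alternating minimization procedure itself. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 6
theta_hat = rng.standard_normal(L) + 1j * rng.standard_normal(L)
theta_hat /= np.linalg.norm(theta_hat)

k = np.arange(L)
M = np.zeros((L, L), dtype=complex)
for m in range(L):
    u_m = theta_hat * np.conj(np.roll(theta_hat, -m))   # u^(m)[k] = theta_hat[k] * conj(theta_hat[k+m]), (14)
    M[k, (k + m) % L] = u_m                             # place u^(m) on the m-th wrapped diagonal

# The assembled matrix is the rank-one outer product theta_hat theta_hat^*
assert np.allclose(M, np.outer(theta_hat, np.conj(theta_hat)))

# Its leading eigenvector recovers theta_hat up to a global phase
_, vecs = np.linalg.eigh(M)
v = vecs[:, -1]
assert np.isclose(abs(np.vdot(v, theta_hat)), 1.0)
```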

We start by forming the $L\times L$ matrix $\widetilde{C}_{x}$ as

[TABLE]

That is, $\widetilde{C}_{x}$ has ones on its main diagonal, and the phases of the vector $\widetilde{u}^{(m)}$ on its $m$’th diagonal (with circulant wrapping for each diagonal). Note that by the structure of $\widetilde{C}_{x}$ in (39) and by the definition of $\widetilde{u}^{(m)}$, the matrix $\widetilde{C}_{x}$ is self-adjoint. Thus, if we have no noise, i.e. $\sigma=0$, it follows that $\widetilde{C}_{x}$ can be written as the element-wise product of a rank-one matrix and a circulant matrix of phases (since $\widetilde{u}^{(m)}$ estimates $u^{(m)}$ up to an unknown constant phase factor). Specifically, if $\sigma=0$ we have

[TABLE]

where $\hat{\theta}_{p}\in\mathbb{C}^{L}$ is the phase part of $\hat{\theta}$ , i.e. $\hat{\theta}_{p}[k]=\hat{\theta}[k]/\left|\hat{\theta}[k]\right|$ , $\odot$ denotes element-wise multiplication (the Hadamard product), and $\operatorname{Circul}\left\{\alpha\right\}$ is a circulant matrix constructed from a vector of phases $\alpha\in\mathbb{C}^{L}$ , i.e.

[TABLE]

Motivated by the observation in (40) for the case of no noise ( $\sigma=0$ ), we propose to recover $q$ by solving

[TABLE]

Now, since (42) is non-convex, we propose to solve it by alternating minimization, where we alternate between solving for $q$ and solving for $\alpha$ . When $\alpha$ is held fixed, the solution for $q$ is given simply by the singular value decomposition (SVD) of the matrix $\widetilde{C}_{x}\odot\operatorname{Circul}\left\{\alpha^{*}\right\}$ . That is,

[TABLE]

where $s_{1}$ is the largest singular value of $\widetilde{C}_{x}\odot\operatorname{Circul}\left\{\alpha^{*}\right\}$ , and $v_{1}$ is its corresponding singular vector (left and right singular vectors are equal up to a sign since the matrix $\widetilde{C}_{x}\odot\operatorname{Circul}\left\{\alpha^{*}\right\}$ is self-adjoint). When $q$ is held fixed, the optimization problem for $\alpha$ reduces to a least-squares problem with absolute-value constraints, whose solution is given explicitly by

[TABLE]

where all matrix and vector index assignments in (44) are modulo $L$ . We alternate between computing the solutions for $\alpha$ and for $q$ iteratively until the objective in (42) reaches saturation. We summarize the algorithm for estimating $\theta$ and $\lambda$ of (3) using alternating minimization in Algorithm 2.
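The alternating scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: the circulant convention `Circul{alpha}[k1, k2] = alpha[(k1 - k2) % L]`, the scaling of the rank-one factor by the square root of the leading singular value, and all function names are assumptions made for the example.

```python
import numpy as np

def unit_phase(z):
    """Project complex entries onto the unit circle (zeros are mapped to 1)."""
    z = np.asarray(z, dtype=complex)
    mag = np.abs(z)
    return np.where(mag > 0, z / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def circul(alpha):
    """Circulant matrix with entry [k1, k2] = alpha[(k1 - k2) % L] (assumed)."""
    L = len(alpha)
    k = np.arange(L)
    return alpha[(k[:, None] - k[None, :]) % L]

def q_step(C, alpha):
    """Rank-one fit to C * conj(Circul{alpha}) (elementwise) via the SVD."""
    U, s, _ = np.linalg.svd(C * circul(alpha).conj())
    return np.sqrt(s[0]) * U[:, 0]

def alpha_step(C, q):
    """Unit-modulus least-squares update, one phase per circulant diagonal."""
    L = len(q)
    D = C * np.outer(q, q.conj()).conj()
    k = np.arange(L)
    sums = np.array([D[(k + m) % L, k].sum() for m in range(L)])
    return unit_phase(sums)

def objective(C, q, alpha):
    """Squared Frobenius misfit between C and (q q^*) * Circul{alpha}."""
    return np.linalg.norm(C - np.outer(q, q.conj()) * circul(alpha)) ** 2

def alternating_minimization(C, alpha0, n_iter=10):
    """Alternate between the two closed-form updates for n_iter >= 1 rounds."""
    alpha = np.asarray(alpha0, dtype=complex)
    for _ in range(n_iter):
        q = q_step(C, alpha)
        alpha = alpha_step(C, q)
    return q, alpha
```

A natural usage is to initialize `alpha0` with ones and monitor `objective` across iterations, stopping when it no longer decreases noticeably. Since the problem is non-convex, the sketch makes no claim about convergence from an arbitrary initialization.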

The following theorem states that even a single iteration of the alternating minimization in Algorithm 2 lines 15-18 (i.e. $\tau=1$ ) results in a consistent estimate for $\theta$ as $N\rightarrow\infty$ (up to the inherent ambiguities of phase and cyclic shift).

Theorem 2.5 (Consistency of Algorithm 2).

Let $\theta$ be from the model (3), $\widetilde{\theta}$ be from Algorithm 2 with $\tau=1$, and suppose that $|\hat{\theta}[k]|>0$ for all $k=0,\ldots,L-1$. Then, there exists an integer $s\in\mathbb{Z}_{L}$ and a constant $\alpha\in\mathbb{C}$ satisfying $\left|\alpha\right|=1$, such that

[TABLE]

where $\mathcal{R}_{{s}}\left\{\cdot\right\}$ is a cyclic shift by ${s}$ .

The proof is provided in Appendix E.
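Because the estimate is only determined up to a cyclic shift and a unit-modulus constant, any comparison with the true signal has to remove both ambiguities first. The sketch below shows one way to do so; the relative $\ell_2$ normalization and the name `aligned_error` are assumptions made for illustration and are not necessarily the error measure used in the experiments.

```python
import numpy as np

def aligned_error(theta, theta_est):
    """Relative l2 error minimized over cyclic shifts and unit-modulus phases."""
    theta = np.asarray(theta, dtype=complex)
    theta_est = np.asarray(theta_est, dtype=complex)
    best = np.inf
    for s in range(len(theta)):
        shifted = np.roll(theta_est, s)
        # For a fixed shift, the best unit-modulus alpha is the phase of <shifted, theta>.
        inner = np.vdot(shifted, theta)
        alpha = inner / abs(inner) if abs(inner) > 0 else 1.0
        best = min(best, np.linalg.norm(theta - alpha * shifted))
    return best / np.linalg.norm(theta)
```

For each candidate shift the optimal phase has a closed form, so the search is exhaustive over only the $L$ shifts.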

In terms of sample complexity, since Algorithm 2 estimates the phases of $\hat{\theta}$ using all vectors $\widetilde{u}^{(1)},\ldots,\widetilde{u}^{(L-1)}$ simultaneously, we expect it to perform better than Algorithm 1, which only uses $\widetilde{u}^{(1)}$ . In particular, the alternating minimization in Algorithm 2 estimates $2L$ unknown parameters ( $L$ parameters for $q$ , and $L$ for the nuisance phases $\alpha$ ) from $O(L^{2})$ measurements (the matrix $\widetilde{C}_{x}$ ), whereas Algorithm 1 estimates $L$ parameters (the phases of $\hat{\theta}$ ) from only $L$ measurements (the vector $u^{(1)}$ ). Even though the error vectors $\epsilon^{(m)}$ (of (18)) are not independent for different $m$ , it is easy to verify that they are uncorrelated, and therefore, it is reasonable to expect that the error in estimating $\theta$ by Algorithm 2 would improve upon that of Algorithm 1 (since we use more information). Indeed, we demonstrate numerically in Section 4 that estimating $\theta$ by Algorithm 2 achieves the sample complexity of (27), i.e.,

[TABLE]

which improves upon the sample complexity of Algorithm 1 by a factor of $L$ .

3 Invariant statistics and Trispectrum inversion

While the previous section provides a self-contained derivation of our algorithms, there are several advantages to presenting a more systematic approach to the MRFA problem. First, it is worthwhile to relate our algorithms to the method of moments (see [33], and the generalized method of moments [21]) used in previous works (see in particular works related to MRA [8, 9, 24]). Second, this section also explains the methodology behind the derivation of our algorithms, and some of the fundamental limitations related to the sample complexity of these algorithms. Third, the presentation in this section establishes that the presented algorithms can be interpreted as methods for recovering a signal from its second and fourth moments (the trispectrum in particular). The trispectrum is currently used, for example, in cosmology [22, 11] and signal processing [14], but there is no known algorithm for its robust inversion.

In what follows, we introduce the concept of shift-invariant statistics, and show its connection to the approach of Section 2 by demonstrating that both of our algorithms for recovering the phases of the unknown signal make use of the first two non-vanishing invariant statistics.

Consider the moments of the $L$ dimensional random vector $\hat{y}$ up to order four

[TABLE]

It is easy to verify that for $y$ satisfying the model (3), the following holds:

1. The first moment $M_{1}[k]$ is equal to zero for any $k$. In particular, the mean of the samples of $y$ satisfies

[TABLE]

2. The second moment $M_{2}[k_{1},k_{2}]$ is equal to zero for $k_{1}\neq k_{2}$. For $k_{1}=k_{2}$, the second moment is just the power spectrum of $y$, that is

[TABLE]

3. The third moment $M_{3}[k_{1},k_{2},k_{3}]$ is equal to zero for any $k_{1},k_{2}$ and $k_{3}$. In particular, the bispectrum of $y$ vanishes,

[TABLE]

The bispectrum $B_{y}$ is a third moment of $\hat{y}$ (the expected value of the product of $\hat{y}$ at three different indices), and it vanishes since $\hat{y}[k]$ admits a symmetric distribution around $0$, owing to the random factor $a\sim\mathcal{CN}(0,\lambda)$. Specifically, the factor $a$ in the bispectrum is taken to the third power, which results in an expected value equal to zero.

4. The fourth moment $M_{4}[k_{1},k_{2},k_{3},k_{4}]$ is equal to zero for any $k_{1}-k_{2}+k_{3}-k_{4}\neq 0$. The value of $M_{4}$ for the indices satisfying $k_{1}-k_{2}+k_{3}-k_{4}=0$ is known as the trispectrum [14],

[TABLE]

It is easy to verify that the entries of $\mu_{y}$ , $p_{y}$ , $B_{y}$ and $T_{y}$ are invariant to cyclic shifts of $y$ (or equivalently, to modulations of $\hat{y}$ ), regardless of the distribution of the shifts, and hence serve as shift-invariant statistics of $y$ . When using the model (7) to compute (52), it follows that

[TABLE]

Since $\mu_{y}$ and $B_{y}$ vanish, among the first three invariant statistics $\mu_{y}$ , $p_{y}$ and $B_{y}$ , only the power spectrum $p_{y}$ provides us with useful information, which is used to estimate $\lambda$ and the signal magnitudes $|\hat{\theta}[k]|$ as described in Section 2.1.
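The shift invariance of the power spectrum and of the trispectrum products can be checked on a single vector, before any averaging over samples. The sketch below uses NumPy's unnormalized DFT and hypothetical helper names; the index constraint $k_{1}-k_{2}+k_{3}-k_{4}=0$ is taken modulo $L$, which is an assumption about how the wrap-around is handled.

```python
import numpy as np

def power_spectrum(y):
    """Squared magnitudes of the DFT coefficients of y."""
    return np.abs(np.fft.fft(y)) ** 2

def trispectrum_entry(y, k1, k2, k3):
    """Single-sample product y_hat[k1] conj(y_hat[k2]) y_hat[k3] conj(y_hat[k4]),
    with k4 = (k1 - k2 + k3) mod L so that the shift modulations cancel."""
    y_hat = np.fft.fft(y)
    k4 = (k1 - k2 + k3) % len(y)
    return y_hat[k1] * np.conj(y_hat[k2]) * y_hat[k3] * np.conj(y_hat[k4])

rng = np.random.default_rng(0)
y = rng.standard_normal(12) + 1j * rng.standard_normal(12)
y_shifted = np.roll(y, 5)  # cyclic shift by 5 samples

same_power = np.allclose(power_spectrum(y), power_spectrum(y_shifted))
same_tri = np.allclose(trispectrum_entry(y, 3, 7, 1),
                       trispectrum_entry(y_shifted, 3, 7, 1))
```

A cyclic shift by $s$ multiplies $\hat{y}[k]$ by a phase that is linear in $k$, so the phases cancel exactly when the signed sum of the indices vanishes modulo $L$. Averaging such products over the observations gives empirical versions of the invariant statistics.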

While the magnitudes of $\hat{\theta}$ can be estimated from the power spectrum (55), the latter is insufficient to determine $\hat{\theta}$ entirely, as we also need to estimate the phases of $\hat{\theta}$ . Therefore, we need to use the fourth-order moment, and particularly, the trispectrum of $y$ . Thus, we next point out the relation between the trispectrum $T_{y}$ and the methods described in Section 2. Consider the random vector $z\in\mathbb{C}^{L^{2}}$ and its covariance matrix $C_{z}\in\mathbb{C}^{L^{2}\times L^{2}}$ , given by

[TABLE]

where $z^{(m)}\in\mathbb{C}^{L},\;m=0,\ldots,L-1,$ are from (16). Note that $z$ can also be viewed as the elements of the matrix $yy^{*}$ reorganized into a vector (i.e. all products of the elements in $y$ ). Then, the covariance matrix $C_{z}$ can be expressed as

[TABLE]

where $(m_{1},m_{2})$ enumerates over blocks of size $L\times L$ in $C_{z}$ . It follows that $C_{z}$ is related to the fourth moment $M_{4}$ of (50) via

[TABLE]

Therefore, the matrix $C_{z}$ encodes the same information as $M_{4}$ . Note that

[TABLE]

and thus, $C_{z}$ is a block-diagonal matrix with its non-zero blocks given by

[TABLE]

for $m=0,\ldots,L-1$ (see (19)), thereby establishing that the sequence of matrices $\left\{C_{z}^{(m)}\right\}_{m=0}^{L-1}$, which are estimated in Step 9 of Algorithm 2 ($C_{z}^{(1)}$ is also estimated in Step 7 of Algorithm 1), is equivalent to the trispectrum $T_{y}$. This relation between the trispectrum and $C_{z}^{(m)}$ shows that Algorithms 1 and 2, starting from Steps 9 and 13 respectively, are, in fact, estimating the signal from its trispectrum, or, in other words, Algorithms 1 and 2 perform trispectrum inversion.

In [3] it is shown that the sample complexity of any algorithm that recovers the underlying signal from its noisy measurements on a group action channel (in our case the group action is shifting the signal) is bounded from below by $\mathcal{O}(\mbox{SNR}^{-d})$ , where $d$ is the smallest integer such that all the moments up to order $d$ define the signal uniquely (up to the ambiguities of the problem). For the case of the MRA problem this moment is the bispectrum ( $d=3$ ). For the rank-one MRFA problem, the trispectrum is the first shift-invariant moment carrying sufficient information for recovering $\hat{\theta}$ (and thus $\theta$ ), and therefore in this case $d=4$ .

4 Numerical Examples

In this section, we demonstrate the performance of Algorithms 1 and 2 (see Section 2) by numerical simulations, and show that their performance agrees with the theoretical results. The error in all the experiments is measured as

[TABLE]

We also compare our algorithms with the Expectation-Maximization (EM) algorithm [17], which is a popular approach for Maximum-Likelihood Estimation (MLE) in the presence of nuisance parameters. We start this section with a short explanation of the EM algorithm adapted to our model (3).

4.1 EM algorithm for the MRFA problem

The EM algorithm is a classical heuristic approach that tries to optimize model parameters by maximizing the likelihood of the observations given the model parameters. This approach is widely used in many applications [27, 29]. The EM algorithm iterates over two steps: the Expectation step (E-step) and the Maximization step (M-step). A detailed description of the EM algorithm for the MRA problem can be found in [9]. In the case of the rank-one MRFA (3), the EM algorithm is initialized with parameter estimates $\theta_{0}$ and $\lambda_{0}$, and then iterates the E-step and M-step given below until convergence:

E-Step:

At iteration $k$ , given current parameter estimates $\theta_{k}$ and $\lambda_{k}$ , estimate the likelihood of the observation $y_{j}$ assuming shift $s$ . Under the model in (3), the likelihood is

[TABLE]

M-Step:

Find $\theta_{k+1}$ and $\lambda_{k+1}$ that minimize the expression

[TABLE]

4.2 Experimental results

We first demonstrate the performance of Algorithms 1 and 2 and the EM algorithm for random complex signals of length 16 ( $L=16$ ), for a number of samples ( $N$ ) between $10^{1}$ and $10^{5}$ , and for SNR (as defined in (4)) values between $10^{0}$ and $10^{-2}$ . The signals are generated as follows. First, we generate a random complex Gaussian signal of length $L$ , and normalize its power spectrum to be a vector of ones. The power spectrum normalization ensures that $\gamma_{1}$ and $\delta_{1}$ (as defined in Theorem 2.4) are the same in all experiments. Next, the observations are generated from the clean signal by multiplying it by a normally distributed complex factor with variance $1$ , then shifting it by a uniformly distributed shift and adding to the resulting vector Gaussian noise with variance $\sigma^{2}$ (derived from the SNR).
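The data-generation procedure described above can be sketched as follows. The sketch assumes circularly symmetric complex noise with total variance $\sigma^{2}$ per entry and takes $\sigma$ as an input, since the mapping from SNR to $\sigma$ depends on definition (4); the function names are hypothetical.

```python
import numpy as np

def make_signal(L, rng):
    """Random complex Gaussian signal whose power spectrum is all ones."""
    theta = rng.standard_normal(L) + 1j * rng.standard_normal(L)
    theta_hat = np.fft.fft(theta)
    theta_hat /= np.abs(theta_hat)  # keep only the phases of the DFT
    return np.fft.ifft(theta_hat)

def make_observations(theta, N, sigma, lam=1.0, rng=None):
    """Draw N observations: complex scale, uniform cyclic shift, additive noise."""
    rng = np.random.default_rng() if rng is None else rng
    L = len(theta)
    # Scaling factors a ~ CN(0, lam): real and imaginary parts have variance lam / 2.
    a = np.sqrt(lam / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    shifts = rng.integers(0, L, size=N)
    # Cyclic shift: R_s{theta}[l] = theta[(l - s) mod L].
    idx = (np.arange(L)[None, :] - shifts[:, None]) % L
    x = a[:, None] * theta[idx]
    noise = (sigma / np.sqrt(2)) * (
        rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
    )
    return x + noise, shifts
```

Returning the shifts alongside the observations is only a convenience for testing; an estimator working in the low-SNR regime would not have access to them.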

The results of the different algorithms are shown in Figure 1, where the intensity of each pixel describes the accuracy of the considered algorithm for the corresponding $N$ and SNR (blue/dark is high accuracy and yellow/light is low accuracy). In Figures 1(a) and 1(b) we demonstrate the performance of Algorithms 1 and 2 respectively. In Figures 1(c) and 1(d) we demonstrate the performance of the EM algorithm, where in Figure 1(c) the EM algorithm is initialized with the correct signal (“Oracle initialization”) and in Figure 1(d) the EM algorithm is initialized with a random vector. Although Figures 1(a) and 1(b) look similar, Algorithm 2 “breaks” at lower SNRs for any given sample size (it is mostly visible when comparing Figures 1(a) and 1(b) in the region where $\log_{10}N=5$ and $\log_{10}\mbox{SNR}=-1.8$ ).

We have shown in (38) that $N$ should asymptotically be proportional to $1/\mbox{SNR}^{4}$, or, on a logarithmic scale, $\log N$ should depend linearly on $\log\mbox{SNR}$. It can be noticed that the phase transition in Figures 1(a) and 1(b) is indeed linear. In Figure 2, we show the same heat-map as in Figure 1(b) together with a black line that is the solution to the equation $N=(1/4)L\,\mbox{SNR}^{-4}$. Notice how the transition region is aligned with the solid black line.

We show the performance of Algorithm 1, Algorithm 2 and the EM algorithm with oracle/random initialization for $N=10^{5}$ and $L=16$ for different SNR values in Figure 3. Although the accuracy of the EM algorithm is higher for high SNRs, for low SNRs (starting from around $0.17$) the EM algorithm gives less accurate results. Additionally, as opposed to the constant running time of Algorithms 1 and 2, the EM algorithm takes much longer for low SNR values.

We show next, in Figure 4, the performance of Algorithm 1, Algorithm 2 and the EM algorithm with oracle/random initialization for $\text{SNR}=0.108$ and $L=16$ for different values of $N$. It can be noticed from Figure 4(b) that Algorithm 2 is slower than Algorithm 1 by approximately a constant time for all $N$. The computationally expensive step of Algorithm 2 is the “alternating minimization” step, whose complexity does not depend on $N$.

Finally, we show in Figure 5 a heat-map of the accuracy of Algorithm 2 for $N=4000$ , with SNR values between $0.5$ and $0.01$ (on the $y$ -axis, on a log scale) and for values of $L$ between $16$ and $256$ (on the $x$ -axis, on a log scale). It can be noticed that, as expected, the dependency between $\log L$ and $\log$ SNR is linear.

5 Summary and discussion

We presented two statistically consistent algorithms for solving the rank-one MRFA problem. One algorithm (Algorithm 1) has proven non-asymptotic performance bounds, and the other (Algorithm 2) has better performance in practice. We compared the performance of the two algorithms to the EM algorithm, and showed the superiority of Algorithms 1 and 2 in “difficult” regimes. Intuitively, Algorithm 2 uses more information than Algorithm 1, by taking advantage of the entire trispectrum, and thus gives more accurate results. We also note that even though our model (3) considers the case of uniform distribution for the shifts (which is more challenging in terms of estimation accuracy [1]), the algorithms presented in this work can handle any distribution.

Although Algorithm 2 uses more information than Algorithm 1, we did not show that Algorithm 2 is optimal. Thus, there might be a better way for solving the MRFA problem and, equivalently, for inverting the trispectrum.

As for future research, an interesting direction is to extend the rank-one MRFA model to a general rank-$r$ MRFA model.

Appendix A Covariance matrix of $z^{(m)}$

In this section, we show that $C_{z}^{(m)}$ of (19) is given by

[TABLE]

where $\Sigma^{(m)}_{\epsilon}$ is a diagonal bias term given by

[TABLE]

for any $m\neq 0$ , where $p_{x}$ is the power spectrum of $x$ (see (9)). Thus, we establish that the matrix $C_{z}^{(m)}-\Sigma^{(m)}_{\epsilon}$ is of rank one, with its leading eigenvector equal to $u^{(m)}$ up to a constant factor.

Recall that (see (17))

[TABLE]

where $\epsilon^{(m)}$ is given by (see (18))

[TABLE]

and $\hat{\eta}$ is defined in (7). From (19), (66), (67) we get that

[TABLE]

Since $a\sim\mathcal{CN}(0,\lambda)$ is complex-valued, we have that $\mathbb{E}\left[|a|^{4}\right]=2\lambda^{2}$ . Additionally, when substituting (67) in (68), together with the fact that $\mathbb{E}\left[\hat{\eta}[k]\right]=0$ , it follows that for $m\neq 0$

[TABLE]

for every $k_{1}$ and $k_{2}$ , since $\hat{\eta}[k]$ and $\hat{\eta}[k+m]$ are uncorrelated for every $k$ (and $m\neq 0$ ), and independent of the factor $a$ . Next, we have

[TABLE]

Further manipulation of (71) using the properties of the model (7) results in

[TABLE]

Lastly, formula (9) for the power spectrum $p_{x}$ gives (65).
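The fourth-moment identity $\mathbb{E}\left[|a|^{4}\right]=2\lambda^{2}$ used in this derivation follows from the fact that $|a|^{2}/\lambda$ is exponentially distributed with unit mean, whose second moment equals $2$. A quick Monte Carlo check is sketched below; it assumes the usual convention that the real and imaginary parts of $a\sim\mathcal{CN}(0,\lambda)$ are independent with variance $\lambda/2$ each, and the sample size and value of `lam` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5
n = 400_000

# a ~ CN(0, lam): independent real and imaginary parts, each with variance lam / 2.
a = np.sqrt(lam / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

second_moment = np.mean(np.abs(a) ** 2)  # expected to be close to lam
fourth_moment = np.mean(np.abs(a) ** 4)  # expected to be close to 2 * lam**2
```

For a real Gaussian with variance $\lambda$ the fourth moment would instead be $3\lambda^{2}$, which is why the complex-valued nature of $a$ matters for the constant in (65).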

Appendix B Error bound for $\widetilde{u}^{(1)}$

B.1 Preliminary results

We first recall the definition of a sub-exponential random variable (see [30, 31, 32]).

Definition 1.

A random variable $X$ is called sub-exponential if there exists a constant $\alpha>0$ such that

[TABLE]

for all $t\geq 0$ .

Lemma B.1.

Let $X_{ik}$ , $i=1,\ldots,N$ and $k=0,\ldots,L-1$ , be i.i.d. sub-exponential random variables with zero mean. Then, there exist constants $C^{{}^{\prime}},c^{{}^{\prime}}>0$ such that

[TABLE]

with probability at least

[TABLE]

Proof.

Fixing $k\in\left\{0,\ldots,L-1\right\}$ , $X_{1,k},\ldots,X_{N,k}$ are i.i.d. sub-exponential random variables, and by Proposition 5.16 in [30], there exist constants $C^{{}^{\prime}},c^{{}^{\prime}}>0$ such that

[TABLE]

Therefore, by the union bound we have that

[TABLE]

Finally, substituting $t=\sqrt{L}$ concludes the proof. ∎

The following lemma relates the concentration of two non-negative random variables to the concentration of their sum.

Lemma B.2.

Let $a\geq 0$ and $b\geq 0$ be two (possibly dependent) random variables. Then, it holds that

[TABLE]

Proof.

Note that

[TABLE]

Since in the expression $\Pr\{a\leq t/2\ \wedge\ b>t-a\}$, $a$ is smaller than or equal to $t/2$, we have that $b>t-a$ implies that $b>t/2$, and thus, $\Pr\{a\leq t/2\ \wedge\ b>t-a\}\leq\Pr\{b>t/2\}$. ∎
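The argument of this proof is pointwise: whenever $a+b>t$, at least one of $a>t/2$ or $b>t/2$ must hold. The sketch below illustrates the resulting inequality $\Pr\{a+b>t\}\leq\Pr\{a>t/2\}+\Pr\{b>t/2\}$ on empirical counts for two deliberately dependent non-negative variables. The specific distributions are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
a = rng.exponential(size=n)
b = a ** 2 + rng.exponential(size=n)  # non-negative and dependent on a

results = []
for t in np.linspace(0.0, 20.0, 41):
    # Compare integer counts, which avoids any floating-point rounding issues.
    lhs = np.count_nonzero(a + b > t)
    rhs = np.count_nonzero(a > t / 2) + np.count_nonzero(b > t / 2)
    results.append(lhs <= rhs)

all_hold = all(results)
```

Since the inclusion of events holds for every realization, the inequality between counts holds for every sample and not merely on average, regardless of how `a` and `b` depend on each other.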

To prove Lemma 2.1 we will use the following definition.

Definition 2.

Let $\widetilde{\zeta}\in\mathbb{R}^{2L}$ be a vector of identically distributed sub-exponential random variables with zero mean. Suppose that for any $i=0,\ldots,L-1$, $\widetilde{\zeta}_{i}$ depends only on $\widetilde{\zeta}_{(i+1)\mod L}$, $\widetilde{\zeta}_{L+i}$ and $\widetilde{\zeta}_{L+((i+1)\mod L)}$. Then, we call $\widetilde{\zeta}$ “piecewise-i.i.d.” (identically distributed, piecewise independent).

B.2 Concentration results for sub-exponential random vectors

In this section, we show that a “piecewise-i.i.d.” vector $\widetilde{\zeta}$, defined in Definition 2, admits some concentration properties related to sub-exponential random vectors. The main lemma of this section is Lemma B.7.

First, we state the following result from [31].

Proposition B.3.

[Proposition 5.16 in [31]] Let $X=\left(X_{1},\ldots,X_{m}\right)\in\mathbb{R}^{m}$ , where $X_{i}$ are independent sub-exponential random variables with zero mean. Then, there exists a constant $\beta>0$ , such that for every $a=\left(a_{1},\ldots,a_{m}\right)\in\mathbb{R}^{m}$ and every $t\geq 0$

[TABLE]

We mention that we dropped the sub-Gaussian tail from the original statement of the proposition in [31] (and thus the bound above is weaker than the original bound) as it can be bounded by the sub-exponential tail, which is the one of interest for our purposes.

It follows from Proposition B.3 that the random variable $\left\langle X,a\right\rangle$ is uniformly sub-exponential (i.e., bounded by the same decay rate) for all vectors $a$ with $\left\|a\right\|_{\infty}$ bounded (e.g., for $a\in\mathcal{S}^{2L-1}$ ). We now prove that a similar property holds for a piecewise i.i.d. vector $\widetilde{\zeta}$ , even though its elements are not independent. To prove such a property, we use Lemma B.2 to decompose the vector $\widetilde{\zeta}$ into $O(1)$ vectors, each with independent entries. This is stated and proved in Lemma B.4.

Lemma B.4.

Let $\widetilde{\zeta}\in\mathbb{R}^{2L}$ be “piecewise-i.i.d.” as in Definition 2. Then, there exists a constant $\beta_{2}>0$ , depending on the distribution of the entries of $\widetilde{\zeta}$ , such that for every $a=\left(a[0],\ldots,a[2L-1]\right)\in\mathcal{S}^{2L-1}$ (i.e. $\sum_{k=0}^{2L-1}\left|a[k]\right|^{2}=1$ ) and every $t\geq 0$

[TABLE]

Proof.

Let us assume for simplicity that $L$ is even, and define the four vectors

[TABLE]

By the definition of $\widetilde{\zeta}$ (see Definition 2), it follows that the elements in each of $\widetilde{\zeta}_{1}$ , $\widetilde{\zeta}_{2}$ , $\widetilde{\zeta}_{3}$ , $\widetilde{\zeta}_{4}$ are independent and sub-exponential. For every vector $a\in\mathcal{S}^{2L-1}$ , we reorganize it into four vectors $\left\{a_{i}\right\}_{i=1}^{4}$ analogously to $\left\{\widetilde{\zeta}_{i}\right\}_{i=1}^{4}$ , and get using the triangle inequality that

[TABLE]

Then, using Lemma B.2 twice, we get that

[TABLE]

Lastly, we apply Proposition B.3 to each of $\left\{\widetilde{\zeta}_{i}\right\}_{i=1}^{4}$ and get that

[TABLE]

where we used the fact that $\left\|a_{i}\right\|_{\infty}\leq\left\|a\right\|_{\infty}\leq 1$ for $a\in\mathcal{S}^{2L-1}$ . Clearly, the constant $8$ is of no real significance, and can be replaced by $2$ together with an appropriate change in the exponent for any $t>0$ . Since, in addition, for $t\geq 0$ we have that

[TABLE]

we can pick a suitable $\beta_{2}$ such that

[TABLE]

for all $t\geq 0$ . This concludes the proof for even values of $L$ . For odd values of $L$ , one can repeat the proof with a slightly different decomposition of the vector $\widetilde{\zeta}$ . ∎

Next, we derive a concentration result for the norm $\left\|\widetilde{\zeta}\right\|$ . We first state the following large deviation result for vectors of independent sub-exponential random variables (Lemma 8.3 in [19], slightly reformulated, and stated for $\alpha=1$ , which corresponds to our definition of sub-exponential variables, and with the choice $B_{ii}=1$ ).

Proposition B.5.

[Lemma 8.3 in [19]] Let $X=\left(X_{1},\ldots,X_{m}\right)\in\mathbb{R}^{m}$, where $X_{i}$ are independent sub-exponential random variables with mean zero and variance $1/2$. Then, there exist constants $C,c>0$ such that

[TABLE]

Proposition B.5 essentially states that norms of vectors consisting of independent sub-exponential random variables cannot be too large, as they are well concentrated around their means. Now, we shall prove the same result (with different constants) for our “piecewise-i.i.d.” vectors $\widetilde{\zeta}$ , even though their elements are not independent.

Lemma B.6.

Let $\widetilde{\zeta}\in\mathbb{R}^{2L}$ be “piecewise-i.i.d.” as in Definition 2, with $\mbox{var}(\widetilde{\zeta}[1])=1/2$ . Then, there exist constants $C_{2},c_{2}>0$ , such that

[TABLE]

Proof.

The proof follows along the same lines as the proof of Lemma B.4. Suppose that $L$ is even, and define the vectors $\left\{\widetilde{\zeta}_{i}\right\}_{i=1}^{4}$ as in (77). Then,

[TABLE]

By using Lemma B.2 twice, we get

[TABLE]

and by substituting $\tau=\frac{1}{2}+\frac{t}{2\sqrt{2L}}$ and applying Proposition B.5 (observing that the elements in $\left\{\widetilde{\zeta}_{i}\right\}_{i=1}^{4}$ satisfy the required conditions) we have that

[TABLE]

when using $C_{2}=16C$ , $c_{2}=c/\sqrt{2}$ . This concludes the proof for even values of $L$ . For odd values of $L$ , the proof can be repeated with a slightly different partitioning of $\widetilde{\zeta}$ . ∎

Next, using the concentration results of Lemma B.4 and Lemma B.6, we are able to use the results of [7] to bound the norm of the sample covariance matrix of $\widetilde{\zeta}$ . This is the subject of the next lemma.

Lemma B.7.

Let $\widetilde{\zeta}_{1},\ldots,\widetilde{\zeta}_{N}$ be i.i.d. samples of the random vector $\widetilde{\zeta}\in\mathbb{R}^{2L}$ which is “piecewise-i.i.d.” as in Definition 2, and assume that $N>2L$ is large enough. Assume further that $\mathbb{E}\left[\widetilde{\zeta}\widetilde{\zeta}^{T}\right]=\frac{1}{2}I_{2L}$ . Then, there exist constants $C_{4},c_{3},c_{4}>0$ such that

[TABLE]

where the constant $C_{2}$ is from Lemma B.6.

Proof.

At the heart of this proof is Theorem 1 from [5] (see also Corollary 1 from [5]) which requires two conditions on the random vectors $\widetilde{\zeta}_{i}$ (equations (2) and (3) in [5]). The first condition is that for some $\psi>0$ ,

[TABLE]

where the $\Psi_{1}$ -norm of a random variable $X\in\mathbb{R}$ is defined as

[TABLE]

The $\Psi_{1}$ -norm is a characterization of a sub-exponential random variable (see [30] for the $\Psi_{1}$ -norm of sub-exponential random variables). As we have shown in Lemma B.4, each random variable $\left\langle\widetilde{\zeta}_{i},a\right\rangle$ is sub-exponential uniformly over all $a\in\mathcal{S}^{2L-1}$ (that is, bounded by the same decay rate for all $a$ ). Therefore, from the definition of a sub-exponential random variable, and as $\widetilde{\zeta}_{i}$ are identically distributed sub-exponential random variables, we have by Lemma 2.3 of [6] that

[TABLE]

for all $i=1,\ldots,N$, and for some $\psi>0$. Hence, the condition (78) is satisfied.

The second condition required by Theorem 1 in [5] is that there exists a constant $K\geq 1$ such that

[TABLE]

Note that (79) does not follow immediately from Lemma B.6. Therefore, we instead define restricted random vectors $\bar{\zeta}_{i}$ which satisfy the boundedness condition (79) (as well as condition (78)), and are equivalent to the original vectors $\widetilde{\zeta}_{i}$ with high probability. Consider the random vector $\bar{\zeta}$

[TABLE]

where $\vec{0}$ denotes the $2L\times 1$ vector of zeros. Note that for two random variables $X_{1}$ and $X_{2}$ , if $|X_{1}|\leq|X_{2}|$ , then $\|X_{1}\|_{\psi_{1}}\leq\|X_{2}\|_{\psi_{1}}$ . Clearly, the first condition (78) is satisfied for $\bar{\zeta}$ since $\left|\left\langle\bar{\zeta},a\right\rangle\right|\leq\left|\left\langle\widetilde{\zeta},a\right\rangle\right|$ . The second condition (79) is also satisfied due to the definition of $\bar{\zeta}$ .

Up to this point, we showed that Theorem 1 in [5] holds for the vectors $\bar{\zeta}_{1},\ldots,\bar{\zeta}_{N}$ ( $N$ i.i.d. samples of $\bar{\zeta}$ ). Explicitly, there exist constants $C^{\prime}_{4},c_{3}$ such that

[TABLE]

In what follows, we evaluate the quantity $\mathbb{E}\left|\left\langle\bar{\zeta},a\right\rangle\right|^{2}$ for $a\in\mathcal{S}^{2L-1}$ (which takes part in the bound (80)), and show that it is sufficiently close to $\mathbb{E}\left|\left\langle\widetilde{\zeta},a\right\rangle\right|^{2}$ . Let us write

[TABLE]

where in the last inequality we used the Cauchy-Schwarz inequality and the fact that $a\in\mathcal{S}^{2L-1}$. Consider the random variable $\chi=\left\|\widetilde{\zeta}\right\|^{2}\bigg{|}\left\{\frac{\left\|\widetilde{\zeta}\right\|^{2}}{2L}>K\left(\frac{N}{2L}\right)^{1/2}\right\}$, whose probability density function is

[TABLE]

where $p_{0}$ is the normalization for the density of $\chi$

[TABLE]

Then,

[TABLE]

Note that for any non-negative continuous random variable $\chi$

[TABLE]

where the last equality is due to interchanging the order of integration. Now, using (84), we have

[TABLE]

where we substituted $x=2L(1/2+t/(2\sqrt{2L}))=L+t\sqrt{L/2}$ (or $t=\sqrt{2/L}(x-L)$ ) in the bound of Lemma B.6. Note that according to Lemma B.6 (by substituting $1/2+t/(2\sqrt{2L})=K\sqrt{N/(2L)}$ and after some manipulation)

[TABLE]

where $c_{4}=c_{2}\sqrt{\sqrt{2}K-1}$ . Overall, by (83), (86), and (87) we have that

[TABLE]

Note that the term $K\sqrt{2NL}C_{2}e^{-c_{4}N^{1/4}}$ decays exponentially with $N^{1/4}$, and therefore, it is clear that for a large enough $N$

[TABLE]

for some constant $C^{\prime\prime}_{4}$ , which can be chosen arbitrarily small. Also, using a standard integration formula [4], we have that

[TABLE]

with $y_{0}=\sqrt{2c_{2}^{4}/L}\cdot(K\sqrt{2NL}-L)$ , and thus, it also follows that, for a large enough $N$ ,

[TABLE]

for some constant $C^{\prime\prime\prime}_{4}$ , which can be chosen arbitrarily small. Next, by plugging (90) and (89) in (88) and using the result in (81) we have

[TABLE]

where the second inequality is due to the definition of $\bar{\zeta}$ , and the last equality is since $\mathbb{E}\left[\widetilde{\zeta}\widetilde{\zeta}^{T}\right]=\frac{1}{2}I_{2L}$ , from which it follows that

[TABLE]

where $\widetilde{\zeta}[k]$ is the $k$ th element of $\widetilde{\zeta}$ .

Note that for any $a,b,b_{0}>0$ and a small enough $\varepsilon$ , if $b_{0}-\varepsilon\leq b\leq b_{0}$ then

[TABLE]

From (91) and (92), we have that

[TABLE]

Therefore, rewriting (80) using (93) we have

[TABLE]

We absorb the term $(C^{\prime\prime}_{4}+C^{\prime\prime\prime}_{4})\left(\sqrt{\frac{L}{N}}\right)$ into $C^{\prime}_{4}\left(\sqrt{\frac{2L}{N}}\right)$ , and we have that, for some constant $C_{4}$ ,

[TABLE]

Equation (94) is the result of applying Theorem 1 in [5] to the truncated vectors $\bar{\zeta}_{1},\ldots,\bar{\zeta}_{N}$ . Next, we adapt this result to the original vectors $\widetilde{\zeta}_{1},\ldots,\widetilde{\zeta}_{N}$ . Using (82), (87), and the union bound, we have that

[TABLE]

Denote the event $\left\{\max_{i\leq N}\frac{\left\|\widetilde{\zeta}_{i}\right\|^{2}}{2L}>K\sqrt{\frac{N}{2L}}\right\}$ by $A$, and its complement by $\bar{A}$. We have that

[TABLE]

and

[TABLE]

and therefore, using the law of total probability, by combining (95) and (96), we have

[TABLE]

Noting that

[TABLE]

we rewrite (97) as

[TABLE]

which concludes the proof.

∎

Essentially, Lemma B.7 establishes the concentration of the sample covariance matrix of the random vector $\widetilde{\zeta}$ around the population covariance matrix (which is $1/2\cdot I_{2L}$ ).
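The qualitative behavior described here, an operator-norm error of the sample covariance that shrinks as $N$ grows relative to $2L$, can be observed in a small simulation. As a simplification, the sketch uses independent Gaussian entries with variance $1/2$ rather than the dependent, sub-exponential “piecewise-i.i.d.” vectors of Definition 2, so it illustrates the type of statement and not the lemma itself.

```python
import numpy as np

def covariance_error(N, dim, rng):
    """Operator-norm distance between the sample covariance and (1/2) I."""
    X = np.sqrt(0.5) * rng.standard_normal((N, dim))  # rows are i.i.d. samples
    sample_cov = X.T @ X / N
    return np.linalg.norm(sample_cov - 0.5 * np.eye(dim), ord=2)

rng = np.random.default_rng(0)
L = 8
errors = {N: covariance_error(N, 2 * L, rng) for N in (200, 2_000, 20_000)}
```

In runs of this kind the error decreases roughly like $\sqrt{L/N}$, which matches the scaling of the terms absorbed into the constant $C_{4}$ in the proof.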

B.3 Proof of Lemma 2.1

We begin with Lemma B.8 that will help us prove Lemma 2.1.

Lemma B.8.

Let

[TABLE]

where $u^{(1)}$ is defined in (14), and let $\widetilde{C}_{z}^{(1)}$ be given by (20) and (21). Then, $\Sigma_{u}^{(1)}-\widetilde{C}_{z}^{(1)}$ can be written as

[TABLE]

where $A^{\prime}_{0},A^{\prime}_{2},A^{\prime}_{3}$ and $A^{\prime}_{4}$ are independent of $\sigma$ .

Proof.

We define the random vector $\hat{\xi}\in\mathbb{C}^{L}$ such that

[TABLE]

where $\hat{\eta}$ is the noise vector from the model (7), and therefore

[TABLE]

Then, $z^{(1)}$ of (17) can be written as

[TABLE]

where we defined the random vector $\hat{\zeta}\in\mathbb{C}^{L}$ via

[TABLE]

We have from (20) and (21) that

[TABLE]

where $z^{(1)}_{1},\ldots,z^{(1)}_{N}\in\mathbb{C}^{L}$ are i.i.d. samples of $z^{(1)}$ , and $\widetilde{\Sigma}_{\epsilon}^{(1)}$ is the estimated bias matrix (as in (65) except that $p_{x}$ is replaced with its estimate $\widetilde{p}_{x}$ )

[TABLE]

recalling from (11) and (7) that

[TABLE]

Since $|\omega|=1$ , from (104) and (105), we have

[TABLE]

Next, we substitute (101) into (103)

[TABLE]

where (recalling that $\hat{x}_{i}[k]=a_{i}\hat{\theta}[k]$ )

[TABLE]

and

[TABLE]

with

[TABLE]

and $\hat{\zeta}$ defined in (102). From the definition of $\Sigma_{u}^{(1)}[k_{1},k_{2}]$ in (99), defining

[TABLE]

concludes the proof. ∎

We will also need the following two supporting lemmas.

Lemma B.9.

Let $u,\widetilde{u}\in\mathbb{C}^{L}$ such that $|u[k]|=|\widetilde{u}[k]|=1$ for $k=0,\ldots,L-1$. Let $w,\widetilde{w}\in(\mathbb{R}\mod 2\pi)^{L}$ such that $w[k]=\arg(u[k])$ and $\widetilde{w}[k]=\arg(\widetilde{u}[k])$, then

[TABLE]

Proof.

Denote, for any $k$ , $\eta[k]=|\widetilde{u}[k]-u[k]|$ . Then $\widetilde{u}[k]=u[k]+\eta[k]e^{i\phi[k]}$ for some $\phi[k]$ . In this notation,

[TABLE]

or,

[TABLE]

It is easy to verify that the $\phi[k]$ that maximizes $\widetilde{w}[k]$ in (113) is $\phi_{0}[k]$ such that $\cos\phi_{0}[k]=\eta[k]$ . Thus, since $\arctan$ is monotonically increasing, $\sin(\phi[k])\leq 1$ , and $|\arctan x|<|x|$ , we have

[TABLE]

Since (115) holds for all $k$ , we have that

[TABLE]

∎
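The relation between phase differences and entrywise distances used in Lemma B.9 can be checked numerically. The sketch below is only an illustration: it relies on the elementary identity $|e^{ix}-1|=2|\sin\frac{x}{2}|$ , which gives $\frac{2}{\pi}|x|\leq|e^{ix}-1|\leq|x|$ for $|x|\leq\pi$ . The constant $\pi/2$ that appears in the code is this elementary constant and is not necessarily the one stated in the lemma; the helper name `wrapped_phase_diff` is hypothetical.

```python
import numpy as np

def wrapped_phase_diff(u, u_tilde):
    """Entrywise phase difference arg(u_tilde) - arg(u), wrapped to (-pi, pi]."""
    return np.angle(u_tilde * np.conj(u))

rng = np.random.default_rng(0)
L = 16
w = rng.uniform(-np.pi, np.pi, L)      # true phases
eps = rng.uniform(-0.3, 0.3, L)        # small phase perturbations
u = np.exp(1j * w)
u_tilde = np.exp(1j * (w + eps))

phase_err = np.linalg.norm(wrapped_phase_diff(u, u_tilde))  # ||w_tilde - w|| mod 2*pi
chord_err = np.linalg.norm(u_tilde - u)                     # ||u_tilde - u||
print(chord_err, phase_err, (np.pi / 2) * chord_err)
```

The printed values are ordered as chord error, phase error, and $\pi/2$ times the chord error, so the phase error is controlled by the distance between the unit-modulus vectors up to a constant.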

Lemma B.10.

Let $u,\widetilde{u}\in\mathbb{C}^{L}$ , with $\|u\|=\|\widetilde{u}\|=1$ , and denote $u_{p},\widetilde{u}_{p}\in\mathbb{C}^{L}$ such that $u_{p}[k]=\frac{u[k]}{|u[k]|}$ and $\widetilde{u}_{p}[k]=\frac{\widetilde{u}[k]}{|\widetilde{u}[k]|}$ . Denote $\delta_{1}=\min_{k\in\{0,\ldots,L-1\}}|u[k]|$ , and assume that $\delta_{1}>0$ and $\|\widetilde{u}-u\|\leq\delta_{1}/2$ . Then,

[TABLE]

Proof.

Denote $c,\widetilde{c}\in\mathbb{R}^{L}$ such that $c[k]=\frac{1}{|u[k]|}$ and $\widetilde{c}[k]=\frac{1}{|\widetilde{u}[k]|}$ for $k=0,\ldots,L-1$ . From the definition of $\delta_{1}$ and the reverse triangle inequality we have

[TABLE]

Since $|u[k]|\geq\delta_{1}$ and $\left|\widetilde{u}[k]-u[k]\right|\leq\|\widetilde{u}-u\|\leq\delta_{1}/2$ we have that,

[TABLE]

or, together with (118), we have

[TABLE]

and thus

[TABLE]

Note that for any $v\in\mathbb{C}^{L}$ and $a\in\mathbb{R}^{L}$ we have that

[TABLE]

Using the triangle inequality and (121), we have

[TABLE]

Combining (122) with (120) and the fact that $|c[k]|=1/|u[k]|\leq 1/\delta_{1}$ , we get

[TABLE]

where the second inequality is due to the assumption $\|\widetilde{u}-u\|\leq\delta_{1}/2$ and the last inequality holds since $\delta_{1}<1$ .

∎
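As a sanity check of Lemma B.10, the following sketch compares the error of the entrywise-normalized vectors with the error of the original vectors. It assumes that the bound of the lemma has the form $3\|\widetilde{u}-u\|/\delta_{1}^{2}$ , which is inferred from the way the lemma is used later in the proof of Lemma D.2; the helper `phases` and the perturbation size are illustrative choices.

```python
import numpy as np

def phases(x):
    """Entrywise normalization x[k] / |x[k]| (assumes no zero entries)."""
    return x / np.abs(x)

rng = np.random.default_rng(1)
L = 8
u = rng.normal(size=L) + 1j * rng.normal(size=L)
u /= np.linalg.norm(u)
delta1 = np.min(np.abs(u))

# Perturb u and renormalize so that ||u_tilde|| = 1, as required by the lemma.
noise = rng.normal(size=L) + 1j * rng.normal(size=L)
u_tilde = u + 0.1 * delta1 * noise / np.linalg.norm(noise)
u_tilde /= np.linalg.norm(u_tilde)

err = np.linalg.norm(u_tilde - u)                       # must be at most delta1 / 2
phase_err = np.linalg.norm(phases(u_tilde) - phases(u))
bound = 3 * err / delta1**2                             # assumed form of the bound
print(err <= delta1 / 2, phase_err, bound)
```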

Next we prove Lemma 2.1.

Proof.

We have from Lemma B.8 that

[TABLE]

and so,

[TABLE]

Now, recall from (99) that $\Sigma_{u}^{(1)}$ is a rank one matrix whose leading eigenvalue is $2\lambda^{2}$ . Due to the assumption in Lemma 2.1,

[TABLE]

Now, since $\widetilde{u}^{(1)}$ is the leading eigenvector of $\widetilde{C}_{z}^{(1)}$ , then by the Davis-Kahan $\sin\Theta$ theorem [15] (noting that the spectral gap of $\Sigma_{u}^{(1)}$ is equal to its largest eigenvalue since it is a rank one matrix), using (124), it follows that

[TABLE]

where $\alpha_{1}$ is some complex constant satisfying $\left|\alpha_{1}\right|=1$ .
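The eigenvector perturbation step can be illustrated with a small numerical experiment. The sketch below builds a rank-one Hermitian matrix with leading eigenvalue $2\lambda^{2}$ , perturbs it by a small Hermitian matrix that stands in for $\Sigma_{u}^{(1)}-\widetilde{C}_{z}^{(1)}$ , and compares the leading eigenvectors after aligning them by a unimodular constant. The constant $2\sqrt{2}$ comes from a commonly used variant of the Davis-Kahan bound and is an assumption of this sketch, not necessarily the constant used in the proof; all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
L, lam = 6, 1.5

# Rank-one Hermitian matrix whose only non-zero eigenvalue is 2 * lam**2.
u = rng.normal(size=L) + 1j * rng.normal(size=L)
u /= np.linalg.norm(u)
gap = 2 * lam**2                       # spectral gap equals the leading eigenvalue
sigma_u = gap * np.outer(u, u.conj())

# Small Hermitian perturbation (illustrative stand-in for the estimation error).
G = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
E = 0.05 * (G + G.conj().T) / 2
c_z = sigma_u - E

# Leading eigenvector of the perturbed matrix (eigh sorts eigenvalues ascending).
_, vecs = np.linalg.eigh(c_z)
u_tilde = vecs[:, -1]

# Unimodular alpha minimizing ||u_tilde - alpha * u||.
inner = np.vdot(u, u_tilde)
alpha = inner / abs(inner)
err = np.linalg.norm(u_tilde - alpha * u)
bound = 2 * np.sqrt(2) * np.linalg.norm(E, 2) / gap
print(err, bound)
```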

Next we show that there is some $s_{1}\in\{0,\ldots,L-1\}$ such that

[TABLE]

Denote $\widetilde{u}^{(1)}_{p},u^{(1)}_{p}\in\mathbb{C}^{L}$ such that $\widetilde{u}^{(1)}_{p}[k]=\frac{\widetilde{u}^{(1)}[k]}{|\widetilde{u}^{(1)}[k]|}$ and $u^{(1)}_{p}[k]=\frac{u^{(1)}[k]}{|u^{(1)}[k]|}$ . First, suppose that,

[TABLE]

(the other case will be handled below). Then, by Lemma B.10 we have,

[TABLE]

Denote $w,\widetilde{w}\in\mathbb{R}^{L}$ such that $w[k]=\arg(u^{(1)}_{p}[k])$ and $\widetilde{w}[k]=\arg(\widetilde{u}^{(1)}_{p}[k])$ , and note that $\arg(\alpha_{1}u^{(1)}_{p}[k])=w[k]+\beta_{1}$ , where $\alpha_{1}=e^{\imath\beta_{1}}$ . Then by Lemma B.9 and by (127),

[TABLE]

where $\mathbbm{1}$ is a vector of ones of length $L$ . Thus, if we write $\widetilde{w}=w-\mathbbm{1}\beta_{1}+\varepsilon$ , where $\varepsilon\in\mathbb{R}^{L}$ , then

[TABLE]

Summing over both sides of $\mathbbm{1}\beta_{1}=w-\widetilde{w}+\varepsilon$ we get,

[TABLE]

Recall from (22) that $\sum_{k=0}^{L-1}w[k]=0\mod 2\pi$ and from (23) that $\sum_{k=0}^{L-1}\widetilde{w}[k]=0\mod 2\pi$ , and thus from (130) we have

[TABLE]

for some $s_{1}\in\{0,\ldots,L-1\}$ . In other words, we showed that $\beta_{1}$ , the phase of $\alpha_{1}$ of (125), is “not too far” from the phase of an $L$-th root of unity. Now, note that,

[TABLE]

Since $|e^{ix}-1|=2|\sin\frac{x}{2}|\leq|x|$ and from (129), we have

[TABLE]

Since $\delta_{1}\leq 1$ , from (132) and (133) we have that

[TABLE]

in case $\left\|\widetilde{u}^{(1)}-\alpha_{1}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|\leq\delta_{1}/2$ .

Suppose now that $\left\|\widetilde{u}^{(1)}-\alpha_{1}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|>\delta_{1}/2$ . Since $\|\widetilde{u}^{(1)}\|=1$ and $\left\|e^{-\imath 2\pi s_{1}/L}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|=1$ , we have

[TABLE]

Combining (134) and (135) we have,

[TABLE]

for all values of $\left\|\widetilde{u}^{(1)}-\alpha_{1}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|$ . From (136) and (125) we have

[TABLE]

Combining (137) with (123), and denoting

[TABLE]

concludes the proof.

∎

B.4 Proof of Theorem 2.2

Proof.

Recall from (109) that

[TABLE]

with

[TABLE]

for $m\in\left\{0,1\right\}$ , and so

[TABLE]

As for the terms $\left\|E^{(0)}\right\|$ and $\left\|E^{(1)}\right\|$ in (139), since $E^{(0)}$ and $E^{(1)}$ are diagonal matrices, we have

[TABLE]

for $m\in\left\{0,1\right\}$ , and therefore, it is also clear that

[TABLE]

Since $\hat{\xi}_{i}[k]$ are i.i.d. standard Gaussian random variables, the random variable $|\hat{\xi}_{i}[k+m]|^{2}$ has a chi-squared distribution (which is sub-exponential) with expected value $1$ , and so $|\hat{\xi}_{i}[k+m]|^{2}-1$ is sub-exponential with zero mean. Thus, we apply Lemma B.1, and get that

[TABLE]

for some constants $C^{{}^{\prime}},c^{{}^{\prime}}>0$ . Since $\left\|E^{(0)}\right\|=\left\|E^{(1)}\right\|$ we have that

[TABLE]

Next, we consider

[TABLE]

from (139). Requiring that (142) is small is essentially a concentration result for the sample covariance of the random vector $\hat{\zeta}$ . We mention that since each element of $\hat{\zeta}$ is the product of two Gaussian random variables (see (102)), the elements of $\hat{\zeta}$ admit a sub-exponential distribution (see Lemma 2.7.7 in [30]).

We start by characterizing the real and imaginary parts of $\hat{\zeta}$ from (102),

[TABLE]

where $\hat{\xi}_{R}=\operatorname{Re}\left\{\hat{\xi}\right\}$ , $\hat{\xi}_{I}=\operatorname{Im}\left\{\hat{\xi}\right\}$ . Let us now define the random vector $\widetilde{\zeta}\in\mathbb{R}^{2L}$

[TABLE]

Since $\hat{\xi}_{R}[k]$ , $\hat{\xi}_{I}[k]$ , $\hat{\xi}_{R}[k+1]$ , $\hat{\xi}_{I}[k+1]$ are mutually independent Gaussian random variables with mean zero and variance $1/2$ (from (100)), by (143) and (144) we have that $\mathbb{E}(\hat{\zeta}^{2}_{R}[k_{1}])=1/2$ , $\mathbb{E}(\hat{\zeta}^{2}_{I}[k_{1}])=1/2$ , $\mathbb{E}(\hat{\zeta}_{R}[k_{1}]\hat{\zeta}_{R}[k_{2}])=0$ , $\mathbb{E}(\hat{\zeta}_{I}[k_{1}]\hat{\zeta}_{I}[k_{2}])=0$ , and $\mathbb{E}(\hat{\zeta}_{R}[k_{1}]\hat{\zeta}_{I}[k_{2}])=0$ for $k_{1}\neq k_{2}$ . Additionally,

[TABLE]

Note that the covariance matrix of $\sqrt{2}\widetilde{\zeta}$ is the identity matrix, i.e.

[TABLE]

where $I_{2L}$ is the $2L\times 2L$ identity matrix.
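The second-order statistics above can be verified by simulation. The sketch below assumes that $\hat{\zeta}[k]$ is the product $\hat{\xi}[k]\hat{\xi}^{*}[k+1]$ with cyclic indexing (an assumption consistent with the description of $\hat{\zeta}$ as an entrywise product of two Gaussian variables), and that $\widetilde{\zeta}$ stacks the real and imaginary parts of $\hat{\zeta}$ . Under these assumptions, the sample covariance of $\widetilde{\zeta}$ should be close to $\frac{1}{2}I_{2L}$ for large $N$ .

```python
import numpy as np

rng = np.random.default_rng(3)
L, N = 5, 200_000

# xi: complex Gaussian entries whose real and imaginary parts are independent N(0, 1/2).
xi = (rng.normal(size=(N, L)) + 1j * rng.normal(size=(N, L))) / np.sqrt(2)

# Assumed form of zeta_hat: product of cyclically adjacent entries of xi.
zeta = xi * np.conj(np.roll(xi, -1, axis=1))

# Real vector of length 2L: real parts followed by imaginary parts.
zeta_tilde = np.hstack([zeta.real, zeta.imag])

sample_cov = zeta_tilde.T @ zeta_tilde / N
deviation = np.linalg.norm(sample_cov - 0.5 * np.eye(2 * L), 2)
print(deviation)  # spectral-norm distance from (1/2) * I_{2L}
```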

Since $\widetilde{\zeta}$ satisfies the requirements of Definition 2 and (146), we apply Lemma B.7 and get

[TABLE]

or, equivalently,

[TABLE]

holds with probability at least

[TABLE]

Next, we adapt this result to the complex valued vector $\hat{\zeta}$ of (102).

From (145) we have

[TABLE]

Note that, if

[TABLE]

then,

[TABLE]

Since

[TABLE]

it follows from (148), (149) and (151) that, with probability at least

[TABLE]

it holds that

[TABLE]

Thus, from (153) it follows that

[TABLE]

with probability at least

[TABLE]

Plugging (154), (155) and (141) in (139) gives that

[TABLE]

with probability at least

[TABLE]

where we defined $C_{5}=\sqrt{32}C_{4}$ .

Now, recall from (138) that $A_{4}=\frac{2L}{\lambda^{2}\delta_{1}^{2}\gamma_{1}}A^{\prime}_{4}$ , and thus

[TABLE]

for $C_{3}=2(C_{5}+2)$ , which concludes the proof. ∎

B.5 Non-Asymptotic error bound for $\widetilde{u}^{(1)}$

In this section, we would like to get a better understanding of the error

[TABLE]

in the regime of large $N$ and large $\sigma$ . From Lemma 2.1 we have that

[TABLE]

and from Theorem 2.2 we have

[TABLE]

with high probability.

Note that $A_{0}[k_{1},k_{2}],A_{2}[k_{1},k_{2}]$ , and $A_{3}[k_{1},k_{2}]$ are all averages of $N$ samples of random variables with zero expected value (see (107), (108), (111), and (138)). Thus, from the central limit theorem we have that

[TABLE]

for large $N$ for all $k_{1}$ and $k_{2}$ . Since we are interested in the regime of constant $L$ and large $N$ , we can extend (161) to bound all the entries of $A_{0},A_{2}$ , and $A_{3}$ simultaneously (by the maximum of $L^{2}$ random variables in each matrix), and get that

[TABLE]

The result in (162) together with (160) means that all “parts” of the bound (159) of $\left\|\widetilde{u}^{(1)}-e^{-\imath 2\pi s_{1}/L}\frac{u^{(1)}}{\left\|u^{(1)}\right\|}\right\|$ (namely, $\|A_{0}\|,\sigma^{2}\|A_{2}\|,\sigma^{3}\|A_{3}\|$ and $\sigma^{4}\|A_{4}\|$ ) go to zero at the same rate when $N\rightarrow\infty$ , and thus, in the case of large $N$ and large $\sigma$ , the dominant part of the bound is $\sigma^{4}\|A_{4}\|$ (since $\sigma^{4}$ dominates $1$ , $\sigma^{2}$ , and $\sigma^{3}$ for large $\sigma$ ). This, together with (160), means that in the regime of large $\sigma$ and large $N$ ,

[TABLE]

Appendix C Proof of Theorem 2.3

In Appendix A, we have shown that (see (64))

[TABLE]

where $\Sigma^{(m)}_{\epsilon}$ is a bias term given by

[TABLE]

By (20) and (21), $\widetilde{C}_{z}^{(m)}$ can be expressed as

[TABLE]

where $\widetilde{\Sigma}_{\epsilon}^{(m)}$ is the estimated bias term given by

[TABLE]

Since $\widetilde{p}_{x}$ of (11) is a consistent estimator for $p_{x}$ , we have that

[TABLE]

and thus,

[TABLE]

for every $m=1,\ldots,L-1$ . Correspondingly, from the Davis-Kahan $\sin{\Theta}$ theorem [15], the eigenspace $\operatorname{span}\{\widetilde{u}^{(m)}\}$ of the leading eigenvalue of $\widetilde{C}_{z}^{(m)}$ converges almost surely to $\operatorname{span}\{u^{(m)}\}$ . Thus, because of the updates made in (23) and (24), we have that

[TABLE]

where $\alpha_{m}$ is unknown with $|\alpha_{m}|=1$ . Recall that $u^{(m)}[k]=\hat{\theta}[k]\hat{\theta}^{*}[k+m]$ , and

[TABLE]

Therefore, if $|\hat{\theta}[k]|>0$ for every $k=0,\ldots,L-1$ , then $|u^{(m)}[k]|>0$ for every $k=0,\ldots,L-1$ , and we have

[TABLE]

where $\varphi_{m}=\operatorname{arg}\left\{\alpha_{m}\right\}$ . Lastly, by the formulas for the frequency marching estimator (31) and (32), using (165) with $m=1$ , we have

[TABLE]

where the last equality is due to (10). Therefore

[TABLE]

where

[TABLE]

Note that due to the update made in (23) we have that $\sum_{k=0}^{L-1}\operatorname{arg}\left\{\widetilde{u}^{(m)}[k]\right\}\equiv 0\pmod{2\pi}$ and from (165) we have that $\sum_{k=0}^{L-1}\operatorname{arg}\left\{\widetilde{u}^{(m)}[k]\right\}\underset{\text{a.s.,\;}N\rightarrow\infty}{\longrightarrow}L\varphi_{m}$ for all $m$ . Thus $L\varphi_{1}\equiv 0\pmod{2\pi}$ and thus $s_{1}$ is indeed an integer.
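The role of the phase-sum constraint can be illustrated as follows. The sketch assumes that the update in (23) amounts to rotating the vector by a global phase so that the phases of its entries sum to $0$ modulo $2\pi$ ; the function names are hypothetical. After such a normalization, the only global rotations that preserve the constraint are $L$-th roots of unity, which is why the remaining ambiguity is indexed by an integer $s_{1}$ .

```python
import numpy as np

def normalize_phase_sum(u):
    """Rotate u by a global phase so that the entry phases sum to 0 mod 2*pi."""
    phi = np.sum(np.angle(u)) / len(u)
    return u * np.exp(-1j * phi)

def phase_sum_residual(u):
    """|exp(i * sum of phases) - 1|, which vanishes iff the sum is 0 mod 2*pi."""
    return abs(np.exp(1j * np.sum(np.angle(u))) - 1)

rng = np.random.default_rng(4)
L = 7
u = rng.normal(size=L) + 1j * rng.normal(size=L)
u0 = normalize_phase_sum(u)

# Rotating by an L-th root of unity keeps the constraint satisfied ...
s1 = 3
u_root = u0 * np.exp(-2j * np.pi * s1 / L)
# ... while a generic global rotation violates it.
u_generic = u0 * np.exp(0.1j)

print(phase_sum_residual(u0), phase_sum_residual(u_root), phase_sum_residual(u_generic))
```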

Appendix D Proof of Theorem 2.4

D.1 Supporting lemmas

We begin with Lemma D.1, which bounds the error of the “frequency marching” in Step 15 of Algorithm 1. For future reference we denote the “frequency marching” procedure as Algorithm 3.

Lemma D.1.

Let $v\in\mathbb{C}^{L}$ s.t. $|v[k]|=1$ for $k=0,\ldots,L-1$ , and let $u\in\mathbb{C}^{L}$ s.t. $u[k]=v[k]v^{*}[k+1],k=0,\ldots,L-1$ . Let $\widetilde{u}\in\mathbb{C}^{L}$ be an estimate of $u$ , with $\sum_{k=0}^{L-1}\arg\{\widetilde{u}[k]\}=0$ . Then, applying Algorithm 3 on $\widetilde{u}$ returns $\widetilde{v}\in\mathbb{C}^{L}$ s.t. $\|v-\alpha\widetilde{v}\|\leq\|u-\widetilde{u}\|\cdot\widetilde{C}^{\prime}L$ for some $\alpha\in\mathbb{C}$ with $|\alpha|=1$ and some constant $\widetilde{C}^{\prime}$ independent of $L$ .

Proof.

Since $v,u$ are vectors with entries of norm $1$ , we switch to working with angles. Denote by $\bar{v},w,\widetilde{w}\in\mathbb{R}^{L}$ the vectors $\bar{v}[k]=\arg(v[k])$ , $w[k]=\arg(u[k])$ and $\widetilde{w}[k]=\arg(\widetilde{u}[k])$ .

By Lemma B.9, we have

[TABLE]

From the definition of $\bar{v}$ and $w$ , we have that $w[k]=\bar{v}[k]-\bar{v}[k+1]$ . In other words, we can write $A\bar{v}=w$ where

[TABLE]

Note that we can also write $A=I-\mathcal{R}_{1}$ , where $\mathcal{R}_{1}$ is the linear operator of a cyclic shift by 1. The eigen-decomposition of $\mathcal{R}_{1}$ is

[TABLE]

where $F$ is the $L\times L$ discrete Fourier transform matrix. Thus,

[TABLE]

The smallest non-zero singular value of $A$ is $|1-e^{2\pi\imath/L}|\geq\sin(\frac{2\pi}{L})\geq\frac{2\pi}{L}-\frac{(2\pi)^{3}}{6L^{3}}$ , thus, $\|A^{\dagger}\|\leq\widetilde{C}^{\prime}L$ , where $A^{\dagger}$ is the Moore–Penrose pseudo-inverse (satisfying that $x=A^{\dagger}y$ is a solution of $y=Ax$ ), and $\widetilde{C}^{\prime}$ is some constant independent of $L$ . Note that the null space of $A$ ( $\ker A$ ) is one-dimensional as $A$ has a single zero singular value. It is also evident from (170) that for any constant vector $\bar{\alpha}\in\mathbb{C}^{L}$ we have that $A\bar{\alpha}=0$ , and thus the null space of $A$ consists solely of constant vectors. Additionally, also from (170), if $y=Ax$ then $\sum_{k=0}^{L-1}y[k]=0$ . Since the constraint $\sum_{k=0}^{L-1}y[k]=0$ defines a linear subspace of dimension $L-1$ , and since $\dim(\operatorname{Im}(A))=L-1$ , any vector $y$ with $\sum_{k=0}^{L-1}y[k]=0$ is in the image of $A$ .
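The spectral properties of $A=I-\mathcal{R}_{1}$ described above are easy to confirm numerically. The following sketch (with an arbitrary $L$ and a hypothetical helper `difference_matrix`) checks that the smallest non-zero singular value equals $|1-e^{2\pi\imath/L}|=2\sin(\pi/L)$ , that constant vectors are in the null space, that every vector in the image sums to zero, and that $\|A^{\dagger}\|$ grows at most linearly in $L$ . The explicit constant $1/4$ follows from $\sin(x)\geq 2x/\pi$ on $[0,\pi/2]$ and is only one admissible choice for the constant in the proof.

```python
import numpy as np

def difference_matrix(L):
    """A = I - R_1, so that (A v)[k] = v[k] - v[(k + 1) % L]."""
    shift = np.roll(np.eye(L), 1, axis=1)   # (shift @ v)[k] = v[(k + 1) % L]
    return np.eye(L) - shift

L = 12
A = difference_matrix(L)

sing = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
smallest_nonzero = sing[-2]                 # sing[-1] is the single zero singular value
pinv_norm = np.linalg.norm(np.linalg.pinv(A, rcond=1e-10), 2)

x = np.random.default_rng(0).normal(size=L)
print(smallest_nonzero, 2 * np.sin(np.pi / L))  # the two values agree
print(pinv_norm, L / 4)                         # ||A^+|| is at most L / 4
print(np.abs(A @ np.ones(L)).max(), abs(np.sum(A @ x)))  # both are numerically zero
```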

The “frequency marching” procedure described in Algorithm 3 (and in Step 15 of Algorithm 1), when written in the form of (29), solves

[TABLE]

for $\hat{\theta}$ , for all $k$ . Rewriting (171) in matrix form, Algorithm 3 solves $\operatorname{arg}\{u\}=A\operatorname{arg}\{\hat{\theta}\}$ . Since $\sum_{k=0}^{L-1}\widetilde{w}[k]=0$ we have that $\widetilde{w}\in\operatorname{Im}A$ , and therefore, there exists a constant vector $\bar{\alpha}\in\mathbb{C}^{L}$ such that $\widetilde{\bar{v}}+\bar{\alpha}=A^{\dagger}\widetilde{w}$ , where $\widetilde{\bar{v}}[k]=\arg\{\widetilde{v}[k]\}$ (since constant vectors are the null space of $A$ and $\widetilde{\bar{v}}$ is a solution to $\widetilde{w}=A\widetilde{\bar{v}}$ ). Additionally, from the definition of $w$ ( $w=A\bar{v}$ ) there is a constant vector $\bar{\beta}\in\mathbb{C}^{L}$ such that $\bar{v}+\bar{\beta}=A^{\dagger}w$ . Now, we have,

[TABLE]

where the last inequality is due to (169). Since $|e^{ix}-1|=2|\sin\frac{x}{2}|\leq|x|$ , denoting $\alpha=\exp\{\imath(\bar{\beta}-\bar{\alpha})\}$ , we have from (172) that

[TABLE]

∎
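To make the object of Lemma D.1 concrete, the sketch below implements a simple marching recursion that is consistent with the relation $u[k]=v[k]v^{*}[k+1]$ . It is a minimal illustration under the assumption that the recursion starts from $v[0]=1$ and propagates forward; it is not a reproduction of Algorithm 3, whose exact steps are not listed in this appendix, and the function name `frequency_marching` is hypothetical. Only the phases of the input are used, so the output does not depend on the magnitudes of $u$ .

```python
import numpy as np

def frequency_marching(u):
    """Recover unit-modulus v, up to a global phase, from u[k] ~ v[k] * conj(v[k+1])."""
    L = len(u)
    u_p = u / np.abs(u)             # only the phases of u are used
    v = np.ones(L, dtype=complex)   # fix v[0] = 1 to resolve the global phase
    for k in range(L - 1):
        # From u[k] = v[k] conj(v[k+1]) and |v[k]| = 1: v[k+1] = v[k] conj(u[k]).
        v[k + 1] = v[k] * np.conj(u_p[k])
    return v

rng = np.random.default_rng(7)
L = 9
v = np.exp(1j * rng.uniform(-np.pi, np.pi, L))       # true unit-modulus vector
u = v * np.conj(np.roll(v, -1))                       # u[k] = v[k] conj(v[k+1])
scales = rng.uniform(0.5, 2.0, L)                     # arbitrary positive magnitudes

v_est = frequency_marching(scales * u)
alpha = v[0] / v_est[0]                               # global phase ambiguity
print(np.linalg.norm(v - alpha * v_est))              # numerically zero for exact u
```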

Next, in Lemma D.2, we extend the result of Lemma D.1 by removing the requirement that $|v[k]|=1$ for $k=0,\ldots,L-1$ .

Lemma D.2.

Let $v\in\mathbb{C}^{L}$ , and let $v_{p}\in\mathbb{C}^{L}$ be the vector of phases of $v$ . Let $u\in\mathbb{C}^{L}$ such that $u[k]=v[k]v^{*}[k+1],k=0,\ldots,L-1$ . Assume that $\|u\|=1$ . Denote $\delta_{1}=\min_{k\in\{0,\ldots,L-1\}}|u[k]|$ , and assume that $\delta_{1}>0$ . Let $\widetilde{u}\in\mathbb{C}^{L}$ with $\|\widetilde{u}\|=1$ and $\sum_{k=0}^{L-1}\arg\{\widetilde{u}[k]\}=0$ be an estimate of $u$ . Then, applying Algorithm 3 on $\widetilde{u}$ returns $\widetilde{v}\in\mathbb{C}^{L}$ s.t. $\|v_{p}-\alpha\widetilde{v}\|\leq\frac{\|u-\widetilde{u}\|}{\delta_{1}^{2}}\cdot\widetilde{C}L$ for some $\alpha\in\mathbb{C}$ with $|\alpha|=1$ and some constant $\widetilde{C}$ independent of $L$ .

Proof.

Denote $u_{p},\widetilde{u}_{p}\in\mathbb{C}^{L}$ such that $u_{p}[k]=\frac{u[k]}{|u[k]|}$ and $\widetilde{u}_{p}[k]=\frac{\widetilde{u}[k]}{|\widetilde{u}[k]|}$ . Assume that $\|\widetilde{u}-u\|\leq\delta_{1}/2$ (we will later consider the alternative). Then, by Lemma B.10 we have,

[TABLE]

Since $u_{p}[k]=v_{p}[k]v_{p}^{*}[k+1],k=0,\ldots,L-1$ , Lemma D.1 guarantees that applying Algorithm 3 on $\widetilde{u}_{p}$ will result in $\widetilde{v}$ such that

[TABLE]

for some $\alpha\in\mathbb{C}$ with $|\alpha|=1$ . Thus, we showed that applying the frequency marching procedure on $\widetilde{u}_{p}$ results in $\widetilde{v}$ as required. Since the frequency marching procedure does not depend on the magnitude of the input vector, applying it on $\widetilde{u}$ results in the same vector as applying it on $\widetilde{u}_{p}$ .

Until now, we have shown that $\|v_{p}-\alpha\widetilde{v}\|\leq 3\frac{\|(\widetilde{u}-u)\|}{\delta_{1}^{2}}\widetilde{C}^{\prime}L,$ in the case when $\|\widetilde{u}-u\|\leq\delta_{1}/2$ . Assume now that $\|\widetilde{u}-u\|>\delta_{1}/2$ . Since $|v_{p}[k]|=|\widetilde{v}[k]|=1$ for all $k$ ( $v_{p}$ is a vector of phases, and since $\widetilde{v}$ is the output of Algorithm 3 it is also a vector of phases), we have that $(v_{p}[k]-\alpha\widetilde{v}[k])^{2}\leq 4$ , and thus

[TABLE]

where the third inequality holds since $\delta_{1}<1$ and the last inequality is due to the assumption $\|\widetilde{u}-u\|>\delta_{1}/2$ . Finally, denoting $\widetilde{C}=\max(4,3\widetilde{C}^{\prime})$ , we have from (173) and (174) that $\|v_{p}-\alpha\widetilde{v}\|\leq\frac{\|(\widetilde{u}-u)\|}{\delta_{1}^{2}}\widetilde{C}L$ .

∎

The following lemma shows that if both the magnitudes and the phases of a signal are estimated accurately, then the signal itself is estimated accurately.

Lemma D.3.

Let $\theta\in\mathbb{C}^{L}$ with $\|\theta\|=1$ . Denote by $\theta_{m}\in\mathbb{R}^{L}$ the magnitudes of $\theta$ , and by $\theta_{p}$ the phases of $\theta$ (i.e., $\theta=\theta_{p}\odot{\theta_{m}}$ ). Suppose that $\widetilde{\theta}_{m}$ and $\widetilde{\theta}_{p}$ are approximations of $\theta_{m}$ and $\theta_{p}$ such that $\|\widetilde{\theta}_{m}-\theta_{m}\|\leq\varepsilon_{1}$ and $\|\widetilde{\theta}_{p}-\theta_{p}\|\leq\varepsilon_{2}$ . Then, for $\widetilde{\theta}=\widetilde{\theta}_{p}\odot{\widetilde{\theta}_{m}}$ , it holds that $\|\theta-\widetilde{\theta}\|\leq\varepsilon_{1}+\varepsilon_{2}$ .

Proof.

[TABLE]

Since $|\widetilde{\theta}_{p}[i]|=1$ and $|{\theta_{m}}[i]|\leq 1$ for $i=0,\ldots,L-1$ , we get from (121) that

[TABLE]

∎
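Lemma D.3 can be checked directly on random data. The sketch below perturbs the magnitudes and the unit-modulus phases of a normalized vector separately, recombines them, and compares the resulting error with $\varepsilon_{1}+\varepsilon_{2}$ . The perturbation sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
L = 10
theta = rng.normal(size=L) + 1j * rng.normal(size=L)
theta /= np.linalg.norm(theta)
theta_m = np.abs(theta)            # magnitudes
theta_p = theta / theta_m          # unit-modulus phases, theta = theta_p * theta_m

# Perturbed magnitudes and perturbed (still unit-modulus) phases.
theta_m_tilde = theta_m + 0.01 * rng.normal(size=L)
theta_p_tilde = theta_p * np.exp(1j * 0.02 * rng.normal(size=L))

eps1 = np.linalg.norm(theta_m_tilde - theta_m)
eps2 = np.linalg.norm(theta_p_tilde - theta_p)
theta_tilde = theta_p_tilde * theta_m_tilde      # entrywise (Hadamard) product
err = np.linalg.norm(theta - theta_tilde)
print(err, eps1 + eps2)
```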

D.2 Proof of Theorem 2.4

Proof.

The outline of the proof is as follows. We show that the frequency marching procedure results in a “good” estimate of the phases of $\hat{\theta}$ of (6). Then we show that we have a “good” estimate of the magnitudes of $\hat{\theta}$ as well. Finally, we use Lemma D.3 to combine the two and conclude the proof.

Denote

[TABLE]

Denote $u_{s_{0}}^{(1)}=e^{-\imath 2\pi s_{0}/L}\frac{u^{(1)}}{\|u^{(1)}\|}$ , and denote by $u_{ps_{0}}^{(1)}$ the vector of phases of $u_{s_{0}}^{(1)}$ , that is,

[TABLE]

where $u_{p}^{(1)}$ is the vector of phases of $u^{(1)}$ ( $u_{p}^{(1)}[k]=u^{(1)}[k]/|u^{(1)}[k]|$ ). Denote $\hat{\theta}_{ps_{0}}[k]=e^{\imath 2\pi ks_{0}/L}\hat{\theta}_{p}[k]$ where $\hat{\theta}_{p}$ is the vector of phases of $\hat{\theta}$ of (6) ( $\hat{\theta}_{p}[k]=\hat{\theta}[k]/|\hat{\theta}[k]|$ ). Note that

[TABLE]

Since $\|\widetilde{u}^{(1)}\|=1$ and from (23), $\sum_{k=0}^{L-1}\arg\{\widetilde{u}^{(1)}[k]\}=0$ , we have that $\widetilde{u}^{(1)}$ and $u_{s_{0}}^{(1)}$ satisfy the requirements of Lemma D.2 (as $\widetilde{u}$ and $u$ correspondingly). Therefore, it follows that applying Algorithm 3 on $\widetilde{u}^{(1)}$ will result in $\widetilde{\theta}_{p}$ such that

[TABLE]

for some $\alpha\in\mathbb{C}$ with $|\alpha|=1$ .

Next, we bound the error in estimating $\hat{\theta}_{m}$ , where $\hat{\theta}_{m}[k]=|\hat{\theta}[k]|$ . From (12) it follows that, for large $N$ ,

[TABLE]

for some constant $C^{\prime}_{2}$ . Since for any $v\geq-1$ we have

[TABLE]

we have for any positive $v$ and $\widetilde{v}$

[TABLE]

From (178) and (180), we have

[TABLE]

Note that from the definition of $\delta_{1}$ and from (14) we have that $|\hat{\theta}[k]|\geq\delta_{1}$ , which together with (10) gives $\sqrt{p_{x}[k]}\geq\delta_{1}\sqrt{\lambda}$ . Thus we have from (181),

[TABLE]

From (13) we have that, for large $N$ ,

[TABLE]

for some constant $C^{\prime\prime\prime}_{2}$ . From the Taylor expansion of $\frac{1}{1+\varepsilon}$ around $\varepsilon=0$ we have that for $\left|\varepsilon\right|\leq 1/2$ it holds that

[TABLE]

or,

[TABLE]

which, together with (183), assuming $C^{\prime\prime\prime}_{2}\frac{L\sigma^{2}}{\lambda\sqrt{N}}\leq\frac{1}{2}$ , gives

[TABLE]

where $C^{\prime\prime}_{2}=cC^{\prime\prime\prime}_{2}$ . Similarly to the derivation of (182), using (180), we have

[TABLE]

Thus, combining (182) with (185), we have

[TABLE]

or,

[TABLE]

Recall from (10) that $\frac{\sqrt{{p}_{x}[k]}}{\sqrt{\lambda}}=\hat{\theta}_{m}[k]\leq\|\hat{\theta}\|=1$ . Thus we have

[TABLE]

assuming $C^{\prime\prime}_{2}\frac{L\sigma^{2}}{\lambda\sqrt{N}}\leq\frac{1}{2}$ .

In case $C^{\prime\prime}_{2}\frac{L\sigma^{2}}{\lambda\sqrt{N}}>\frac{1}{2}$ , we note that from (11) we have that $\frac{\sqrt{\widetilde{p}_{x}[k]}}{\sqrt{\widetilde{\lambda}}}\leq 1$ and since $\hat{\theta}_{m}[k]\leq 1$ we have,

[TABLE]

From (187) and (188) we have that there is a constant $\stackrel{{\scriptstyle\approx}}{{C}}$ such that,

[TABLE]

for a large enough $N$ . Thus, by Step 12 of Algorithm 1 we have

[TABLE]

By (177), (190) and Lemma D.3, we have

[TABLE]

Since $\hat{\theta}_{ps_{0}}[k]=e^{\imath 2\pi ks_{0}/L}\hat{\theta}_{p}[k]$ , and since the inverse Fourier transform is a unitary transformation, we have from Step 18 of Algorithm 1,

[TABLE]

Denoting $s=-s_{0}$ , and noting that applying $\mathcal{R}_{s}$ to a signal does not change its norm, we have that

[TABLE]

From (191), (192), and the fact that $u_{s_{0}}^{(1)}=e^{-\imath 2\pi s_{0}/L}\frac{u^{(1)}}{\|u^{(1)}\|}$ , we have

[TABLE]

Since $s_{0}$ minimizes the expression in (176), by Lemma 2.1 we have that

[TABLE]

and thus, from (D.2) we have,

[TABLE]

where

[TABLE]

where $A_{4}$ is from (109) and (138).

By Theorem 2.2 we have that

[TABLE]

with probability at least

[TABLE]

Combining (194) and (195) we have that there is some constant $C_{4}$ , such that

[TABLE]

with probability at least

[TABLE]

∎

Appendix E Proof of Theorem 2.5

In order to show that $\widetilde{\theta}$ converges to $\theta$ (up to the inherent ambiguities), we will show that $\tilde{\theta}_{m}[k]$ (from Step 6 of Algorithm 2) converges to $\hat{\theta}_{m}[k]=|\hat{\theta}[k]|$ and $\widetilde{\theta}_{p}[k]$ (from Step 19 of Algorithm 2) converges to $\alpha\hat{\theta}_{p}[k]e^{2\pi\imath kj/L}$ where $\hat{\theta}_{p}[k]={\hat{\theta}[k]}/{|\hat{\theta}[k]|}$ , for $k=0,\ldots,L-1$ , and for some $j\in\{0,\ldots,L-1\}$ and $\alpha\in\mathbb{C}$ such that $|\alpha|=1$ . Since $\widetilde{\theta}_{m}$ is computed in the same way in Algorithm 2 and Algorithm 1, we already proved in the analysis of Algorithm 1 that (see (190))

[TABLE]

We now show that $\widetilde{\theta}_{p}[k]\underset{\text{a.s.,\;}N\rightarrow\infty}{\longrightarrow}\alpha\hat{\theta}_{p}[k]e^{2\pi\imath kj/L}$ . Define the “clean version” of $\widetilde{C}_{x}$ (from (39)), as

[TABLE]

In (40) we showed that

[TABLE]

As was shown in Appendix C (equation (163))

[TABLE]

Thus, from (199), (200), and (39), it is easy to see that

[TABLE]

Recall that in Step 16 of Algorithm 2, we solve

[TABLE]

where

[TABLE]

Note that,

[TABLE]

where

[TABLE]

Note also that $\beta_{k}$ is uniformly distributed on the unit circle for $k=0,\ldots,L-1$ . Denote

[TABLE]

Since the solution of (202) is the eigenvector corresponding to the leading eigenvalue of $\widetilde{C}_{x}\odot\operatorname{Circul}\left\{\widetilde{\alpha}_{0}^{*}\right\}$ , from (204) by the Davis-Kahan $\sin{\Theta}$ theorem [15], we have that $\widetilde{q}_{1}$ converges to an eigenvector of $H$ .

Next, since the discrete Fourier transform diagonalizes circulant matrices [16], and by simple algebra, it can be shown that

[TABLE]

where $F$ is the $L\times L$ discrete Fourier transform matrix defined in (5). Then, it is evident that

[TABLE]

is a unitary matrix containing the eigenvectors of $H$ , and $F\cdot\left[\beta_{0},\beta_{1},\ldots,\beta_{L-1}\right]^{T}$ are the eigenvalues of $H$ . Since $F$ is a unitary matrix, and $\beta_{0},\ldots,\beta_{L-1}$ are strictly continuous i.i.d. on the unit circle, the eigenvalues of $H$ are distinct with probability $1$ . Then, since $\widetilde{q}_{1}$ from (202) converges to one of the eigenvectors of $H$ , it converges to one of the columns of $V$ up to a constant factor of $\alpha\in\mathbb{C}$ , where $|\alpha|=1$ .

Denote by $V_{j}$ the $j$-th column of $V$ . Note the special structure of $V_{j}$ ,

[TABLE]

Thus, $\widetilde{q}_{1}$ converges to $\hat{\theta}_{p}$ almost surely, up to a constant factor and an unknown modulation, or, explicitly,

[TABLE]

for some $\alpha\in\mathbb{C}$ , $|\alpha|=1$ , and some $j\in\{0,\ldots,L-1\}$ . Since $\widetilde{\theta}_{m}$ is a consistent estimator for the magnitudes of $\hat{\theta}$ (see (197)), and $\widetilde{q}_{1}$ is a consistent estimator for the phases (see (208)), by Lemma D.3, the combination of them in Step 20 provides a consistent estimate for $\hat{\theta}$ . Thus, computing the inverse Fourier transform in Step 21 of Algorithm 2 provides a consistent estimate for $\theta$ up to an unknown factor and a cyclic shift.
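The correspondence between the unknown modulation $e^{2\pi\imath kj/L}$ in the Fourier domain and a cyclic shift of the recovered signal is the standard shift property of the discrete Fourier transform. The sketch below illustrates it with NumPy's FFT, whose normalization may differ from that of the matrix $F$ in (5); this affects only a global scale and not the shift. The signal and the value of $j$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
L, j = 8, 3
theta = rng.normal(size=L)
theta_hat = np.fft.fft(theta)

# Modulate the Fourier coefficients by exp(2*pi*i*k*j/L) and invert.
k = np.arange(L)
modulated = theta_hat * np.exp(2j * np.pi * k * j / L)
shifted = np.fft.ifft(modulated)

# The result is theta cyclically shifted: shifted[n] = theta[(n + j) % L].
print(np.allclose(shifted, np.roll(theta, -j)))
print(np.isclose(np.linalg.norm(shifted), np.linalg.norm(theta)))  # norm is preserved
```

Both checks print `True`, so a modulation of the estimated phases translates into a cyclic shift of the estimate of $\theta$ , with no change in its norm.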

6 Acknowledgments

We would like to thank Prof. Boaz Nadler and Dr. Tamir Bendory for their remarks and comments. This research was partially supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH), by Award Number R01GM090200 from the NIGMS, and by a Fellowship from Jyväskylä University and the Clore Foundation.

Bibliography

[1] Emmanuel Abbe, Tamir Bendory, William Leeb, João M. Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. arXiv preprint arXiv:1710.02793, 2017.
[2] Emmanuel Abbe, João M. Pereira, and Amit Singer. Sample complexity of the boolean multireference alignment problem. In Proceedings. IEEE International Symposium on Information Theory, volume 2017, page 1316. NIH Public Access, 2017.
[3] Emmanuel Abbe, João M. Pereira, and Amit Singer. Estimation in the group action channel. arXiv preprint arXiv:1801.04366, 2018.
[4] Milton Abramowitz and Irene A. Stegun. Handbook of mathematical functions: With formulas, graphs, and mathematical tables applied mathematics series. National Bureau of Standards, Washington, DC, 1964.
[5] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus Mathematique, 349(3):195–200, 2011.
[6] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society, 23(2):535–561, 2010.
[7] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus Mathematique, 349(3-4):195–200, 2011.
[8] Afonso S. Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 459–470. ACM, 2014.
