Rank-one Multi-Reference Factor Analysis
Yariv Aizenbud, Boris Landa, Yoel Shkolnisky

TL;DR
This paper develops methods for accurately estimating signals from noisy, cyclically-shifted observations in low SNR regimes, with proven consistency and improved sample complexity, supported by numerical experiments.
Contribution
It introduces a novel consistent estimation procedure for low SNR, cyclically-shifted data, and demonstrates improved sample complexity through a new algorithm.
Findings
Estimation is possible in low SNR regimes with a sample complexity of 1/SNR^4.
Proposed procedure improves sample complexity by a factor equal to the signal length.
Numerical experiments validate theoretical results and algorithm performance.
Abstract
In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is . Additionally, we propose a procedure which…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\xpatchcmd
Rank-one Multi-Reference Factor Analysis
Yariv Aizenbud1,# Boris Landa1,# Yoel Shkolnisky1 1School of Mathematical Sciences, Tel Aviv University, Israel
#These authors contributed equally
Abstract
In recent years, there is a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is . Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal’s length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.
1 Introduction
Due to recent improvements in acquisition, storage, and processing capabilities, there is a growing need for techniques aimed at extracting useful information from large datasets. It is commonplace to encounter large datasets of scientific observations, which are corrupted by noise and some deformation (e.g. translations, rotations, etc…) [10, 18, 26]. In many cases, in these large datasets there is a hidden low-dimensional structure which is masked by the deformations and noise.
More formally, we present the following model for the observation :
[TABLE]
where is some random deformation operator, for are unknown deterministic orthonormal signals, is a complex-valued random scale factor with variance , and is a complex-valued noise vector, with being the circularly-symmetric complex normal distribution [20] (intuitively, is equivalent to and ). We assume that the noise variance is known. Given observations from the model (1), our goal is to estimate the signal and its strength .
In this work, we consider a special prototype of the model (1). First, we take the random deformation to be the cyclic shift operator , that is, for any signal and any , we define by
[TABLE]
where is drawn from the uniform distribution over (i.e., is drawn from the random variable satisfying ). We name this estimation problem Multi-Reference Factor Analysis (MRFA), since it can be considered as factor analysis [13] under the cyclic shift . In what follows, we drop the modulus by from all vector indices, as all vectors are considered as periodic. Second, we consider in (1) to be a rank one signal (). Formally, in the above notation, we consider the model:
[TABLE]
We name the problem of estimating and from observations generated from the model (3) rank-one MRFA. Specifically, given independent observations from the model (3), where , our goal is to recover and . We note that the random shift in (3) is a nuisance parameter, and estimating its realizations is of no interest in our model. Note that in the model (3), the signal may be estimated only up to an arbitrary cyclic shift and a product with a complex number of modulus one (global phase). Even though our analysis is focused on the case of , all of our statements can be easily adapted to the more general setting where admits an arbitrary distribution (either real or complex) with and a bounded fourth moment. The algorithms derived in this work are also suitable for solving the rank-one MRFA problem in this more general setting.
Note that if the random factor in (3) is replaced by a constant, then the model (3) reduces to Multi-Reference Alignment (MRA) [2, 8, 9, 24], which has recently drawn much interest. Both the MRA model and the model (3) provide a simplified model for various problems in science and engineering, particularly in areas such as communications, radar, image processing, and structural biology [10, 18, 25, 26, 28, 35].
Having mentioned that our problem is a generalization of MRA, it is worthwhile to discuss the latter’s possible solutions for different levels of noise. When the noise variance is small, it is known that MRA can be solved by first estimating the relative shift between any two observations (taking the shift that maximizes the correlation between the two observations), then, aligning all of the observations, and finally averaging the aligned observations [9]. Such a procedure is known to result in a sample complexity (the number of samples required for a prescribed error in estimating ) of . However, this approach fails when the noise variance is large, as pairwise correlations become meaningless, and thus an alignment of the observations cannot be achieved. In the high noise regime, several algorithms were proposed in [1, 9, 12, 24], and were demonstrated to achieve the optimal asymptotic sample complexity of [24] in the case of uniformly-distributed cyclic shifts, and of [1] in the case of an arbitrary aperiodic distribution of the shifts. Indeed, such methods do not attempt to estimate the shifts of the observations (a task doomed to fail when the noise variance is large [24]), but rather use all observations to estimate a sufficient number of shift-invariant statistics, from which the underlying signal can be recovered. It is shown [9] that for the general case of the MRA problem, it is possible to recover the underlying signal using only the first three shift-invariant statistics: the mean (first order), the power spectrum (second order), and the bispectrum (third order) [23], where the latter governs the achieved sample complexity. The distinction between the two noise level regimes, high and low, applies analogously for samples generated by the model (3). In the low noise regime, in (3) can be estimated by aligning the observations using correlations, followed by calculating the rank-one factorization of the aligned samples. Therefore, when studying the model (3), the regime of interest is the one of large noise variance.
A different approach for MRA, or more generally, for estimating model parameters in the presence of nuisance parameters is Maximum-Likelihood Estimation (MLE) via Expectation-Maximization (EM) [17], which marginalizes over the nuisance parameters.
Typically, and in particular in the context of MRA, EM suffers from two major shortcomings. The first is the lack of theoretical convergence guarantees (except in some special cases [34]), and the second is the phenomenon of extremely slow convergence – resulting in very long running time, particularly when the noise variance is large [9]. In contrast, estimators based on invariant statistics typically enjoy rigorous error bounds on one hand, and faster running times on the other (as they are essentially single-pass algorithms – going over every observation only once).
Going back to our model of rank-one MRFA (3), and motivated by the above discussion, we consider the following question: Can we accurately estimate and of (3) in the regime of large and large ? We answer this question affirmatively, and moreover, propose an algorithm which, under mild conditions, is guaranteed to recover and (up to the ambiguities of global phase and cyclic shift) when . The asymptotic sample complexity of our proposed algorithm is (for large ). As a by-product, we develop new concentration results for non-i.i.d. sub-exponential random vectors (and their sample covariance matrices), when the vectors admit a certain underlying structure (see Appendix B).
While the optimal asymptotic sample complexity of MRA is , the asymptotic complexity of our algorithm for MRFA is . The reason for the different rates is that the rank-one MRFA is fundamentally different from MRA, as the third-order invariant statistic, the bispectrum (given by certain triplet correlations in the Fourier domain – see Section 3), which is used to solve MRA, vanishes under the setting of (3). Therefore, we resort to a fourth-order shift-invariant statistic, known as the trispectrum (given by certain quadruplet correlations in the Fourier domain), and hence the dependence on instead of . Since the bispectrum vanishes entirely, we believe it is unlikely that a rate better than can be achieved (see [3] for this type of argument in the case of MRA).
Note that the previously-mentioned sample complexities are oblivious to the signal’s length , as they assume is a fixed constant. In practice, the length of the signal affects the sample complexity, and thus, we do not neglect it in our analysis. We analyze the sample complexity of our algorithms in terms of the signal’s length , and in terms of the signal-to-noise ratio (SNR), defined as
[TABLE]
We present in this work two algorithm that solve the rank-one MRFA problem. We show that the sample complexity of the first algorithm satisfies (in the regime of large ), where is the number of required measurements for estimating and to a given accuracy. We observed numerically that the achieved sample complexity of this algorithm is actually better by a factor of , satisfying . The second algorithm is a variant of the first one, which works better in practice at the cost of weaker theoretical bounds. It is observed to provide a further improvement by a factor of , resulting in a sample complexity of .
We demonstrate all our algorithms numerically, and show the agreement between the theory and the experimental results. We also compare our algorithms with the EM algorithm (designed for the rank-one MRFA problem) and show that it provides the same sample complexity as our methods (as a function of ) with a marginal gain in the estimation error, while suffering from extremely slow running times and lack of theoretical guarantees.
The paper is organized as follows. Section 2 describes our algorithms for recovering and of (3), and provides the main theoretical guarantees for their performance. Section 3 connects our approach with that of shift-invariant statistics. Section 4 presents the numerical experiments supporting our theoretical derivations. Finally, Section 5 provides some concluding remarks and some possible future research directions.
2 Method description and main results
In this section, we describe our methods for estimating the model parameters and of (3), and provide their theoretical error bounds and sample complexities.
Instead of working with the model (3) directly, we consider an equivalent, more convenient formulation in the Fourier domain, where cyclic shifts are replaced by modulations. Let be the unitary Discrete Fourier Transform (DFT) matrix
[TABLE]
and denote the Fourier transforms of the quantities in (3) by
[TABLE]
Then, a formulation equivalent to (3) in the Fourier domain is
[TABLE]
for , where and (since is unitary). In what follows, we consider the problem of estimating and from the observations (), recalling that can always be obtained from by the inverse Fourier transform, i.e.
[TABLE]
In the rest of this section, we describe our methods for estimating the signal parameters and . We start by describing how to estimate the signal’s strength together with the magnitudes of (i.e. ) from the power spectrum of the observations. Then, we detail two methods for estimating the phases of (i.e. ). The first method is simpler to implement and enjoys faster running times. The second method is able to exploit more information from the statistics calculated from the observations and results in lower estimation errors. We show that both methods are statistically consistent in estimating and (or, equivalently, ) as , up to the inherent ambiguities for – global phase and cyclic shift. We also show that in the low SNR regime (large ), the first method admits a sample complexity of . Later, in Section 4, we demonstrate numerically that the first method actually achieves a better sample complexity of . The second method further improves upon this rate by a factor of , with .
2.1 Estimating and the magnitudes of
Consider the power spectrum of the random signal from (3), given by
[TABLE]
and note that since , encodes both and the magnitudes of (i.e. ) via
[TABLE]
The power spectrum , and correspondingly , can be estimated from by
[TABLE]
which are consistent estimators (as ) for and , respectively. Furthermore, satisfies [33]
[TABLE]
where we discarded lower-order terms of (since we are interested in the regime of ), and therefore
[TABLE]
Hence, the sample complexity of estimating and the magnitudes of from and is in the regime of large noise variance.
2.2 Estimating the phases of
Next, we proceed to estimate the phases of the signal , that is, the vector . Consider the vectors , for , defined by
[TABLE]
were is the complex conjugate of . Essentially, each vector consists of the products between the elements of with stride , and thus encodes the phases of through the relation
[TABLE]
up to an integer multiple of . That is, the vectors , , describe all pairwise differences between the phases of the elements of . We mention that does not provide any useful phase information (as it is equivalent to the power spectrum), and will play no role in what follows.
Before we describe how to extract the phases of from the vectors , we present a method for estimating from the observations . We define the stride- products of the elements of of (7) by
[TABLE]
and observe that by (7) we have
[TABLE]
where is a noise term given by
[TABLE]
Note that if we had no noise, i.e. (and hence ), then different realizations of would be equal to up to constant factors (as is independent of the frequency index ). We define the covariance matrix of as
[TABLE]
and observe that in the case of no noise (), would be of rank one, with its leading eigenvector equal to (again – up to a constant factor of known magnitude). We therefore proceed by computing the sample covariance matrices of
[TABLE]
and correct for the effect of the noise (see Appendix A) by modifying the main diagonal of each according to
[TABLE]
where is the estimate of the power spectrum of from (11). Essentially, (21) corrects for the bias in caused by the noise term . In Section 3 below, we relate the sequence of matrices , , of (19) with the trispectrum of (for the definition of the trispectrum see also Section 3).
Once has been computed (for any ), we take its leading normalized eigenvector (corresponding to the largest eigenvalue of ) as an estimate for . Note that the subtraction of in (21) has no effect on the eigenvectors of , and therefore has no effect on our estimate for .
Note that by the definition of (see also (15)), it satisfies
[TABLE]
whereas does not necessarily satisfy (22), since estimates up to an arbitrary phase. We can easily update to satisfy by
[TABLE]
Note that after the update (23), is unique up to a multiplication by for . One approach to try and solve this ambiguity is to multiply by a phase, such that . Unfortunately, this will violate requirement . We fix this ambiguity while satisfying (22) by choosing such that is minimal, and then we update again by
[TABLE]
Next, we characterize the accuracy of the estimate in the low SNR regime. For the sake of clarity and simplicity of notation in the proof, the result is stated in Lemma 2.1 and Theorem 2.2 for the case of .
Lemma 2.1**.**
Let be given by (14), let be the leading eigenvector of of (21), and define and . If and is large enough, then the error in the estimate of can be bounded by
[TABLE]
where and are independent of , for some .
The proof is provided in Appendix B.3.
Theorem 2.2**.**
Let be given by (14), let be the leading eigenvector of of (21), and define and . Then, if and is large enough, there exist constants (independent of and ), such that for from Lemma 2.1
[TABLE]
with probability at least
[TABLE]
The proof is provided in Appendix B.4. Note that the probability in (26) tends to rapidly as (dominated by an exponential rate).
In essence, Lemma 2.1 states that the accuracy of the estimate can be bounded by a degree-4 polynomial in . In the large noise regime (), the dominant part of the error in the estimate of is , and thus we have Theorem 2.2 that bounds with high probability. We argue in Appendix B.5 that with high probability, in the regime of large and large
[TABLE]
where SNR is defined in (4) and is defined analogously to by . In other words, in the regime of low SNR, the number of samples required to estimate by to a constant prescribed error is
[TABLE]
Next, we show how to estimate the phases of using the estimates .
2.2.1 Phase estimation by frequency marching
Note that of (14) is sufficient to recover the phases of , since (up to an integer multiple of )
[TABLE]
Therefore, the phases of can be obtained by first initializing the phase of to zero (recalling that we have an ambiguity of a global phase and a cyclic shift, so we can initialize the phase of arbitrarily), followed by applying (29) repeatedly for . We refer to this procedure as frequency marching (FM) (see [9] for a similar technique baring the same name). Using (the leading eigenvector of of (21)) as an estimate of , the explicit formula for the estimated signal (combining the amplitude given by (10) with the phases given by (29)) is given by
[TABLE]
where denotes element-wise multiplication (the Hadamard product), is the magnitude part (given by (10))
[TABLE]
and is the phase part (given by (29))
[TABLE]
The algorithm for estimating the signal parameters and using frequency marching is summarized in Algorithm 1.
Algorithm 1 clearly does not exploit all the available information. While the method presented in the next section exploits more complicated statistics of the observations and provides lower estimation errors, Algorithm 1 is simpler to implement and enjoys faster running times (then the algorithm of the next section). For Algorithm 1 (unlike the method of the next section) we show that in the low SNR regime it admits a sample complexity of .
The next theorem establishes that Algorithm 1 results in a consistent estimate for as , up to the ambiguities of a phase factor and a cyclic shift.
Theorem 2.3** (Consistency of Algorithm 1).**
Let be from the model (3) and suppose that for all . Let be from Algorithm 1. Then, there exists an integer and a constant satisfying , such that
[TABLE]
where is a cyclic shift by .
The proof is provided in Appendix C.
Additionally, we have the following theorem, which characterizes the asymptotic estimation error of Algorithm 1 in the low SNR regime.
Theorem 2.4** (Asymptotic error of Algorithm 1).**
Let be from the model (3) and be from Algorithm 1. Define and where is given by (14). If and is large enough, there exists an integer , and a constant satisfying , such that
[TABLE]
where are independent of . Additionally, there exist constants (independent of and ), such that
[TABLE]
with probability at least
[TABLE]
The proof is provided in Appendix D. By bounding , and in (34) along the lines of Appendix B.5, we get from Theorem 2.4 that for large and , with high probability we have
[TABLE]
or, that in the regime of low SNR, the number of samples required by Algorithm 1 to estimate to a constant prescribed error (the sample complexity) is
[TABLE]
In Section 4, we demonstrate numerically that the sample complexity achieved by Algorithm 1 is actually better than (38) by a factor of , i.e. .
2.2.2 Phase estimation by alternating minimization
Next, we propose a method for estimating the phases of using all vectors , , simultaneously. To that end, the main observation is that the vector of (14) is the ’th diagonal of . Therefore, we can use all vectors to construct , followed by estimating from the leading eigenvector of . In practice, as we only have access to the estimated vectors , which estimate up to unknown phase factors, we can only approximate the matrix up to an element-wise product with a circulant matrix (of the unknown phases). Therefore, we propose to estimate and the unknown phase factors in simultaneously. As we try to obtain the phases of , we will use only the phases of throughout this procedure.
We start by forming the matrix as
[TABLE]
That is, has ones on its main diagonal, and the phases of the vector on its ’th diagonal (with circulant wrapping for each diagonal). Note that by the structure of in (39) and by the definition of , the matrix is self-adjoint. Thus, if we have no noise, i.e , it follows that can be written as the element-wise product of a rank-one matrix, and a circulant matrix of phases (since estimates up to an unknown constant phase factor). Specifically, if we have
[TABLE]
where is the phase part of , i.e. , denotes element-wise multiplication (the Hadamard product), and is a circulant matrix constructed from a vector of phases , i.e.
[TABLE]
Motivated by the observation in (40) for the case of no noise (), we propose to recover by solving
[TABLE]
Now, since (42) is non-convex, we propose to solve it by alternating minimization, where we alternate between solving for and solving for . When is held fixed, the solution for is given simply by the singular value decomposition (SVD) of the matrix . That is,
[TABLE]
where is the largest singular value of , and is its corresponding singular vector (left and right singular vectors are equal up to a sign since the matrix is self-adjoint). When is held fixed, the optimization problem for reduces to a least-squares problem with absolute-value constraints, whose solution is given explicitly by
[TABLE]
where all matrix and vector index assignments in (44) are modulo . We alternate between computing the solutions for and for iteratively until the objective in (42) reaches saturation. We summarize the algorithm for estimating and of (3) using alternating minimization in Algorithm 2.
The following theorem states that even a single iteration of the alternating minimization in Algorithm 2 lines 15-18 (i.e. ) results in a consistent estimate for as (up to the inherent ambiguities of phase and cyclic shift).
Theorem 2.5** (Consistency of Algorithm 2).**
Let be from the model (3), be from Algorithm 2 with , and suppose that for all . Then, there exists an integer and a constant satisfying , such that
[TABLE]
where is a cyclic shift by .
The proof is provided in Appendix E.
In terms of sample complexity, since Algorithm 2 estimates the phases of using all vectors simultaneously, we expect it to perform better than Algorithm 1, which only uses . In particular, the alternating minimization in Algorithm 2 estimates unknown parameters ( parameters for , and for the nuisance phases ) from measurements (the matrix ), whereas Algorithm 1 estimates parameters (the phases of ) from only measurements (the vector ). Even though the error vectors (of (18)) are not independent for different , it is easy to verify that they are uncorrelated, and therefore, it is reasonable to expect that the error in estimating by Algorithm 2 would improve upon that of Algorithm 1 (since we use more information). Indeed, we demonstrate numerically in Section 4 that estimating by Algorithm 2 achieves the sample complexity of (27), i.e.,
[TABLE]
which improves upon the sample complexity of Algorithm 1 by a factor of .
3 Invariant statistics and Trispectrum inversion
While the previous section provides a self-contained derivation of our algorithms, there are several advantages to present a more systematic approach to the MRFA problem. First, it is worthwhile to relate our algorithms with the method of moments (see [33], and the generalized method of moments [21]) used in previous works (see in particular works related to MRA [8, 9, 24]). Second, this section also explains the methodology behind the derivation of our algorithms, and some of the fundamental limitations related to the sample complexity of these algorithms. Third, the presentation in this section establishes that the presented algorithms can be interpreted as methods for recovering a signal from its second and forth moments (the trispectrum in particular). The trispectrum is currently used, for example, in cosmology [22, 11] and signal processing [14], but currently there is no known algorithm for its robust inversion.
In what follows, we introduce the concept of shift-invariant statistics, and show its connection to the approach of Section 2 by demonstrating that both of our algorithms for recovering the phases of the unknown signal makes use of the first two non-vanishing invariant statistic.
Consider the moments of the dimensional random vector up to order four
[TABLE]
It is easy to verify that for satisfying the model (3), the following holds:
The first moment is equal to zero for any . In particular, the mean of the samples of satisfies
[TABLE] 2. 2.
The second moment is equal to zero for . For , the second moment is just the power spectrum of , that is
[TABLE] 3. 3.
The third moment is equal to zero for any and . In particular, the bispectrum of vanishes,
[TABLE]
The bispectrum is a third moment of (the expected value of the product of at three different indices), and it vanishes since admits a symmetric distribution around [math], owing to the random factor . Specifically, the factor in the bispectrum is taken to the third power, which results in an expected value equal to zero. 4. 4.
The forth moment is equal to zero for any . The value of for the indices satisfying is known as the trispectrum [14],
[TABLE]
It is easy to verify that the entries of , , and are invariant to cyclic shifts of (or equivalently, to modulations of ), regardless of the distribution of the shifts, and hence serve as shift-invariant statistics of . When using the model (7) to compute (52), it follows that
[TABLE]
Since and vanish, among the first three invariant statistics , and , only the power spectrum provides us with useful information, which is used to estimate and the signal magnitudes as described in Section 2.1.
While the magnitudes of can be estimated from the power spectrum (55), the latter is insufficient to determine entirely, as we also need to estimate the phases of . Therefore, we need to use the fourth-order moment, and particularly, the trispectrum of . Thus, we next point out the relation between the trispectrum and the methods described in Section 2. Consider the random vector and its covariance matrix , given by
[TABLE]
where are from (16). Note that can also be viewed as the elements of the matrix reorganized into a vector (i.e. all products of the elements in ). Then, the covariance matrix can be expressed as
[TABLE]
where enumerates over blocks of size in . It follows that is related to the fourth moment of (50) via
[TABLE]
Therefore, the matrix encodes the same information as . Note that
[TABLE]
and thus, is a block-diagonal matrix with its non-zero blocks given by
[TABLE]
for (see (19)), thereby establishing that the sequence of matrices , which are estimated in Step 9 of Algorithm 2 ( is also estimated in Step 7 of Algorithm 1) is equivalent to the trispectrum . This relation between the trispectrum and shows that Algorithms 1 and 2, starting from Steps 9 and 13 respectively, are, in fact, estimating the signal from its trispectrum, or, in other words, Algorithms 1 and 2 preform trispectrum inversion.
In [3] it is shown that the sample complexity of any algorithm that recovers the underlying signal from its noisy measurements on a group action channel (in our case the group action is shifting the signal) is bounded from below by , where is the smallest integer such that all the moments up to order define the signal uniquely (up to the ambiguities of the problem). For the case of the MRA problem this moment is the bispectrum (). For the rank-one MRFA problem, the trispectrum is the first shift-invariant moment carrying sufficient information for recovering (and thus ), and therefore in this case .
4 Numerical Examples
In this section, we demonstrate the performance of Algorithms 1 and 2 (see Section 2) by numerical simulations, and show that their performance agrees with the theoretical results. The error in all the experiments is measured as
[TABLE]
We also compare our algorithms with the Expectation-Maximization (EM) algorithm [17], which is a popular approach for Maximum-Likelihood Estimation (MLE) in the presence of nuisance parameters. We start this section with a short explanation of the EM algorithm adapted to our model (3).
4.1 EM algorithm for the MRFA problem
The EM algorithm is a classical heuristic approach that tries to optimize model parameters by maximizing the likelihood of the observations given the model parameters. This approach is widely used in many applications [27, 29]. The EM algorithm iterates over two steps: The Expectation step (E-step) and the Maximization step (M-step). An elaborated description of the EM algorithm for the MRA problem can be found in [9]. In the case of the rank-one MRFA (3), the EM algorithm is initialized with parameter estimates and , and then iterates the E-step and M-step given below until convergence:
E-Step:
At iteration , given current parameter estimates and , estimate the likelihood of the observation assuming shift . Under the model in (3), the likelihood is
[TABLE]
M-Step:
Find and that minimize the expression
[TABLE]
4.2 Experimental results
We first demonstrate the performance of Algorithms 1 and 2 and the EM algorithm for random complex signals of length 16 (), for a number of samples () between and , and for SNR (as defined in (4)) values between and . The signals are generated as follows. First, we generate a random complex Gaussian signal of length , and normalize its power spectrum to be a vector of ones. The power spectrum normalization ensures that and (as defined in Theorem 2.4) are the same in all experiments. Next, the observations are generated from the clean signal by multiplying it by a normally distributed complex factor with variance , then shifting it by a uniformly distributed shift and adding to the resulting vector Gaussian noise with variance (derived from the SNR).
The results of the different algorithms are shown in Figure 1, where the intensity of each pixel describes the accuracy of the considered algorithm for the corresponding and SNR (blue/dark is high accuracy and yellow/light is low accuracy). In Figures 1(a) and 1(b) we demonstrate the performance of Algorithms 1 and 2 respectively. In Figures 1(c) and 1(d) we demonstrate the performance of the EM algorithm, where in Figure 1(c) the EM algorithm is initialized with the correct signal (“Oracle initialization”) and in Figure 1(d) the EM algorithm is initialized with a random vector. Although Figures 1(a) and 1(b) look similar, Algorithm 2 “breaks” at lower SNRs for any given sample size (it is mostly visible when comparing Figures 1(a) and 1(b) in the region where and ).
We have shown in (38) that should asymptotically be proportional to , or, on a logarithmic scale, should depend linearly on . It can be noticed that the phase transition in Figures 1(a) and 1(b) is indeed linear. In Figure 2, we show the same heat-map as in Figure 1(b) together with a black line that is the solution to the equation . Notice how the transition region is aligned with the solid black line.
We show the performance of Algorithm 1, Algorithm 2 and the EM algorithm with oracle/ random initialization for and for different SNR values in Figure 3. Although the accuracy of the EM algorithm is higher for high SNRs, for low SNRs (starting from around ) the EM algorithm gives less accurate results. Additionally, as opposed to the constant running time of Algorithms 1 and 2, the EM algorithm takes much longer for low SNR values.
We show next, in Figure 4, the performance of Algorithm 1, Algorithm 2 and the EM algorithm with oracle/ random initialization for and for different values of . It can be noticed from Figure 4(b) that Algorithm 2 is slower than Algorithm 1 by approximately a constant time for all . The computationally expensive step of Algorithm 2 is the “alternating minimization” step, whose complexity does not depend on .
Finally, we show in Figure 5 a heat-map of the accuracy of Algorithm 2 for , with SNR values between and (on the -axis, on a log scale) and for values of between and (on the -axis, on a log scale). It can be noticed that, as expected, the dependency between and SNR is linear.
5 Summary and discussion
We presented two statistically consistent algorithms for solving the rank-one MRFA problem. One algorithm (Algorithm 1) has proven non-asymptotic performance bounds, and the other (Algorithm 2) has better performance in practice. We compared the performance of the two algorithms to the EM algorithm, and showed the superiority of Algorithms 1 and 2 in “difficult” regimes. Intuitively, Algorithm 2 uses more information than Algorithm 1, by taking advantage of the entire trispectrum, and thus gives more accurate results. We also note that even though our model (3) considers the case of uniform distribution for the shifts (which is more challenging in terms of estimation accuracy [1]), the algorithms presented in this work can handle any distribution.
Although Algorithm 2 uses more information than Algorithm 1, we did not show that Algorithm 2 is optimal. Thus, there might be a better way for solving the MRFA problem and, equivalently, for inverting the trispectrum.
As of future research, an interesting direction is to extend the rank-one MRFA model to a general rank- MRFA model.
Appendix A Covariance matrix of
In this section, we show that of (19) is given by
[TABLE]
where is a diagonal bias term given by
[TABLE]
for any , where is the power spectrum of (see (9)). Thus, we establish that the matrix is of rank one, with its leading eigenvector equal to up to a constant factor.
Recall that (see (17))
[TABLE]
where is given by (see (18))
[TABLE]
and is defined in (7). From (19), (66), (67) we get that
[TABLE]
Since is complex-valued, we have that . Additionally, when substituting (67) in (68), together with the fact that , it follows that for
[TABLE]
for every and , since and are uncorrelated for every (and ), and independent of the factor . Next, we have
[TABLE]
Further manipulation of (71) using the properties of the model (7) results in
[TABLE]
Lastly, formula (9) for the power spectrum gives (65).
Appendix B Error bound for
B.1 Preliminary results
We first recall the definition of a sub-exponential random variable (see [30, 31, 32]).
Definition 1**.**
A random variable is called sub-exponential if there exists a constant such that
[TABLE]
for all .
Lemma B.1**.**
Let , and , be i.i.d. sub-exponential random variables with zero mean. Then, there exist constants such that
[TABLE]
with probability at least
[TABLE]
Proof.
Fixing , are i.i.d. sub-exponential random variables, and by Proposition 5.16 in [30], there exist constants such that
[TABLE]
Therefore, by the union bound we have that
[TABLE]
Finally, substituting concludes the proof. ∎
The following lemma relates the concentration of two non-negative random variables to the concentration of their sum.
Lemma B.2**.**
Let and be two (possibly dependent) random variables. Then, it holds that
[TABLE]
Proof.
Note that
[TABLE]
Since in the expression , is smaller or equal to , we have that implies that , and thus, . ∎
To prove Theorem 2.1 we will use the following definition.
Definition 2**.**
Let be a vector of identically distributed sub-exponential random variables with zero mean. Suppose that for any , depends only on and . Then, we call “piecewise-i.i.d.”(identically distributed, piecewise independent).
B.2 Concentration results for sub-exponential random vectors
In this section, we show that a ”piecewise i.i.d.” vector , defined in Definition 2, admits some concentration properties related to sub-exponential random vectors. The main lemma of this section is Lemma B.7.
First, we state the following result from [31].
Proposition B.3**.**
[Proposition 5.16 in [31]] Let , where are independent sub-exponential random variables with zero mean. Then, there exists a constant , such that for every and every
[TABLE]
We mention that we dropped the sub-Gaussian tail from the original statement of the proposition in [31] (and thus the bound above is weaker then the original bound) as it can be bounded by the sub-exponential tail, which is the one of interest for our purposes.
It follows from Proposition B.3 that the random variable is uniformly sub-exponential (i.e., bounded by the same decay rate) for all vectors with bounded (e.g., for ). We now prove that a similar property holds for a piecewise i.i.d. vector , even though its elements are not independent. To prove such a property, we use Lemma B.2 to decompose the vector into vectors, each with independent entries. This is stated and proved in Lemma B.4.
Lemma B.4**.**
Let be “piecewise-i.i.d.” as in Definition 2. Then, there exists a constant , depending on the distribution of the entries of , such that for every (i.e. ) and every
[TABLE]
Proof.
Let us assume for simplicity that is even, and define the four vectors
[TABLE]
By the definition of (see Definition 2), it follows that the elements in each of , , , are independent and sub-exponential. For every vector , we reorganize it into four vectors analogously to , and get using the triangle inequality that
[TABLE]
Then, using Lemma B.2 twice, we get that
[TABLE]
Lastly, we apply Proposition B.3 to each of and get that
[TABLE]
where we used the fact that for . Clearly, the constant is of no real significance, and can be replaced by together with an appropriate change in the exponent for any . Since, in addition, for we have that
[TABLE]
we can pick a suitable such that
[TABLE]
for all . This concludes the proof for even values of . For odd values of , one can repeat the proof with a slightly different decomposition of the vector . ∎
Next, we derive a concentration result for the norm . We first state the following large deviation result for vectors of independent sub-exponential random variables (Lemma 8.3 in [19], slightly reformulated, and stated for , which corresponds to our definition of sub-exponential variables, and with the choice ).
Proposition B.5**.**
[19]** Let , where are independent sub-exponential random variables with mean zero and variance . Then, there exist constants such that
[TABLE]
Proposition B.5 essentially states that norms of vectors consisting of independent sub-exponential random variables cannot be too large, as they are well concentrated around their means. Now, we shall prove the same result (with different constants) for our “piecewise-i.i.d.” vectors , even though their elements are not independent.
Lemma B.6**.**
Let be “piecewise-i.i.d.” as in Definition 2, with . Then, there exist constants , such that
[TABLE]
Proof.
The proof follows along the same lines as the proof of Lemma B.4. Suppose that is even, and define the vectors as in (77). Then,
[TABLE]
By, using Lemma B.2 twice, we get
[TABLE]
and by substituting and applying Proposition B.5 (observing that the elements in satisfy the required conditions) we have that
[TABLE]
when using , . This concludes the proof for even values of . For odd values of , the proof can be repeated with a slightly different partitioning of . ∎
Next, using the concentration results of Lemma B.4 and Lemma B.6, we are able to use the results of [7] to bound the norm of the sample covariance matrix of . This is the subject of the next lemma.
Lemma B.7**.**
Let be i.i.d. samples of the random vector which is “piecewise-i.i.d.” as in Definition 2, and assume that is large enough. Assume further that . Then, there exist constants such that
[TABLE]
where the constant is from Lemma B.6.
Proof.
At the heart of this proof is Theorem 1 from [5] (see also Corollary 1 from [5]) which requires two conditions on the random vectors (equations (2) and (3) in [5]). The first condition is that for some ,
[TABLE]
where the -norm of a random variable is defined as
[TABLE]
The -norm is a characterization of a sub-exponential random variable (see [30] for the -norm of sub-exponential random variables). As we have shown in Lemma B.4, each random variable is sub-exponential uniformly over all (that is, bounded by the same decay rate for all ). Therefore, from the definition of a sub-exponential random variable, and as are identically distributed sub-exponential random variables, we have by Lemma 2.3 of [6] that
[TABLE]
for all , and for and some . Hence, the condition (78) is satisfied.
The second condition required by Theorem 1 in [5] is that there exists a constant such that
[TABLE]
Note that (79) does not follow immediately from Lemma B.6. Therefore, we instead define restricted random vectors which satisfy the boundedness condition (79) (as well as condition (78)), and are equivalent to the original vectors with high probability. Consider the random vector
[TABLE]
where denotes the vector of zeros. Note that for two random variables and , if , then . Clearly, the first condition (78) is satisfied for since . The second condition (79) is also satisfied due to the definition of .
Up to this point, we showed that Theorem 1 in [5] holds for the vectors ( i.i.d. samples of ). Explicitly, there exist constants such that
[TABLE]
In what follows, we evaluate the quantity for (which takes part in the bound (80)), and show that it is sufficiently close to . Let us write
[TABLE]
where in the last inequality we used Cauchy-Schwarz inequality and the fact that . Consider the random variable \chi=\left\|\widetilde{\zeta}\right\|^{2}\bigg{|}\left\{\frac{\left\|\widetilde{\zeta}\right\|^{2}}{2L}>K\left(\frac{N}{2L}\right)^{1/2}\right\}, whose probability density function is
[TABLE]
where is the normalization for the density of
[TABLE]
Then,
[TABLE]
Note that for any non-negative continuous random variable
[TABLE]
where the last equality is due to interchanging the order of integration. Now, using (84), we have
[TABLE]
where we substituted (or ) in the bound of Lemma B.6. Note that according to Lemma B.6 (by substituting and after some manipulation)
[TABLE]
where . Overall, by (83), (86), and (87) we have that
[TABLE]
Note the the term decays exponentially with , and therefore, it is clear that for a large enough
[TABLE]
for some constant , which can be chosen arbitrarily small. Also, using a standard integration formula [4], we have that
[TABLE]
with , and thus, it also follows that, for a large enough ,
[TABLE]
for some constant , which can be chosen arbitrarily small. Next, by plugging (90) and (89) in (88) and using the result in (81) we have
[TABLE]
where the second inequality is due to the definition of , and the last equality is since , from which it follows that
[TABLE]
where is the th element of .
Note that for any and a small enough , if then
[TABLE]
From (91) and (92), we have that
[TABLE]
Therefore, rewriting (80) using (93) we have
[TABLE]
We absorb the term into , and we have that, for some constant ,
[TABLE]
Equation (94) is the result of applying Theorem 1 in [5] to the truncated vectors . Next, we adapt this result to the original vectors . Using (82), (87), and the union bound, we have that
[TABLE]
Denote the event by , and its compliment by . We have that
[TABLE]
and
[TABLE]
and therefore, using the law of total probability, by combining (95) and (96), we have
[TABLE]
Noting that
[TABLE]
we rewrite (97) as
[TABLE]
which concludes the proof.
∎
Essentially, Lemma B.7 establishes the concentration of the sample covariance matrix of the random vector around the population covariance matrix (which is ).
B.3 Proof of Lemma 2.1
We begin with Lemma B.8 that will help us prove Lemma 2.1.
Lemma B.8**.**
Let
[TABLE]
where is defined in (14), and let be given by (20) and (21). Then, can be written as
[TABLE]
where and are independent of .
Proof.
We define the random vector such that
[TABLE]
where is the noise vector from the model (7), and therefore
[TABLE]
Then, of (17) can be written as
[TABLE]
where we defined the random vector via
[TABLE]
We have from (20) and (21) that
[TABLE]
where are i.i.d. samples of , and is the estimated bias matrix (as in (65) except that is replaced with its estimate )
[TABLE]
recalling from (11) and (7) that
[TABLE]
Since , from (104) and (105), we have
[TABLE]
Next, we substitute (101) into (103)
[TABLE]
where (recalling that )
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
with
[TABLE]
and defined in (102). From the definition of in (99), defining
[TABLE]
concludes the proof. ∎
We will also need the following two supporting lemmas.
Lemma B.9**.**
Let such that for . Let such that and , then
[TABLE]
Proof.
Denote, for any , . Then for some . In this notation,
[TABLE]
or,
[TABLE]
It is easy to verify that the that maximizes in (113) is such that . Thus, since is monotonously increasing, , and , we have
[TABLE]
Since (115) holds for all , we have that
[TABLE]
∎
Lemma B.10**.**
Let , with , and denote such that and . Denote , and assume that and . Then,
[TABLE]
Proof.
Denote such that and for . From the definition of and the reverse triangle inequality we have
[TABLE]
Since and we have that,
[TABLE]
or, together with (118), we have
[TABLE]
and thus
[TABLE]
Note that for any and we have that
[TABLE]
Using the triangle inequality and (121), we have
[TABLE]
Combining (122) with (120) and the fact that , we get
[TABLE]
where the second inequality is due to the assumption and the last inequality holds since .
∎
Next we prove Lemma 2.1.
Proof.
We have from Lemma B.8 that
[TABLE]
and so,
[TABLE]
Now, recall from (99) that is a rank one matrix whose leading eigenvalue is . Due to the assumption in Lemma 2.1,
[TABLE]
Now, since is the leading eigenvector of , then by the Davis-Kahan theorem [15] (noting that the spectral gap of is equal to its largest eigenvalue since it is a rank one matrix), using (124), it follows that
[TABLE]
where is some complex constant satisfying .
Next we show that there is some such that
[TABLE]
Denote such that and . First, suppose that,
[TABLE]
(the other case will be handled below). Then, by Lemma B.10 we have,
[TABLE]
Denote such that and , and note that , where . Then by Lemma B.9 and by (127),
[TABLE]
where is a vector of ones of length . Thus, if we write , where than,
[TABLE]
Summing over both sides of we get,
[TABLE]
Recall from (22) that and from (23) that , and thus from (130) we have
[TABLE]
for some . In other words, we showed that , the phase of of (125) is ”not too far” from an ’th root of one. Now, note that,
[TABLE]
Since and from (129), we have
[TABLE]
Since , from (132) and (133) we have that
[TABLE]
in case .
Suppose now that . Since and , we have
[TABLE]
Combining (134) and (135) we have,
[TABLE]
for all values of . From (136) and (125) we have
[TABLE]
Combining (137) with (123), and denoting
[TABLE]
concludes the proof.
∎
B.4 Proof of Theorem 2.2
Proof.
Recall from (109) that
[TABLE]
with
[TABLE]
for , and so
[TABLE]
As for the terms and in (139), since and are diagonal matrices, we have
[TABLE]
for , and therefore, it is also clear that
[TABLE]
Since are i.i.d. standard Gaussian random variables, the random variable has chi-squared distribution (which is sub-exponential) with expected value of , and so is sub-exponential with zero mean. Thus, we apply Lemma B.1, and get that
[TABLE]
for some constants . Since we have that
[TABLE]
Next, we consider
[TABLE]
from (139). Requiring that (142) is small is essentially a concentration result for the sample covariance of the random vector . We mention that since each element of is the product of two Gaussian random variables (see (102)), the elements of admit a sub-exponential distribution (see Lemma 2.7.7 in [30]).
We start by characterizing the real and imaginary parts of from (102),
[TABLE]
where , . Let us now define the random vector
[TABLE]
Since , , , are mutually independent Gaussian random variables with mean zero and variance (from (100)), by (143) and (144) we have that , , and for . Additionally,
[TABLE]
Note that the covariance matrix of is the identity matrix, i.e.
[TABLE]
where is the identity matrix.
Since satisfies the requirements of Definition 2 and (146), we apply Lemma B.7 and get
[TABLE]
or, equivalently,
[TABLE]
holds with probability at least
[TABLE]
Next, we adapt this result to the complex valued vector of (102).
From (145) we have
[TABLE]
Note that, if
[TABLE]
then,
[TABLE]
Since
[TABLE]
it follows from (148) ,(149) and (151) that, with probability at least
[TABLE]
it holds that
[TABLE]
Thus, from (153) it follows that
[TABLE]
with probability at least
[TABLE]
Plugging (154), (155) and (141) in (139) gives that
[TABLE]
with probability at least
[TABLE]
where we defined .
Now, recall from (138) that , and thus
[TABLE]
for , which concludes the proof. ∎
B.5 Non-Asymptotic error bound for
In this section, we would like to get a better understanding of the error
[TABLE]
in the regime of large and large . From Lemma 2.1 we have that
[TABLE]
and from Theorem 2.2 we have
[TABLE]
with high probability.
Note that , and are all averages of samples of random variables with zero expected value (see (107), (108), (111), and (138)). Thus, from the central limit theorem we have that
[TABLE]
for large for all and . Since we are interested in the regime of constant and large , we can extend (161) to bound all the entries of , and simultaneously (by the maximum of random variables in each matrix), and get that
[TABLE]
The result in (162) together with (160) means that all “parts” of the bound (159) of (namely, and ) go to zero at the same rate when , and thus, in the case of large and large , the dominant part of the bound is (since dominates , and for large ). This, together with (160) means that in the regime of large and large ,
[TABLE]
Appendix C Proof of Theorem 2.3
In Appendix A, we have shown that (see (64))
[TABLE]
where is a bias term given by
[TABLE]
By (20) and (21), can be expressed as
[TABLE]
where is the estimated bias term given by
[TABLE]
Since of (11) is a consistent estimator for , we have that
[TABLE]
and thus,
[TABLE]
for every . Correspondingly, from the Davis-Kahan theorem [15], the eigenspace of the leading eigenvalue of converges almost surely to . Thus, because of the updates made in (23) and (24), we have that
[TABLE]
where is unknown with . Recall that , and
[TABLE]
Therefore, if for every , then for every , and we have
[TABLE]
where . Lastly, by the formulas for the frequency marching estimator (31) and (32), using (165) with , we have
[TABLE]
where the last equality is due to (10). Therefore
[TABLE]
where
[TABLE]
Note that due to the update made in (23) we have that and from (165) we have that for all . Thus and thus is indeed an integer.
Appendix D Proof of Theorem 2.4
D.1 Supporting lemmas
We begin with Lemma D.1, which bounds the error of the “frequency marching” in Step 15 of Algorithm 1. For future reference we denote the “frequency marching” procedure as Algorithm 3.
Lemma D.1**.**
Let s.t. for , and let s.t. . Let be an estimate of , with . Then, applying Algorithm 3 on returns s.t. for some with and some constant independent of .
Proof.
Since are vectors with entries of norm , we switch to working with angles. Denote by the vectors , and .
By Lemma B.9, we have
[TABLE]
From the definition of and , we have that . In other words, we can write where
[TABLE]
Note that we can also write , where is the linear operator of a cyclic shift by 1. The eigen-decomposition of is
[TABLE]
where is the discrete Fourier transform matrix. Thus,
[TABLE]
The smallest non-zero singular value of is , thus, , where is the Moore–Penrose pseudo-inverse (satisfying that is a solution of ), and is some constant independent of . Note that the null space of () is one-dimensional as has a single zero singular value. It is also evident from (170) that for any constant vector we have that , and thus the null space of consists solely of constant vectors. Additionally, also from (170), if then . Since the constraint defines a linear subspace of dimension , and since , any vector with is in the image of .
The “frequency marching” procedure described in Algorithm 3 (and in Step 15 of Algorithm 1), when written in the form of (29), solves
[TABLE]
for , for all . Rewriting (171) in matrix form, Algorithm 3 solves . Since we have that , and therefore, there exists a constant vector such that , where (since constant vectors are the null space of and is a solution to ). Additionally, from the definition of () there is a constant vector such that . Now, we have,
[TABLE]
where the last inequality is due to (169). Since , denoting , we have from (172) that
[TABLE]
∎
Next, in Lemma D.2, we extend the result of Lemma D.1 by removing the requirement that for .
Lemma D.2**.**
Let , and let be the vector of phases of . Let such that . Assume that . Denote , and assume that . Let with and be an estimate of . Then, applying Algorithm 3 on returns s.t. for some with and some constant independent of .
Proof.
Denote such that and . Assume that (we will later consider the alternative). Then, by Lemma B.10 we have,
[TABLE]
Since , Lemma D.1 guarantees that applying Algorithm 3 on will result in such that
[TABLE]
for some with . Thus, We showed that applying the frequency marching procedure on results in as required. Since the frequency marching procedure dose not depend on the magnitude of the input vector, applying it on results in the same vector as applying it on .
Until now, we have shown that in the case when . Assume now that . Since for all ( is a vector of phases, and since is the output of Algorithm 3 it is also a vector of phases), we have that , and thus
[TABLE]
where the third inequality holds since and the last inequality is due to the assumption . Finally, denoting , we have from (173) and (174) that .
∎
The following Lemma shows that if the magnitudes of a signal are estimated accurately, and the phases are estimated accurately, than the signal is estimated accurately.
Lemma D.3**.**
Let with . Denote by the magnitudes of , and by the phases of (i.e. = ). Suppose that and are approximations of and such that and . Then, for , it holds that .
Proof.
[TABLE]
Since and for , we get from (121) that
[TABLE]
∎
D.2 Proof of Theorem 2.4
Proof.
The outline of the proof is as follows. We show that the frequency marching procedure results in a “good” estimate of the phases of of (6). Then we show that we have a “good” estimate of the magnitudes of as well. Finally, we use Lemma D.3 to combine the two and conclude the proof.
Denote
[TABLE]
Denote , and denote by the vector of phases of ,that is,
[TABLE]
where is the phases part of (). Denote where is the vector of phases of of (6) (). Note that
[TABLE]
Since and from (23), , we have that and satisfy the requirements of Lemma D.2 (as and correspondingly). Therefore, it follows that applying Algorithm 3 on will result in such that
[TABLE]
for some with .
Next, we bound the error in estimating , where . From (12) follows that, for large ,
[TABLE]
for some constant . Since for any we have
[TABLE]
we have for any positive and
[TABLE]
[TABLE]
Note that from the definition of and from (14) we have that , which together with (10) gives . Thus we have from (181),
[TABLE]
From (13) we have that, for large ,
[TABLE]
for some constant . From Taylor expansion of around [math] we have that for it holds that
[TABLE]
or,
[TABLE]
which, together with (183), assuming , gives
[TABLE]
where . Similarly to the derivation of (182), using (180), we have
[TABLE]
Thus, combining (182) with (185), we have
[TABLE]
or,
[TABLE]
Recall From (10) that . Thus we have
[TABLE]
assuming .
In case , we note that from (11) we have that and since we have,
[TABLE]
From (187) and (188) we have that there is a constant such that,
[TABLE]
for a large enough . Thus, by Step 12 of Algorithm 1 we have
[TABLE]
By (177), (190) and Lemma D.3, we have
[TABLE]
Since , and that the inverse Fourier transform is an orthogonal transformation, we have from Step 18 on Algorithm 1,
[TABLE]
Denoting , and noting that applying on a signal, does not not change its norm, we have that
[TABLE]
From (191), (192), and the fact that , we have
[TABLE]
Since minimizes the expression in (176), by Lemma 2.1 we have that
[TABLE]
and thus, from (D.2) we have,
[TABLE]
where
[TABLE]
where is from (109) and (138).
By Theorem 2.2 we have that
[TABLE]
with probability at least
[TABLE]
Combining (194) and (195) we have that there is some constant , such that
[TABLE]
with probability at least
[TABLE]
∎
Appendix E Proof of Theorem 2.5
In order to show that converges to (up to the inherent ambiguities), we will show that (from Step 6 of Algorithm 2) converges to and (from Step 19 of Algorithm 2) converges to where , for , and for some and such that . Since is computed in the same way in Algorithm 2 and Algorithm 1, we already proved in the analysis of Algorithm 1 that (see (190))
[TABLE]
We now show that . Define the “clean version” of (from (39)), as
[TABLE]
In (40) we showed that
[TABLE]
As was shown in Appendix C (equation (163))
[TABLE]
Thus, from (199), (200), and (39), it is easy to see that
[TABLE]
Recall that in Step 16 of Algorithm 2, we solve
[TABLE]
where
[TABLE]
Note that,
[TABLE]
where
[TABLE]
Note also that is uniformly distributed on the unit circle for . Denote
[TABLE]
Since the solution of (202) is the eigenvector corresponding to the leading eigenvalue of , from (204) by the Davis-Kahan theorem [15], we have that converges to an eigenvector of .
Next, since the discrete Fourier transform diagonalizes circulant matrices [16], and by simple algebra, it can be shown that
[TABLE]
where is the discrete Fourier transform matrix defined in (5). Then, it is evident that
[TABLE]
is a unitary matrix containing the eigenvectors of , and are the eigenvalues of . Since is a unitary matrix, and are strictly continuous i.i.d. on the unit circle, the eigenvalues of are distinct with probability . Then, since from (202) converges to one of the eigenvectors of , it converges to one of the columns of up to a constant factor of , where .
Denote by the ’s column of V. Note the special structure of ,
[TABLE]
Thus, converges to almost surely, up to a constant factor and an unknown modulation, or, explicitly,
[TABLE]
for some , , and some . Since is a consistent estimator for the magnitudes of (see (197)), and is a consistent estimator for the phases (see (208)), by Lemma D.3, the combination of them in Step 20 provides a consistent estimate for . Thus, computing the inverse Fourier transform in Step 21 of Algorithm 2 provides a consistent estimate for up to an unknown factor and a cyclic shift.
6 Acknowledgments
We would like to thank Prof. Boaz Nadler and Dr. Tamir Bendory for their remarks and comments. This research was partially supported by the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH), by Award Number R01GM090200 from the NIGMS, and by a Fellowship from Jyväskylä University and the Clore Foundation.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Emmanuel Abbe, Tamir Bendory, William Leeb, João M. Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. ar Xiv preprint ar Xiv:1710.02793 , 2017.
- 2[2] Emmanuel Abbe, João M Pereira, and Amit Singer. Sample complexity of the boolean multireference alignment problem. In Proceedings. IEEE International Symposium on Information Theory , volume 2017, page 1316. NIH Public Access, 2017.
- 3[3] Emmanuel Abbe, João M. Pereira, and Amit Singer. Estimation in the group action channel. ar Xiv preprint ar Xiv:1801.04366 , 2018.
- 4[4] Milton Abramowitz and Irene A. Stegun. Handbook of mathematical functions: With formulas, graphs, and mathematical tables applied mathematics series. National Bureau of Standards, Washington, DC , 1964.
- 5[5] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus Mathematique , 349(3):195 – 200, 2011.
- 6[6] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society , 23(2):535–561, 2010.
- 7[7] Radosław Adamczak, Alexander E. Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. Sharp bounds on the rate of convergence of the empirical covariance matrix. Comptes Rendus Mathematique , 349(3-4):195–200, 2011.
- 8[8] Afonso S. Bandeira, Moses Charikar, Amit Singer, and Andy Zhu. Multireference alignment using semidefinite programming. In Proceedings of the 5th conference on Innovations in theoretical computer science , pages 459–470. ACM, 2014.
