Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles
Nir Sharon, Joe Kileel, Yuehaw Khoo, Boris Landa, Amit Singer

TL;DR
This paper extends the method of moments for cryo-EM ab initio modeling to handle unknown, non-uniform viewing angle distributions, simplifying the statistical requirements and proposing algorithms for structure reconstruction.
Contribution
It removes the uniformity assumption in Kam's method, demonstrating that first and second moments suffice for non-uniform distributions and developing algorithms for both known and unknown distributions.
Findings
First and second moments are sufficient for non-uniform distributions.
Efficient algorithms are proposed for known distribution cases.
Non-convex optimization methods can recover structure and distribution in unknown cases.
Abstract
Single-particle reconstruction in cryo-electron microscopy (cryo-EM) is an increasingly popular technique for determining the 3-D structure of a molecule from several noisy 2-D projections images taken at unknown viewing angles. Most reconstruction algorithms require a low-resolution initialization for the 3-D structure, which is the goal of ab initio modeling. Suggested by Zvi Kam in 1980, the method of moments (MoM) offers one approach, wherein low-order statistics of the 2-D images are computed and a 3-D structure is estimated by solving a system of polynomial equations. Unfortunately, Kam's method suffers from restrictive assumptions, most notably that viewing angles should be distributed uniformly. Often unrealistic, uniformity entails the computation of higher-order correlations, as in this case first and second moments fail to determine the 3-D structure. In the present paper, we…
| ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | |
| ✗ | ✗ | ? | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | |
| ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | ? | ? | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | ? | |
| ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ? | ? |
| polynomial map | arbitrary choices | linearization | rank check | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vanilla | ||||||||||||
| projected |
|
|
||||||||||
| specialized |
|
|
|
|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Method of moments for 3-D single particle ab initio modeling with non-uniform distribution of viewing angles
Abstract.
Single-particle reconstruction in cryo-electron microscopy (cryo-EM) is an increasingly popular technique for determining the 3-D structure of a molecule from several noisy 2-D projections images taken at unknown viewing angles. Most reconstruction algorithms require a low-resolution initialization for the 3-D structure, which is the goal of ab initio modeling. Suggested by Zvi Kam in 1980, the method of moments (MoM) offers one approach, wherein low-order statistics of the 2-D images are computed and a 3-D structure is estimated by solving a system of polynomial equations. Unfortunately, Kam’s method suffers from restrictive assumptions, most notably that viewing angles should be distributed uniformly. Often unrealistic, uniformity entails the computation of higher-order correlations, as in this case first and second moments fail to determine the 3-D structure. In the present paper, we remove this hypothesis, by permitting an unknown, non-uniform distribution of viewing angles in MoM. Perhaps surprisingly, we show that this case is statistically easier than the uniform case, as now first and second moments generically suffice to determine low-resolution expansions of the molecule. In the idealized setting of a known, non-uniform distribution, we find an efficient provable algorithm inverting first and second moments. For unknown, non-uniform distributions, we use non-convex optimization methods to solve for both the molecule and distribution.
Key words and phrases:
cryo-EM, ab initio modeling, autocorrelation analysis, method of moments, spherical harmonics, Wigner matrices, polynomial equations, non-convex optimization.
1991 Mathematics Subject Classification:
Primary: 78M05, 90C26; Secondary: 14Q99.
∗Corresponding author: Nir Sharon.
*†*The first two authors contributed equally.
Nir Sharon*∗,†*
School of Mathematical Sciences, Tel Aviv University
Joe Kileel*†*
Program in Applied and Computational Mathematics, Princeton University
Yuehaw Khoo
Department of Statistics and the College, The University of Chicago
Boris Landa
Applied Mathematics Program, Yale University
Amit Singer
Program in Applied and Computational Mathematics and Department of Mathematics, Princeton University
1. Introduction
Single-particle cryo-electron microscopy (cryo-EM) is an imaging method for determining the high-resolution 3-D structure of biological macromolecules without crystallization [25, 35]. The reconstruction process in cryo-EM determines the 3-D structure of a molecule from its noisy 2-D tomographic projection images. By virtue of the experimental setup, each projection image is taken at an unknown viewing direction and has a very high level of noise, due to the small electron dose one can apply to the specimen before inflicting severe radiation damage, e.g., [12, 24, 41]. The computational pipeline that leads from the raw data, given many large unsegmented micrographs of projections, to the 3-D model consists of the following stages. The first step is particle picking, in which 2-D projection images are selected from micrographs. The selected particle images typically undergo 2-D classification to assess data quality and further improve particle picking. At this point, the 3-D reconstruction process begins, where often it is divided into two substeps of low-resolution modeling and 3-D refinement. In this paper, we focus on the mathematical aspects of the former, namely the modeling part. In particular, we suggest using the method of moments (MoM) for ab initio modeling. We illustrate this workflow with an overview given in Figure 1.
The last step in the reconstruction, also known as the refinement step, aims to improve the resolution as much as possible. This refinement process is typically a variant of the expectation-maximization (EM) algorithm which seeks the maximum likelihood estimator (MLE) via an efficient implementation, e.g., [52]. As such, 3-D refinement requires an initial structure that is close to the correct target structure [28, 51]. Serving this purpose, an ab initio model is the result of a reconstruction process which depends solely on the data at hand with no a priori assumptions about the 3-D structure of the molecule [49]. We remark that the two primary challenges for cryo-EM reconstruction are the high level of noise and the unknown viewing directions. Mathematically, without the presence of noise, the unknown viewing directions could be recovered using common lines [61, 62]. Then, the 3-D structure follows, for example, by tomographic inversion, see, e.g., [2]. Reliable detection of common lines is limited however to high signal-to-noise (SNR) ratio. As a result, the application of common lines based approaches is often limited to 2-D class averages rather than the original raw images [56]. Other alternatives such as frequency marching [7] and optimization using stochastic gradient have been suggested [48]. As optimization processes are designed to minimize highly non-convex cost functions, methods like SGD are not guaranteed to succeed. In addition, as in the case of EM, it is not a priori clear how many images are required.
Approximately forty years ago, Zvi Kam proposed a method for 3-D ab initio reconstruction based on computing the mean and covariance of the 2-D noisy images [33]. In order to uniquely determine the volume, the third moment (triple correlation) is also used besides the mean and covariance. In this approach, known as Kam’s method, the 3-D volume is reconstructed without estimating the viewing directions. In this sense, Kam’s method is strikingly different from common lines based approaches and maximum likelihood and other optimization methods that rely on orientation estimation for each image. Crucially, Kam’s method is effective at arbitrary levels of noise, given sufficiently many picked particles for accurate estimation of the moment statistics. Additionally, Kam’s method does not require any starting model, and it requires only one pass through the data to compute moments (contrary to other approaches needing access to the measurements multiple times). Despite the aforementioned advantages, Kam’s method relies on the restrictive assumption that the viewing directions for the images are distributed uniformly over the sphere. This hypothesis, alongside other technical issues, has so far prevented a direct application of Kam’s method to experimental cryo-EM data, for which viewing angles are typically non-uniform [4, 26, 44, 59]. This situation motivates us to explore generalizations of Kam’s method better suited to cryo-EM data.111We remark that Kam’s method, assuming uniform rotations, is of significant current interest in X-ray free electron laser (XFEL) single molecule imaging, where the assumption of uniformity more closely matches experimental reality [21, 45, 65].
In this paper, we generalize Kam’s theory to the case of non-uniform distribution of viewing directions. We regard Kam’s original approach with uniform distribution of viewing angles as a degenerate instance of MoM. In our formulation, we estimate both the 3-D structure and the unknown distribution of viewing angles jointly from the first two moments of the Fourier transformed images. More precisely, for images , the first and second empirical moments of the Fourier transformed images, given in polar coordinates, , are
[TABLE]
which upon the above discretization become 2-D and 4-D tensors, respectively. Our basic rationale for trying to obtain the volume from the first two moments is as follows. Supposing the distribution of rotations of the image plane to be uniform, then in the limit the first moment is radially symmetric, that is, it is only a function of but is independent of . Therefore, may be regarded as a 1-D vector. Similarly, the second moment is a 3-D tensor (rather than 4-D) since it will only depend on and through as . Also is linearly related to the molecule’s volume via a tomographic projection. Thus, for images of size pixels, the first and second moments should give rise to polynomial equations for the unknown volume and distribution. Assuming the volume is of size (and the distribution is of lower dimensionality), then the first and second moments have “just” the right number of equations (in terms of leading order) to determine the unknowns. Unfortunately, when the distribution of viewing directions is uniform, as noted by Kam [33], the information encoded in the second moment is algebraically redundant; essentially it is the autocorrelation function (or equivalently, the power spectrum), and this information is insufficient for determining the structure of the molecule. As we will see, a non-uniform distribution of viewing directions introduces additional terms in both the first and second moments, and extends the number of independent equations beyond the autocorrelation case. In particular, we will show that non-uniformity guarantees uniqueness from the analytical counterparts of and in cases of a known distribution, and it guarantees finitely many solutions in other, more realistic, cases of an unknown distribution.
Our work is inspired by several earlier studies on simplified models in a setting called Multi-Reference Alignment (MRA). In MRA, a given group of transformations acts on a vector space of signals [5]. For example, the group acts on the space of band-limited signals over the unit circle by rotating them counterclockwise (as a 1-D analog of cryo-EM). The task then is to estimate a ground truth signal from multiple noisy samples, corresponding to unknown group elements of a finite cyclic subgroup of acting on the signal. The papers [6, 9] show that for a uniform distribution over the group, the signal can be estimated from the third moment, and the number of samples required scales like the third power of the noise variance. On the other hand, for a non-uniform and also aperiodic distributions over the group, the signal can be estimated from the second moment, and the required number of samples scales quadratically with the noise variance [1].
Despite the success of signal recovery in MRA from the first two moments under the action of the cyclic group, it is not apparent that such a strategy is still applicable in the case of cryo-EM. First, in cryo-EM, each image is obtained from the ground truth volume not just by applying a rotation in , but also a tomographic projection. Moreover, the studies mentioned above (of MRA) consider finite abelian groups, whereas, in the case of cryo-EM, the group under consideration is the continuous non-commutative group . The goal of this paper is then to investigate whether the first and second moment of the images is also sufficient for solving the inverse problem of structure determination in the cryo-EM setting.
1.1. Our contribution
We formulate the reconstruction problem in cryo-EM as an inverse problem of determining the volume and the distribution of viewing directions from the first two moments of the images. Assuming the volume and distribution are band-limited functions, they are discretized by Prolate Spheroidal Wave Functions (PSWFs) and Wigner matrices, respectively. The moments give rise to a polynomial system in which the unknowns are the coefficients of the volume and the distribution. Using computational algebraic geometry techniques [20, 23, 58], we exhibit a range of band limits for the volume and the distribution such that the polynomial system has only finitely many solutions, pointing to the possibility of exact recovery in these regimes. Additionally, we comment on numerical stability issues, by providing condition number formulas for moment inversion. In the setting where the rotational distribution is known, we prove that the number of solutions is generically 1 and present an efficient algorithm for recovering the volume using ideas from tensor decomposition [31]. For the practical case of an unknown distribution, we rely on methods from non-convex optimization and demonstrate, with synthetic data, successful ab initio model recovery of a molecule from the first two moments.
1.2. Organization
The paper is organized as follows. In Section 2, we present discretizations for the volume and distribution and derive the polynomial system obtained from the first two moments. In Section 3, we demonstrate that there exists a range of band limits where the polynomial system for the unknown molecule and distribution has only finitely many solutions. In Section 4, we discuss some implementation details on how the system is solved and present numerical and visual results. Proofs and background material are provided in appendices. For research reproducibility, MATLAB code is publicly available at GitHub.com.222The full address of the GitHub repository is https://github.com/nirsharon/nonuniformMoM.
2. Method of moments
We begin by introducing the image formation model. Then, convenient basis for discretizing various continuous objects, namely the images and the volume (in the Fourier domain) as well as the distribution for orientations, are introduced. From these, relationships between the moments of the 2-D images and the 3-D molecular volume can be derived, enabling us to fit the molecular structure to the empirical moments of the images.
2.1. Image formation model and the 3-D reconstruction problem
In cryo-EM, data is acquired by projecting particles embedded in ice along the direction of the beaming electrons, resulting in tomographic images of the particles. The particles orient themselves randomly with respect to the projection direction. More formally, let be the Coulomb potential of the 3-D volume, and the projection operator be denoted by , where
[TABLE]
Assuming the -th particle comes from the same volume but rotated by , the image formation model is [10, 25]
[TABLE]
where is a random field modeling the noise term and is a point spread function, whose Fourier transform is known as the contrast transfer function (CTF) [42, 50, 60]. Each image is assumed to lie within the box . For size discretized images, we assume the random field . Here denotes an element in the group of rotations , and we define the group action by333Here we prefer to write the action of and correspondingly later we use Wigner -matrices, instead of and Wigner -matrices. While simply notational, these conventions allow us to cite identities from [19] verbatim, which are in terms of Wigner -matrices and not Wigner -matrices.
[TABLE]
The rotations ’s are not known since the molecules can take any orientation with respect to projection direction. For the purpose of simplifying the exposition, we shall henceforth disregard the CTF, by assuming
[TABLE]
The presence of CTF is not expected to have a major impact on our main results, and we will incorporate the CTF in a future work. Typically, it is convenient to consider Fourier transform of the images, since by projection slice theorem, the Fourier transform of gives a slice of the Fourier coefficients of the volume :
[TABLE]
The goal of cryo-EM is to recover from the Fourier coefficients of the projections . While reconstructing given estimated ’s amounts to solving a standard computed tomography problem, we wish to reconstruct directly from the noisy images without estimating the rotations, for reasons detailed above. To this end, we assume the rotations are sampled from a distribution on , where is a smooth band-limited function. Then from the empirical moments of the images , we jointly estimate the volume and the distribution .
2.2. Representation of the volume, the distribution of rotations and the images
As mentioned previously, the proposed method of moments consists of fitting the analytical moments
[TABLE]
to their empirical counterparts and as appears in (1) after debiasing.444By the law of large numbers, and almost surely as , so is fitted to and to . For notational convenience, we drop in what follows, either assuming has been appropriately debiased already or . Through fitting to the empirical moments, we seek to determine the Fourier volume and also the distribution . In this section, we present discretizations of and by expanding them using convenient bases.
2.2.1. Basis for the Fourier volume
Since the image formation model involves rotations of the Fourier volume , it is convenient to represent as an element of a function space closed under rotations; in fact, this is the same as representing using spherical harmonics (see the Peter-Weyl theorem [19]):
[TABLE]
Here are the (complex) spherical harmonics:
[TABLE]
with associated Legendre polynomials defined by:
[TABLE]
In Cartesian coordinates, spherical harmonics are polynomials of degree . Without loss of generality, the radial frequency functions should form an orthonormal family (for each fixed ) with respect to , where is referred to as the radial index. Choices of radial functions suitable for molecular densities include spherical Bessel functions [3], which are eigenfunctions of the Laplacian on a closed ball with Dirichlet boundary condition, as well as the radial components of 3-D prolate spheroidal wave functions [57].
We assume the volume is band-limited with Fourier coefficients supported within a radius of size , i.e., the Nyquist cutoff frequency for the images ’s discretized on a grid of size (over the square . Under this assumption, the maximum degree and radial indices and in (8) are essentially finite. Further details on the particular basis functions and cutoffs and that we choose to use are deferred to Section A in the appendix. Note that in practice, as we target low-resolution modeling, one can choose to decrease either the cutoff or the grid size to obtain more compact settings. The coefficients furnish our representation of using spherical harmonics. Note that since is real valued, its Fourier transform is conjugate-symmetric, which imposes restrictions on the coefficients . The specific constraints are presented in Section 4.1.
The advantage of expanding in terms of spherical harmonics is that the space of degree spherical harmonics is closed under rotation; in group-theoretic language, this space forms a linear representation of .555In fact, this is an irreducible representation of and varying these give all irreps. Thus the action of a rotation on amounts to a linear transformation on the expansion coefficients (with a block structure according to and ). More precisely, fixing the vector space spanned by for a specific , the action of a rotation on this vector space is represented by the Wigner matrix (see [19, p. 343]) so that:
[TABLE]
In particular, the matrix is unitary, with entries degree polynomials in the entries of [19]. For all and , the group homomorphism property reads . In light of (11), 3-D bases of the form have been called steerable bases.
2.2.2. Basis for the probability distribution of rotations
As we shall see, when expanding the volume in terms of spherical harmonics, the analytical moments (7) involve integrating different monomials of with respect to the measure . To this end, we assume the probability density over is a smooth band-limited function (and in a function space closed under rotation) by expanding
[TABLE]
By Peter-Weyl, these form an orthonormal basis for , and for higher they are increasingly oscillatory functions on . Thus, expansion (12) is analogous to using spherical harmonics to expand a smooth function on the sphere, or using Fourier modes for a function on the circle. The cutoff is the band limit of the distribution ; we shall see in the next section that since we use only first and second moments it makes sense to assume . Note that in the special case of a uniform distribution, the only nonzero coefficient is . Also, denotes the Haar measure, which is the unique volume form on the group of total mass one that is invariant under left action. Using the Euler angles parameterization of , the Haar measure is of the form
[TABLE]
where the normalizing constant ensures .
2.2.3. Basis for the 2-D images
At this point, we discuss convenient representations for the images after Fourier transform, . Similarly to volumes, it is desirable to represent images using a function space closed under in-plane rotations, i.e., . By the Peter-Weyl theorem, this is the same as expanding using Fourier modes, in a 2-D steerable basis:
[TABLE]
Here the radial frequency functions (for fixed ) are taken to be an orthonormal basis with respect to , with referred to as the radial frequency. Comparing to expansion (8) (see Section 2.2), it makes most sense to set . Again, owing to the Nyquist frequency for the discretized images , we may bound the cutoffs . Typical choices for for representing tomographic images include Fourier-Bessel functions [66] and the radial components of 2-D prolate spheroidal wave functions [57]. Details on our specific choices are given in Section A.2 in the appendix.
2.2.4. Choice of radial functions
For the finite expansions in (8) and (14) to accurately represent the Fourier transforms of the electric potential and its slices, one should carefully choose the radial functions and , together with the truncation-related quantities , , , and . In this work, we consider and to be the radial parts of the three-dimensional and two-dimensional PSWFs [57], respectively. In Appendix A, we describe some of the key properties of the PSWFs, and propose upper bounds for setting , , , and . In practice, band limits would be selected by balancing these expressivity considerations together with the well-posedness and conditioning considerations of Section 3.
2.3. Low-order moments
In this section, we derive the analytical relationship between the first two moments for the observed images , and the coefficients and of the volume and distribution of rotations. These relationships will be used to determine and via solving a nonlinear least-squares problem.
To this end, we first register a crucial relationship between the coefficients of the 2-D images and the 3-D volume. By indexing the images in terms of (instead of in (14)), we have:
[TABLE]
On the other hand, using the Fourier slice theorem and (11):
[TABLE]
Multiplying (15) and (16) by and integrating against , then combining the orthogonality relation
[TABLE]
with , tells us
[TABLE]
where are constants depending on the radial functions:
[TABLE]
From the term , we see if (mod ) (and if then ). Also one may check . Equation (19) connects 2-D image coefficients with 3-D volume coefficients. We note we may as well choose in (15), since if then . In practice, the coefficients are calculated via numerical integration over a closed segment, according to the localization property of the PSWFs, see Appendix A and [39].
2.3.1. The first moment
In this section, from (19) the relationship between the first moment of the images and the volume is derived. Taking the expectation over , and using the distribution expansion (12), we get
[TABLE]
The last equation follows from the orthogonality of the Wigner matrix entries [11, p. 68]
[TABLE]
and
[TABLE]
The first moment gives a set of bilinear forms in the unknowns and , as seen in (24) for each with and .
It is convenient to provide compact notation for the first moment formula. To this end, we introduce:
- (1)
, a matrix of size given by 2. (2)
, a vector of size given by 3. (3)
, a matrix of size given by .
Item 2 is zero if and item 3 is zero if either or . In this notation, the first moment formula (24) (with fixed and varying ) reads:
[TABLE]
Here is nonzero only if .
2.3.2. The second moment
Higher moments require higher powers of the image coefficients, and so in the case of the second moment and for , we have
[TABLE]
where
[TABLE]
The product of two Wigner matrix entries is expressed as a linear combination of Wigner matrix entries [19, p. 351],
[TABLE]
where
[TABLE]
is the product of two Clebsch-Gordan coefficients. This product is nonzero only if satisfy the triangle inequalities. Substituting (31) into (30), and invoking (25) and (26), we obtain:
[TABLE]
where the sum is over satisfying . Now substituting into (28) gives:
[TABLE]
where the first sum has the range of (28) and the second sum has range of (33). The second moment thus gives a set of polynomials in unknowns and , quadratic in the volume coefficients and linear in the distribution coefficients, namely, the expression in (34) for each with , , , and . Also, it may be assumed that , since with does not contribute in either (34) or (24).
As for the first moment, it will be convenient to rewrite the second moment in compact notation. Let us further introduce:
- (4)
, a matrix of size given by
[TABLE]
where the sum is over and denotes the product Clebsch-Gordan coefficients in (32).
Item 4 is zero if either or or . Now for fixed and varying , the second moment (34) neatly reads:
[TABLE]
Here is nonzero only if and .
3. Uniqueness Guarantees and Conditioning
Here, we derive uniqueness guarantees and comment on intrinsic conditioning for the polynomial system defined by the first and second moments, (27) and (35).
Analysis comes in four cases, according to assumptions on the distribution : whether is known or unknown; and if is invariant to in-plane rotations, i.e., depends only on the viewing directions up to rotations that retain the -axis. This invariance restricts to be a non-uniform distribution function over , see subsection 4.2. If is not invariant to in-plane rotations, we say is totally non-uniform as a distribution on the entire . Throughout, our general finding is well-posedness, i.e., the molecule is uniquely determined by first and second moments up to finitely many solutions, under genericity assumptions, for a range of band limits and . In the case of a known totally non-uniform distribution, we prove the number of solutions is , and give an efficient, explicit algorithm to solve for . For all cases, sensitivity of the solution to errors in the moments is quantified by condition number formulas.
3.1. Known, totally non-uniform
For this case, we have a provable algorithm that recovers from (27) and (35) (up to the satisfaction of technical genericity and band limit conditions). Remarkably, while the polynomial system is nonlinear (consisting of both quadratic and linear equations), our method is based only on linear algebra. The main technical idea is simultaneous diagonalization borrowed from Jennrich’s well-known algorithm for third-order tensor decomposition [31], that was also used recently for signal recovery in MRA [46].
Theorem 1**.**
The molecule is uniquely determined by the analytical first and second moments, (27) and (35), in the case the distribution is totally non-uniform, known and , provided it also holds:
- (i)
The matrices and of size both have full rank, and has distinct eigenvalues. Likewise and of size both have full rank, and has distinct eigenvalues. 2. (ii)
Writing and for eigendecompositions, the vectors of size and of size both have no zero entries. 3. (iii)
For , the matrix of size has full row rank. 4. (iv)
For all , the matrix of size has full column rank. 5. (v)
For with , the matrix of size has full column rank.
Moreover in this case, there is a provable algorithm inverting (27) and (35) to get in time , where .
Proof.
For this proof, we need some general properties of the Moore-Penrose pseudo-inverse, denoted by , as in [8]. In particular, if has full column rank and has full row rank, then , , , and also, pseudo-inversion and transposition commute.
Proceeding, the second moment with tells us:
[TABLE]
and with :
[TABLE]
where . We compute times the Moore-Penrose psuedo-inverse of (37) and then multiply this on the right of (36). Because and are each tall with full column rank by assumptions () and (), respectively, and is invertible by (), properties of the pseudo-inverse imply:
[TABLE]
where we have substituted in an eigendecomposition . As has distinct eigenvalues by condition (), we see that the eigenvectors of are unique up to scale and given as the columns of . Thus, , where consists of eigenvectors of (3.1) and is an unknown (as yet) diagonal matrix.
To disambiguate the scales , we compare with the first moment for :
[TABLE]
Multiplying on the left by gives , an equality of matrix-vector products in which the only unknown is the diagonal matrix . By the full support of (assumption ()), this determines . Substituting into , we now know . Multiplying on the left by and on the right by tells us .
Backward marching, the second moment with and reads:
[TABLE]
At this point, we know the first term, and thus the second term gives us by appropriately multiplying by pseudo-inverses ( is right-invertible by ()).
Then, we may look at the second moments with and to similarly determine , and so on, to or (depending on the parity of ). Analogous reasoning and usage of the assumptions gives
We have provided an algorithm to solve for each , which proves uniqueness of as a byproduct. The time complexity of the algorithm is since it involves matrix operations –matrix multiplications, pseudo-inversions or eigendecompositions – of matrices whose dimensions are all bounded by . (Note that back-substituting to solve for involves such matrix operations.) ∎
We remark that condition (iv), which just involves the choice of radial bases, appears to always hold for PSWFs using the cutoffs proposed in Appendix A. Conditions (i), () and (iii) just involve the distribution, and are full-rank, spectral and non-vanishing hypotheses. Condition (iv) just involves the molecule and in particular requires , which limits to be less than the Nyquist frequency where .
Our algorithm goes by reverse666Reverse frequency marching is natural given the sparsity structure of (35): only and with , and , appear in the moments . frequency marching, as we solve for top-frequency coefficients from the second moment (35) where via eigenvectors (similar to simultaneous diagonalization in Jennrich’s algorithm), and then solving for lower-frequency coefficients via linear systems. While our conditions in Theorem 1 are certainly not necessary, fortunately for generic777This means generic with respect to the Zariski topology [30]. Equivalently, there is a non-zero polynomial in such that implies the conditions in Theorem 1 are met. , those conditions are satisfied, so that the method applies:
Lemma 2**.**
Condition (ii) in Theorem 1 holds for Zariski-generic . If , then condition (iii) holds for Zariski-generic . At least for , conditions (i) and () hold for Zariski-generic .
Proof of Lemma 2.
Conditions ()-() are all Zariski-open, i.e., their failure implies or obey polynomial equations. As such, to conclude genericity, it suffices to exhibit a single point or , where the conditions are met. For conditions (), (), we verified the conditions hold at randomly selected points on computer up to . Conditions () and () are obviously generic. ∎
By uniqueness, is a well-defined function of the first and second moments and almost everywhere. It is useful to quantify the “sensitivity” of to errors in , as, e.g., in practice one can access only empirical estimates and . An a posteriori (absolute) condition number for is given by the reciprocal of the least singular value of the Jacobian matrix of the algebraic map:
[TABLE]
Throughout this section, all condition formulas are in the sense of [16, Section 14.3], for which the domain and image of our moment maps are viewed as Riemannian manifolds. To this end, when is unknown, dense open subsets of the orbit spaces , , naturally identify as Riemmannian manifolds (for the construction, see [15]).
3.2. Known, in-plane uniform
For this case, given a particular image size (and other image parameters), together with band limits and , we have code888Available in GitHub: https://github.com/nirsharon/nonuniformMoM/JacobianTest. which decides if, for generic and , the molecule is determined by (27) and (35), up to finitely many solutions. The basis for this code is the so-called Jacobian test for algebraic maps, see Appendix B. Below is an illustrative computation.
Computational Result 3**.**
Consider pixel images, and the following parameters for prolates (representative values): a bandlimit (see Appendix A) chosen as the Nyquist frequency, 2-D prescribed accuracy (94) set to and 3-D truncation parameter (74) to be 999The value of means we allow only of the energy to be outside the ball, and is chosen to best model a molecule structure which is assumed to be mostly supported inside a ball.. We varied band limits in (8) and in (12), and randomly fixed (12) to give a known in-plane uniform distribution. For each , we computed the numerical rank of the Jacobian matrix of the polynomial map of (41) at a randomly chosen , with random . The Jacobian was convincingly of full numerical rank for a variety of band limits, as seen in Table 1. Cases where the gap between the two least singular values of the Jacobian matrix exceeds a threshold of are set as indecisive numerics, and appears in the table as ?. Note that if the rank was calculated in exact arithmetic, this gives a proof that for generic generic fibers of the map consist of finitely many ; i.e., first and second moments (with known in-plane uniform distribution) determine the molecule up to finitely many solutions. For fibers and related definitions, see Appendix B.
Again, the sensitivity of as a locally defined function of (27) and (35) is quantified by the reciprocal of the least singular value of the Jacobian matrix of .
3.3. Unknown, totally non-uniform
In this case, it is important to note that solutions come in symmetry classes. If have specified moments, then so too for for all , that is, we may jointly rotate the molecule and probability distribution and the moments are left invariant. So, solutions come in -dimensional equivalence classes, and we are interested in solutions modulo .
That said, we have code which accepts a particular image size (and other image parameters), together with band limits and . The code then numerically decides which of the following situations occur: i) for generic , both and are determined by (27) and (35) up to finitely many solutions modulo ; ii) for generic , the molecule is determined by (27) and (35) up to finitely many solutions modulo , whereas the distribution has infinitely many solutions; iii) for generic , both and have infinitely many solutions modulo . Note these cases are (essentially) exhaustive, since if is determined so is in the regime of Theorem 1. Moreover, we noticed the case ii) really does arise, e.g., this seems to happen when .
Computational Result 4**.**
We keep the running example of pixel images, and the prolates parameters of a bandlimit chosen as the Nyquist frequency, 2-D prescribed accuracy (94) set to and 3-D truncation parameter (74) of . We varied band limits in (8) and in (12). For each , we computed the numerical rank of the Jacobian matrix of the polynomial map
[TABLE]
at a randomly chosen point in the domain. The numerical rank of the Jacobian convincingly equaled three less (that is , see Appendix B) than full column rank for a variety of band limits, see Table 2. Cases where the gap between the third and fourth least singular values of the Jacobian matrix exceeds a threshold of are set as indecisive numerics, and appears in the table as ?. If the rank were calculated in exact arithmetic, this furnishes a proof that generic fibers of the map consist of finitely many -orbits; that is, first and second moments determine both the molecule and the totally non-uniform distribution up to finitely many solutions (modulo global rotation).
For band limits and such that generically there are only finitely many solutions for mod , the sensitivity of as a (locally defined) function of (27) and (35) is quantified by the reciprocal of the fourth least singular of . For band limits such that generically there are only finitely many solutions for mod , the sensitivity of as a locally defined of (27) and (35) is quantified by the reciprocal of the fourth least singular value of
[TABLE]
where denotes pseudo-inverse and is the differential of . We compute (43) by analytically differentiating (27) and (35), evaluating at and place as diagonal blocks of a matrix, and finally applying pseudo-inverse which is SVD-based.
3.4. Unknown, in-plane uniform
Again in this case, solutions come in -symmetry classes, orbits under the action of global rotation, so we are interested in solutions modulo . We have code which accepts a particular image size (and other image parameters), together with band limits and , and numerically decides if for generic , both and are determined by (27) and (35) up to finitely many solutions modulo , or if there are infinitely many solutions. We did not find parameters giving a “mixed” result as in case ii) above.
Computational Result 5**.**
For pixel images, and the parameters for prolates (representative values): a bandlimit chosen as the Nyquist frequency, 2-D prescribed accuracy (94) set to and 3-D truncation parameter (74) of . We varied band limits in (8) and in (12), restricting (12) to an in-plane uniform distribution. For each , we computed the numerical rank of the Jacobian matrix of the polynomial map:
[TABLE]
at a randomly chosen point in the domain. The numerical rank of the Jacobian convincingly equaled three less than full column rank for a variety of band limits, see Table 3. Cases where the gap between the third and fourth least singular values of the Jacobian matrix exceeds a threshold of are set as indecisive numerics, and appears in the table as ?. If the rank was calculated in exact arithmetic, this furnishes a proof that generic fibers of the map consist of finitely many -orbits; that is, first and second moments determine both the molecule and the in-plane uniform distribution up to finitely many solutions (modulo global rotation).
For band limits and such that generically there are only finitely many solutions for mod , the sensitivity of as a function of moments is quantified by the reciprocal of the fourth least singular of . For example, in the row of Table 3, when evaluating at random , this worked out to:
[TABLE]
Further, in the column of Table 1, evaluating at random gave:
[TABLE]
In practice, we run this refined Jacobian test (takes minute on a standard laptop) to identify well-conditioned band limits and before we attempt non-convex optimization.
4. Numerical Optimization and First Visual Examples
After studying the theoretical properties of the polynomial system which is defined by the first two moments, we discuss in this section aspects of numerically inverting the polynomial map via optimization.
4.1. Incorporating natural constraints in optimization
When determining the coefficients and , the search space has to be restricted in order to ensure the coefficients stem from some physical volume and density.
4.1.1. Constraints on the volume
To ensure the volume is a real-valued function, one has to ensure its Fourier transformation satisfies conjugate symmetry . That is, in spherical coordinates,
[TABLE]
Assuming the basis is a set of real-valued functions, along with the facts that and , we get
[TABLE]
This further implies
[TABLE]
Having such relationships, can thus be written in terms of some real coefficients as:
[TABLE]
The latter means that instead of solving a complex optimization problem in terms of the coefficients , one can work with the real coefficients of (46). Otherwise, the equality constraints (45) are required.
4.1.2. Constraints on the density
Similarly, to ensure the density being a real-valued function, we need to ensure
[TABLE]
The fact that leads to
[TABLE]
Again, from such relationships, it can be shown that an alternative to (48) can be written in terms of real coefficients :
[TABLE]
Here, is the lexicographical order, that is iff or both and .
Two additional constraints are required. First, the integral of any density function is one. To ensure such a correct normalization, we simply let
[TABLE]
which means it is no longer considered as unknown. Finally, the nonnegativity of the density is ensured via a collocation method, that is requiring
[TABLE]
for ’s on a near uniform, refined grid on . While (51) does not prevent the density from becoming negative off the grid, requiring the density to be non-negative entirely on leads to an optimization problem that is much more costly to solve in practice. Note that we do not enforce positivity of by requiring it to be a sum-of-squares, as, e.g., already in the case of an in-plane uniform distribution on the sphere , not all nonnegative polynomials may be written as a sum-of-squares, see Motzkin’s example when [43].
4.2. Accommodating invariance to in-plane rotations
While molecules typically exhibit preferred orientations, there is no physical reason why molecules should have preferred in-plane orientations. In this section, we focus on the case of non-uniform rotational distributions invariant to in-plane rotations since these distributions better model real cryo-EM data sets.
For simplicity, we fix the image plane as perpendicular to the -axis. We add the prior that the density for drawing equals the density for drawing , for all and all rotations of radians about the -axis. This assumption reads
[TABLE]
Therefore,
[TABLE]
Here we used the group representation property of . Checking explicitly the action of on degree spherical harmonics,
[TABLE]
So continuing the above,
[TABLE]
This is equivalent to for where ranges over . To sum, we have found that in-plane invariance is captured by:
[TABLE]
For a sanity check, a distribution with in-plane invariance should sample a rotation with density only depending on which point maps to the north pole. Namely, should only depend on the last column of , that is, . Indeed, this holds as [19, Eqn. 9.44, Pg. 342].
Restricting the expansion of as above, we easily see the first moment is independent of . It is now merely a linear combination of basis functions . Likewise, for the second moment, angular dependency is only on the difference , meaning it is a linear combination of basis functions . Thus, in subsection (3.4), we have the following polynomial map, now with fewer variables and fewer invariants than in subsection (3.3)
[TABLE]
4.3. Direct method – known totally non-uniform distribution
For the “easy” case of a known, totally non-uniform distribution, we have implemented the provable algorithm in Theorem 1. The method’s performance is illustrated by way of an example. As the ground truth volume, we use EMD-0409, that is, the catalytic subunit of protein kinase A bound to ATP and IP20 [32], as presented at the online cryo-EM data-bank [38]. The volumetric array’s original dimension is voxels in each direction, which we downsampled by a factor of three to . The volume was expanded using PSWFs with a band limit chosen to be the Nyquist frequency and 3-D truncation parameter (74) of . Before downsampling, the full expansion consists of degree ; with downsampling and proper truncation, we aim to recover the terms up to degree . For the known totally non-uniform distribution, we took (per Theorem 1), and then formed a particular distribution using a sums-of-squares. Precisely, we formed a random linear combination of Wigner entries up to degree , multiplied this by its complex conjugate, invoked (26) and (32) to rewrite the result as a linear combination of Wigner entries up to degree , repeated for a second square, added, and finally normalized to satisfy (50). Then, with the distribution known as such, the volume contributes unknowns (without discounting for (45)). Providing the algorithm with and , our method took seconds on a standard laptop, and recovered the unknowns up to a relative error in norm of . Visual results are in Figure 2.
4.4. Setting up a least-squares formulation
For the cases where we lack a direct method, we formulate the problem in terms of minimizing a least-squares cost function. First, we define the unknowns of our optimization process to be the coefficients of the volume and distribution . The explicit formulas (27) and (35) provide means to write the low-order moments (7) as functions of our unknown coefficients, that is and .
In practice, given data images, one estimates the low-order statistics using the empirical moments and of (1), but now given in PSWFs coordinates
[TABLE]
The connection between the empirical moments and their analytical formulas as functions of our unknowns gives rise to a nonlinear least-squares
[TABLE]
where is a parameter chosen to balance the errors from both terms. In particular, two main considerations determine the value of . First is the number of elements in each summand. Namely, the second moment includes many more entries than the first moment. Therefore, without the effect of noise, is set to be the ratio between the number of entries in first moment and the second moment. The second factor to balance is the different convergence rates of the empirical moments, see also [1]. The nonlinear least-squares (60) may be adjusted to incorporate the constraints on and that ensure is a real-valued volume and a probability density.
We remark that it is interesting to consider pre-conditioners, or more intricate weighings, in the formation of the nonlinear least-squares cost (60). Such might alleviate high condition numbers observed in Section 3, and potentially accelerate optimization algorithms. While we have not tested a pre-conditioner in optimization experiments yet, one possibility would be to consider the following normalized cost:
[TABLE]
Effectively, (61) scales each polynomial in given by and to take value .
4.5. Complexity analysis of inverting the moments via gradient-based optimization
Before moving forward to further numerical examples, we state the computational load of minimizing the least-squares cost function (60). It is worth noting that in many modern ab initio algorithms, like SGD [48] and EM [52], the runtime of each iteration is measured with respect to the size of the set of data images, which can be huge. In our approach, we only carry out one pass over the data to collect the low-order statistics. In here, we assume the empirical moments are already given, and so the complexity of each iteration is merely a function of the size of the moments or equivalently depends on the size and resolution of the data images, as reflected by their PSWF representations.
Many possible algorithms exist to minimize the least squares problem (60), for example direct gradient descent methods, such as trust-region [47], or alternating approaches, including alternating stochastic gradient descent. Here, we present the complexity of evaluating the cost function and its gradient, regardless of the specific algorithm or implementation one wishes to exploit.
For simplicity, denote by and two bounds for the radial indices and of the 3-D and 2-D PSWF expansions, respectively. Typically, it is sufficient to take and , as radial degree decreases as overall degree () increases.
Starting from the first moment (27): with a fixed we have to apply two matrix-vector products in a row which requires an order of arithmetic operations. The variable increases up to , which sums up to a total of . The gradient uses the precomputed remainder and is calculated by two terms with similar complexity as the above. Namely, the cost of both evaluation and gradient calculations is again .
For the second moment, we follow (35): establishing is done in and applying the product in . Overall, the evaluation is bounded by
[TABLE]
The gradient is a bit more complicated, in short, there are two terms for the volume derivatives and one term for the distribution part, with the precomputed remainder we get an overall complexity of . In summary, the first moment requires third-order complexity with respect to the different parameters where the second moment requires a total power of five.
Finally, the parameters , , and can be described by the PSWF representation: the length of the 3-D PSWF expansion and the bound on the radial indices are related to the parameter of sampling rate, and are bounded according to (A.1.1). Additional bound, now on the radial 2-D expansion , uses the accuracy parameter of the 2-D images and the above as given in (94). For more details on those parameters, see Appendix A.
4.6. Remark on using semidefinite programming (SDP) relaxation
Solving the nonlinear least-squares problem in Eq. (60) could suffer from slow convergence because the cost function is a polynomial of degree 6. We remark that in principle, it is possible to apply a semidefinite programming relaxation to facilitate the optimization. For convenience, let the second moments be summarized as
[TABLE]
where is a linear operator that captures the RHS of Eq. (34). If we define
[TABLE]
the optimization problem can be written as
[TABLE]
To deal with the non-convex constraint , we propose the following relaxed constraint
[TABLE]
which gives the following non-linear least squares problem
[TABLE]
Comparing with (60), although (65) is still a non-convex problem, the degree of the polynomial in the cost function of (65) is 4 (instead of 6). Furthermore, one can solve (65) efficiently by minimizing and in an alternating fashion. Therefore if at the optimum in spite of the relaxation (64), solving (65) can be advantageous.
We remark on the special case when the density coefficient is given. In this situation, one can consider an SDP relaxation
[TABLE]
The nuclear norm minimization strategy as in matrix completion [17] is used to promote to be of rank-1. We test the SDP in (4.6) when given a fixed . We generate for a non-uniform distribution from a 6-th degree nonnegative polynomial over the rotation group, i.e. letting . We generate a volume with random coefficients with . Noise is added to the moments in the following manner:
[TABLE]
Where
[TABLE]
and
[TABLE]
In this case, we set in (4.6),
[TABLE]
The stability results in recovering are shown in Figure 3. We ran five simulations for every and average the relative error
[TABLE]
Results show an exact recovery in the noiseless case and slowly increasing in relative error as grows.
4.7. Volume from moments – non-uniform vs. uniform
As a first numerical example, we present a recovery comparison between the cases of uniform and non-uniform distributions of rotations. In this example, we use as a ground truth a low degree approximation of a mixture of six Gaussians, given in a non-symmetric conformation. The approximation, which we ultimately use as our reference, is attained by discretizing the initial volume to and truncating the PSWFs expansion to . This expansion consists of coefficients in total. The other PSWFs parameters that we use are a band limit that corresponds to the Nyquist frequency and 3-D truncation parameter (74) of . The original volume and its approximation appear in Figure 4.
We divide the example into two scenarios of different distributions, uniform and non-uniform. In each case, we start from the analytic moments (7), calculated with respect to 2-D prescribed accuracy (94) of , and obtain an estimation based on minimizing the least squares cost function (60). The optimization is carried with a gradient-based method, specifically we use an implementation of the trust-region algorithm, see e.g., [47]. In the first case, we use as the distribution of rotations a quadratic expansion which is in-plane uniform. Based on the in-plane invariance, we present this distribution as a function on the sphere in Figure 5. For the second case, we use a uniform distribution of rotations.
In both cases, we let the optimization reach numerical convergence, where the progress in minimization is minor. In this example, it is usually at about iterations. In the case of non-uniform distribution, we observe that choosing a random initial guess can have an effect on the speed of convergence but has almost no influence on the resulted volume. In other words, we gain numerical evidence for uniqueness. The estimated volume, in this case, is depicted on the left side of Figure 6.
On the other hand, in the case of a uniform distribution, while convergence was typically quicker than in the non-uniform case, the results vary between different initial guesses, indicating the richness of the space of possible solutions. One such solution appears on the right side of Figure 6. This behavior of the optimization solver agrees with our previous knowledge on the ill-posedness of Kam’s method and also with the Jacobian test which shows degree deficiency of the polynomial system defined by the first and second moment under the uniform distribution.
4.8. Comparing volumes using FSC
A commonly used cryo-EM resolution measure is the Fourier shell correlation (FSC) [29]. The FSC measures cross-correlation coefficient between two 3-D volumes over each corresponding shell. That is, given two volumes and , the FSC in a shell is calculated using all voxels on this -th shell:
[TABLE]
Customary, the resolution is determined by a cutoff value. The threshold question is discussed in [64], where in our case since we wish to compare a reconstructed volume against its ground truth, we use the threshold. Since we focus on ab initio modeling, we aim to estimate a low-resolution version of the molecule from the first two moments. Thus, we expect the cutoff to reach a value which ensures a good starting point for a refinement procedure.
4.9. Visual example and the effect of non-uniformity
We next introduce an example for the most realistic scenario of an unknown, in-plane uniform distribution, by inverting the moment map of a real-world structure through minimization of a least-squares cost function (60). In this example, we once again illustrate the feasibility of numerically approaching the solution, without any prior assumption on the volume.
The example is constructed as follows. As the ground truth volume, we once again use EMD-0409, the catalytic subunit of protein kinase A bound to ATP and IP20 [32], as presented at the online cryo-EM data-bank [38]. The map original dimension is voxels. Since we aim to recover a low-resolution model, we reduce complexity and downsample it by a factor of three to . We firstly expand this volume using PSWFs with a band limit chosen as the Nyquist frequency and 3-D truncation parameter (74) of . The full expansion consists of degree , and after truncation to maximize conditioning, as done in Section 3.3, we aim to recover the low degree counterpart up to degree . The moments were calculated with respect to 2-D prescribed accuracy (94) of and in the absence of noise. The volume contributes unknowns to be optimized.
As the ground truth distribution, we choose three different functions: uniform, highly non-uniform and a non-uniform case in-between. The two non-uniform cases are cubic spherical harmonics expansions () and satisfy in-plane invariance and so we present them in Figure 7 as functions on the sphere, together with a histogram to compare and illustrate their “non-uniformness”. The non-uniform distributions add extra unknowns which means that, in total, we optimize unknowns in the cases of non-uniform distribution and only unknowns in the case of uniform distribution.
In the optimization process, we use the limit of the empirical moments (59) () as our input moments. As before, we use a trust-region algorithm, see e.g., [47], which is a gradient-based method. To fix the initialization between the different cases, we start the search with the zero volume. In cases of non-uniform distribution, we provide a random non-uniform distribution to start with. Our method is implemented in MATLAB R2017b, and we calculated the example on a laptop with a 2.9 GHz Intel Core i5 processor and 16 GB 2133 MHz memory.
The result we present next is obtained after iterations of trust-region, each iteration usually uses up to inner iterations to estimate the most accurate step size. The runtime of this example is about minutes for each model, where at this point, our naive implementation does not support any parallelization which potentially can lead to a significant improvement in the total runtime. For example, the evaluation of the second moment and its associated gradient part are related to the leading complexity term as described in Section 4.5. Their implementation is based upon matrix product as seen in the form (35). This part can remarkably benefit from parallel execution. Note that evaluating the PSWF functions, as well as the product Clebsch-Gordan coefficients (which appears in the moments), are all calculated offline as a preprocessing step.
We present a comparison between the different FSC curves for the three cases. As implied by Figure 8, the resolution increases (lower FSC cut) as the non-uniformity becomes more significant. Specifically, with the uniform distribution we obtain merely , where for the two other non-uniform cases we get and as the non-uniformity increases in the examples of Figure 8.
A visual demonstration of the output of the optimization is presented in Figure 9, where we plot side by side the ground truth and three models, from the uniform to the most non-uniform one.
4.10. Recovery from noisy images
We conclude this section with an example of recovering a volume from its noisy projection images. The volume is a mixture of six Gaussians, synthetically designed to have no spatial symmetry. The volume’s size is and its full PSWF expansion is of length , with band limit chosen as the Nyquist frequency and 3-D truncation parameter (74) of . We use an in-plane uniform distribution of rotations, very localized on a degree cone, represented with an expansion length of . The distribution function is shown on Figure 10 and can model a realistic scenario of highly anisotropic viewing directions (see, e.g., [4]). Using the distribution, we generated projection images to obtain observations. These images were then contaminated with noise. The SNR of an image with the noise term is , using the Frobenius norm. The noise was chosen to achieve an average SNR value of . Three examples of clean images and their noisy versions are depicted in Figure 11. As seen in Figure 11, the projections are hardly noticed in the noisy images for the naked eye.
We expand the noisy images using a 2-D PSWF basis, as appears in (14). Then, the coefficients and their double-products are averaged to estimate the first and second moments as in (59). The reconstruction uses the empirical moments to estimate the volume and distribution. For the volume, our gradient-based least-squares algorithm targets its full expansion, which consists of unknowns. The unknown distribution includes unknowns spherical harmonics coefficients. We reached the result we present next very quickly, starting from a random initial guess. It took about iterations of trust-region; each iteration could use up to inner iterations to estimate the most accurate step size. The runtime of this example is less than minutes.
A visual demonstration of the estimated volume is provided in Figure 12. We present the estimation, side by side, to the original volume. As seen in the various pictures, the reconstruction, while not perfect, captures most features and the general shape of the structure. This encouraging result indicates that inverting the moments is possible also from noisy moments and that the mapping has some robustness to small perturbations.
5. Discussion and Conclusion
The method of moments offers an attractive approach for modeling volumes in cryo-EM. This statistical method completely bypasses the estimation of viewing directions by treating them directly as nuisance parameters. The assumption of a non-uniform distribution of viewing angles enables in many cases volume estimation using only the first and second moments of the data. This phenomenon opens the door for fast, single-pass reconstruction algorithms, based on inverting the map from the volume and distribution to the low-order statistics of the projection images.
This paper extended Zvi Kam’s original method of moments for cryo-EM to the setting of a non-uniform distribution of viewing directions. We formulated the reconstruction problem using appropriate discretizations for the images, the volume, and the distribution. Then, we derived moment formulas using properties of the spherical harmonic functions and Wigner matrix entries. Computational algebra was employed to analyze the resulting large-scale system of polynomial equations. The analysis shows the seeming complication of an unknown, non-uniform distribution renders 3-D reconstruction easier than in the uniform case, as now only first and second moments are required to determine a low-resolution expansion of the molecule, up to finitely many solutions. Intermediate cases were treated; remarkably, when the distribution is known and totally non-uniform over , there is an efficient, provable algorithm to invert the first and second moments non-linearly. Additionally, our work addressed several numerical and computational aspects of the method of moments. An implementation of a trust-region method was presented and used to illustrate the advantages of our approach over Kam’s classical approach by numerical experiments involving synthetic volumes.
We regard our work as a definite, albeit initial step toward developing the method of moments for ab initio modeling from experimental datasets. Firstly, even in the synthetic cases considered here, further work on the optimization side is warranted. Variations on our nonlinear cost function that incorporate a pre-conditioner, e.g., (61), could be considered. Secondly, other techniques for large-scale nonlinear least squares optimization should be tried, such as Levenberg-Marquardt [40] or Variable Projection [18], where in the latter one can exploit the linearity in the moments with respect to the distribution, by eliminating out the distribution. Thirdly, to get our method working on images, further effects, such as the CTF and imperfect centering of picked particles, should be incorporated into the moment formulas. Fourthly, accurate covariance estimation in high dimensions requires eigenvalue shrinkage [22], the theory for which may call for a modification in the non-uniform setting.
To simplify our exposition, we have stuck to the asymmetric and homogeneous cases here, although both of these can be relaxed in the method of moments. Specifically, as already noted in Kam’s original paper [33], point symmetries of molecules are reflected in the vanishing of certain expansion coefficients, see also [63]. Therefore, MoM can be reformulated using fewer coefficients for symmetric molecules. This fact, alongside with further improvement of the representation of the distribution, may pave the way for recovery under practical cases of very restricted viewing angles, as reported in literature [4, 26, 44, 59]. At the same time, heterogeneity, at least if it is finite and discrete, can be expressed using a mixture of volumes and a corresponding mixture of moments, see [14, 5]. In future work, computational algebra should be applied to these cases to check whether the first and second moments remain sufficient for unique recovery.
To conclude, we raise one further possibility, in some sense at odds with the message of this paper. In the non-uniform case, we have determined that the first and second moments are sufficient information-theoretically for volume recovery. Nonetheless, the resulting optimization landscape is potentially challenging, due to non-convexity or ill-conditioning. Thus, despite the increased statistical cost of estimating the third moment, it seems worthwhile to ask what can be gained computationally by reprising the third moment in MoM (or at least, using a carefully chosen slice of the third moment). Specifically, we would like to answer this question: can the third moment facilitate more efficient modeling at higher resolution?
Acknowledgments
The authors thank Nicolas Boumal, Peter Bürgisser, Eitan Levin, Dilano Saldin and Yoel Shkolnisky for stimulating conversations, and the anonymous referees for their valuable comments.
This research was supported in parts by Award Number R01GM090200 from the NIGMS, FA9550-17-1-0291 from AFOSR, Simons Foundation MathX Investigator Award, the Simons Collaboration on Algorithms and Geometry, the Moore Foundation Data-Driven Discovery Investigator Award, and NSF BIGDATA Award IIS-1837992. BL’s research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 723991 - CRYOMATH).
Appendix A Prolate Spheroidal Wave Functions
Here we describe key properties of the PSWFs, and propose a method for setting the expansion parameters , , , and . We begin with the three-dimensional PSWFs, where we describe important properties established in the literature [57, 34, 53], and outline our choice for setting and , accordingly. Then, we proceed with a short analogous description for the two-dimensional PSWFs (summarizing results of [57]), and derive a method for choosing and by directly exploiting the fact that the images to be expanded are tomographic projections of a bandlimited and localized volume function (employing our previous representation for the volume function).
A.1. Volume function representation with three-dimensional PSWFs
Let be a square integrable (volume) function on , representing the true underlying electric potential of the molecule, and denote by its three-dimensional Fourier transform. It is common practice to assume that is bandlimited (i.e., is restricted to a ball) while being localized in space. Functions satisfying this property are naturally represented by three-dimensional PSWFs, as detailed next.
We say that the function as -bandlimited if vanishes outside a ball of radius . That is, is -bandlimited if
[TABLE]
where is the unit ball. Among all -bandlimited functions, the three-dimensional PSWFs on [57] are the most energy concentrated in , while constituting an orthonormal system over . Namely, they satisfy
[TABLE]
for , i.e., is the most energy concentrated -bandlimited function, is the most energy concentrated -bandlimited function orthogonal to , and so on. Three-dimensional PSWFs can be obtained as the solutions to the integral equation
[TABLE]
where we denote the solutions (the PSWFs with bandlimit ) as and their corresponding eigenvalues as , where the enumeration over in (68) is replaced with an enumeration over the triplet described below, and the eigenvalues appear in non-increasing ordering with respect to the original enumerate . and together form the eigenfunctions and eigenvalues of (69), with , , and . Furthermore, the functions are orthogonal on both and using the standard inner products on and , respectively, and are dense in both the class of functions and in the class of -bandlimited functions on . In spherical coordinates, the functions agree with the form in the right-hand side of (8), and can be expressed as
[TABLE]
where are the spherical harmonics (see (9)). Numerical evaluation of the three-dimensional PSWFs (in particular of the radial part ) was considered in [39].
From the properties of the three-dimensional PSWFs mentioned above, any volume function can expanded in as
[TABLE]
where denotes complex conjugation. Next, we consider the truncation of the expansion in (71), where it is convenient to bound the resulting truncation error in terms of the assumed spatial localization of . Towards this end, we say that the function is -concentrated if
[TABLE]
Additionally, we define the normalized eigenvalues
[TABLE]
where we mention that , for all triplets , and for every . Now, we propose to set according to
[TABLE]
where is some constant, and set to be the largest for which is defined (i.e., such that the set is non-empty). Correspondingly, the volume function resulting from the truncating the expansion in (71), according to the chosen and , is
[TABLE]
The following proposition bounds the error of approximating by .
Proposition 1**.**
Let be -bandlimited with a unit norm and assume it is -concentrated. Then,
[TABLE]
The proof follows immediately from Theorem 5 in [34] and from our choices of and . It is evident that the approximation error in the right-hand side of (76) can be made arbitrarily small by taking sufficiently small. Furthermore, in the case that is localized in space, i.e., , we can take to be large, possibly even close to , and still get approximation errors sufficiently small for our purposes.
A.1.1. Length of the expansion
Clearly, the number of basis functions taking part in the expansion (75), which is given explicitly by , depends on the number of normalized eigenvalues exceeding . In this respect, the normalized eigenvalues are known to admit the following three distinct regions of behavior (when sorted in descending order). The first is called the “flat region”, where take values very close to , the second is called the “transitional region”, where shift rapidly from values close to to values close to [math], and the third is called the “decay region”, where are very close to [math] and exhibit a super-exponential decay rate. As for the number of basis functions chosen according to (74), the following holds [53]:
[TABLE]
where the first, second, and third terms on the right-hand side of (A.1.1) correspond to the number of normalized eigenvalues exceeding from the flat region, the transitional region, and the decay region of the eigenvalues, respectively. Clearly, the asymptotically dominant term is , which corresponds to the number of terms in the expansion chosen from the flat region. Additionally, we need an extra terms if we take to be small (note that the second term in the right hand-side of (A.1.1) is negative for , meaning that asymptotically we need less than terms for values of close to ). The remaining terms from the decay region are negligible compared to the leading asymptotic terms.
A.1.2. Fourier domain representation
Up to this point, we have shown that three-dimensional PSWFs are naturally adapted for expanding a volume function which is bandlimited and localized in space, where we provided an appropriate error bound (76). However, note that in (8) we actually expand the Fourier transform of the molecular potential. We now connect our previous expansion of with the expansion of its Fourier transform, and show that in fact (and uniquely for PSWFs) the two coincide, in the sense that expanding a function in three-dimensional PSWFs is equivalent to expanding its Fourier transform in three-dimensional PSWFs (after an appropriate scaling and dilation). Let denote the three-dimensional Fourier transform of , then by (69) it is easy to verify that
[TABLE]
where is the indicator function on . It is evident that the Fourier transform of each three-dimensional PSWF is equal to itself up to a constant factor, a dilation by , and a restriction to a ball of radius . Consequently, by taking the Fourier transform of (75) we have
[TABLE]
where
[TABLE]
We conclude this part as follows. Given a bandlimit (typically chosen as the Nyquist frequency corresponding to the projection images’ resolution), we take the radial part of (8) as , where is the indicator function on , and is the radial part of the three-dimensional PSWFs on (the factor ensures that are orthonormal over w.r.t the measure ). Then, setting according to (74) for a given parameter allows for the controlled approximation error (76).
A.2. Projection image representation with two-dimensional PSWFs
In the sequel, we are interested in providing a suitable representation for the projection images of the rotated copies of . By the Fourier slice theorem, the two-dimensional Fourier transforms of such projections are equal to slices from the three-dimensional Fourier transform of (i.e., of ). Therefore, if is -bandlimted, then the projection images are bandlimited to a disk of radius . Additionally, we expect the projection images to be localized in the unit disk if is sufficiently localized in the unit ball. For such projection images, two-dimensional PSWFs are expected to provide a natural representation (see [36]).
We briefly summarize properties of the two-dimensional PSWFs which are used in our context. In essence, the properties of the two-dimensional PSWFs are analogous to those of the three-dimensional PSWFs when replacing the unit ball with the unit disk . Let be a square integrable function on , representing a tomographic projection of . We say that as -bandlimited if its two-dimensional Fourier transform, denoted by , vanishes outside a disk of radius . That is, is -bandlimited if
[TABLE]
Among all -bandlimited functions, the two-dimensional PSWFs on are the most energy concentrated in , that is, they satisfy (68) when replacing with , while constituting an orthonormal system over . The two-dimensional PSWFs were derived and analyzed in [57], and were shown to be the solutions to the integral equation
[TABLE]
We denote the PSWFs with bandlimit as , and their corresponding eigenvalues as , which together form the eigenfunctions and eigenvalues of (82), with , and . Furthermore, the functions are orthogonal on both and using the standard inner products on and , respectively, and are dense in both the class of functions and in the class of -bandlimited functions on . In polar coordinates, the functions agree with the form in the right-hand side of (14), and can be expressed as
[TABLE]
where the eigenfunctions are normalized to have an norm of . Numerical evaluation of the two-dimensional PSWFs was considered in [54].
From the properties of the two-dimensional PSWFs mentioned above, any function can be expanded in as
[TABLE]
Now, considering the truncated expansion
[TABLE]
we are interested in controlling the error
[TABLE]
From (82), the Fourier transform of can be expressed as
[TABLE]
where is the indicator function on , which is analogous to the relation between the three-dimensional PSWFs and their Fourier transforms in (78). Continuing, taking the Fourier transform of (85) gives
[TABLE]
where
[TABLE]
We will now relate 2D basis representation error to that of the 3D basis functions. Comparing the 2D expansion (88) with the relation between 2-D and 3-D coefficients (19), while employing (89) and (80) we have
[TABLE]
for , where for , and
[TABLE]
where is from (20). Using the Cauchy–Schwarz inequality, we can write
[TABLE]
where we also used the fact that is a unitary matrix. Finally, taking and assuming w.l.o.g that , we obtain from (86) and (92) that
[TABLE]
Given a prescribed accuracy , for every we choose to be the smallest integer such that
[TABLE]
which results in
[TABLE]
where are computed by evaluating of (20) via numerical integration (using Gauss-Legendre quadratures). Note that the right-hand side of (94) is determined by the decay rate of in , which is dominated by the decay rate of the the eigenvalues of the two-dimensional PSWFs . Those are known to admit a rapid decay in the form of a super-exponential decay rate following a certain transitional region (see [13, 53]). Hence, if is sufficiently large then (94) can be satisfied for an arbitrarily small with a marginal increase in the number of required terms. Last, we mention that when provided with images sampled on a Cartesian grid, the coefficients can be approximated accurately from the images by fast algorithms [36, 37].
Appendix B Linearizing polynomial maps with the Jacobian matrix
In this section, we describe the linearization technique from computational algebraic geometry we used to obtain the uniqueness results in Tables 1, 2, 3 from Section 3. The first paper to apply algebraic geometry techniques to cryo-EM was [5]. Nevertheless, similar Jacobian tests have been used in other applications such as for testing rigidity in sensor network localization, see e.g., [27] and testing whether a matrix can be completed into a low-rank matrix [55].
To state the method, we fix , let and be projection onto the factors, and consider a polynomial map (that is, each coordinate function is a polynomial on ). While is generally a nonlinear map, its first derivative at is a linear map represented by the Jacobian matrix
[TABLE]
In addition, we define the fiber in by
[TABLE]
and the projected fiber by
[TABLE]
For and , define the specialized fiber by
[TABLE]
Because is described by polynomials, there is a tight relationship between the dimension of fibers of (as algebraic varieties) and the dimension of the kernels of (as linear spaces). This is summarized by the Jacobian tests below. Somewhat remarkably, the linear algebra tests are done at a single point in the domain of , but imply algebraic geometric statements for almost all points in the domain of .
Theorem 6**.**
Suppose it is known that, generically, the fiber, projected fiber and specialized fiber have dimensions , respectively (if we have no such knowledge, then ). Choose particular points , and .
Vanilla Jacobian test:* if , then generic fibers have dimension exactly .*
Projected Jacobian test:* if , then generic projected fibers have dimension exactly .*
Specialized Jacobian test:* if , then generic specialized fibers have dimension exactly .*
Several technical remarks are in order. Firstly, in Theorem 6, the fiber, projected fiber and specialized fiber are affine algebraic varieties and hence a dimension is defined for each of their irreducible components according to [20]. The meaning of the theorem is that each component has dimension exactly , respectively. Crucially, affine algebraic varieties have finitely many components. Thus the theorem implies “finitely many solutions” up to symmetries, if the symmetries give -dimensional ambiguities, respectively. Secondly, “generic” in Theorem 6 is with respect to the Zariski topology. Concretely, there exists some polynomial on such that for all with the implications in the theorem hold. In particular, any property that holds generically holds on a Lebesgue full measure subset of points. Thirdly, the Jacobian ranks in Theorem 6 take on generic values, as each minor of the relevant matrix is a polynomial in .
Theorem 6 states rigorous conclusions if the Jacobian rank tests are passed. On the other hand, if the tests fail for , and were drawn randomly from any continuous distribution on , then by genericity of the Jacobian ranks, with probability , the generic fibers, projected fibers, or specialized fibers of have dimension strictly more than , , or .
We applied the specialized test in subsection 3.2 with , the vanilla and projected tests in subsection 3.3 with and the vanilla test in subsection 3.4 with . The settings of reflect the fact, in the latter two subsections, that the fibers are -sets and we are interested in solutions modulo global rotation. The bounds may be seen as instances of the orbit-stabilizer theorem, see [5, Proposition 4.11]. When the Jacobian rank tests were passed, this meant that, generically, there are only finitely many solutions up to global ambiguities.
In practice, we ran the Jacobian tests in floating-point arithmetic and used SVD for robust rank estimation. Namely, we looked at multiplicative gaps between consecutive singular values, and regarded any gap exceeding a predefined threshold () as evidence that all lower singular values should be regarded as zero. While these computations fall short of a fully rigorous mathematical proof due to the possibility of rounding errors in floating-point arithmetic, it was typically evident which singular values ought to be counted as zero or non-zero.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Emmanuel Abbe, Tamir Bendory, William Leeb, João M. Pereira, Nir Sharon, and Amit Singer. Multireference alignment is easier with an aperiodic translation distribution. IEEE Transactions on Information Theory , 65(6):3565–3584, 2019.
- 2[2] Joakim Andén and Amit Singer. Structural variability from noisy tomographic projections. SIAM Journal on Imaging Sciences , 11(2):1441–1492, 2018.
- 3[3] Larry C. Andrews. Special Functions of Mathematics for Engineers . Mc Graw-Hill New York, 1992.
- 4[4] Philip R. Baldwin and Dmitry Lyumkis. Non-uniformity of projection distributions attenuates resolution in cryo-EM. Progress in Biophysics and Molecular Biology , 2019. In press, available online.
- 5[5] Afonso S. Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Weed, and Alexander S. Wein. Estimation under group actions: recovering orbits from invariants. ar Xiv preprint ar Xiv:1712.10163 v 2 , 2018.
- 6[6] Afonso S. Bandeira, Philippe Rigollet, and Jonathan Weed. Optimal rates of estimation for multi-reference alignment. ar Xiv preprint ar Xiv:1702.08546 v 2 , 2018.
- 7[7] Alex Barnett, Leslie Greengard, Andras Pataki, and Marina Spivak. Rapid solution of the cryo-EM reconstruction problem by frequency marching. SIAM Journal on Imaging Sciences , 10(3):1170–1195, 2017.
- 8[8] Adi Ben-Israel and Thomas N.E. Greville. Generalized Inverses: Theory and Applications . CMS Books in Mathematics. Spring-Verlag New York, 2nd edition, 2003.
