Cross-Modal Fusion Between Data in SAXS and Cryo-EM for Biomolecular Structure Determination
Shengnan Lyu, Christian Wülker, Yuqing Pan, Amitesh S. Jayaraman, Jianhao Zheng, Yilin Cai, Gregory S. Chirikjian

TL;DR
This paper introduces a method that combines SAXS and cryo-EM data to improve biomolecular structure determination, especially in cases with limited projection directions, by leveraging the complementary strengths of both techniques.
Contribution
It presents a novel approach for integrating SAXS with cryo-EM data to address anisotropy and coverage issues in biomolecular imaging.
Findings
SAXS can fill in blind spots in cryo-EM data.
Combined data improves structure resolution in challenging cases.
Method enhances biomolecular structure determination accuracy.
Abstract
Cryo-Electron Microscopy (cryo-EM) has become an extremely powerful method for resolving structural details of large biomolecular complexes. However, challenging problems in single-particle methods remain open because of (1) the low signal-to-noise ratio in EM; and (2) the potential anisotropy and lack of coverage of projection directions relative to the body-fixed coordinate system for some complexes. Whereas (1) is usually addressed by class averaging (and increasingly due to rapid advances in microscope and sensor technology), (2) is an artifact of the mechanics of interaction of biomolecular complexes and the vitrification process. In the absence of tilt series, (2) remains a problem, which is addressed here by supplementing EM data with Small-Angle X-Ray Scattering (SAXS). Whereas SAXS is of relatively low resolution and contains much lower information content than EM, we show that…
Taxonomy
Topics: Advanced Electron Microscopy Techniques and Applications · Enzyme Structure and Function · Electron and X-Ray Spectroscopy Techniques
Cross-Modal Fusion Between Data in SAXS and Cryo-EM for Biomolecular Structure Determination
Shengnan Lyu
School of Mechanical Engineering and Automation, Beihang University, Beijing, China
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Christian Wülker
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Yuqing Pan
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Amitesh S. Jayaraman
Department of Mechanical Engineering, National University of Singapore, Singapore
Jianhao Zheng
School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
Yilin Cai
School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
Gregory S. Chirikjian (corresponding author; e-mail: [email protected])
Department of Mechanical Engineering, National University of Singapore, Singapore
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Abstract
Cryo-Electron Microscopy (cryo-EM) has become an extremely powerful method for resolving structural details of large biomolecular complexes. However, challenging problems in single-particle methods remain open because of (1) the low signal-to-noise ratio in EM; and (2) the potential anisotropy and lack of coverage of projection directions relative to the body-fixed coordinate system for some complexes. Whereas (1) is usually addressed by class averaging (and increasingly due to rapid advances in microscope and sensor technology), (2) is an artifact of the mechanics of interaction of biomolecular complexes and the vitrification process. In the absence of tilt series, (2) remains a problem, which is addressed here by supplementing EM data with Small-Angle X-Ray Scattering (SAXS). Whereas SAXS is of relatively low resolution and contains much lower information content than EM, we show that it is nevertheless possible to use SAXS to fill in blind spots in EM in difficult cases where the range of projection directions is limited.
1 Introduction
The goal of this paper is the fusion of information from cryo-electron microscopy (cryo-EM) and small-angle X-ray scattering (SAXS) in order to exploit the synergies of both techniques. Both are experimental biophysical techniques used to glean information about the structure of (large) biomolecular complexes in the native state. The data acquisition in these methods can be briefly summarized as follows: in so-called single-particle EM, we are presented with noisy projections of the electron density of a molecular structure from a finite number of a priori unknown different directions relative to its body-fixed frame. The three-dimensional shape is to be reconstructed from these projection data, i.e., EM is a tomographic imaging technique. SAXS data acquisition, on the other hand, can be interpreted as taking the electron density of the structure (or rather the result of a convolution of the density with a reflected version of itself) and subsequently averaging over concentric spheres centered at the origin. This is equivalent to square averaging the Fourier spectrum of the electron density on these spheres. For a detailed discussion of the mathematical aspects of SAXS and EM data acquisition, we refer the reader to Dong et al. [2015a]; Afsari et al. [2015].
The information content and resolution of EM projections, either after class averaging or due to new advances in imaging technology, are both significantly higher than those of SAXS. Nevertheless, we make the case that there are potential benefits to fusing these information sources.
The main tool that we use in this work to fuse the information from EM and SAXS is the well-known Fourier slice theorem. In two dimensions, this theorem states that the results of the following two calculations are equal:
1. Taking a two-dimensional function, projecting it onto some (one-dimensional) line, and taking the Fourier transform of that projection along that line.
2. Taking the same function, performing a two-dimensional Fourier transform, and then slicing the result through its origin with a line parallel to the projection line mentioned above.
This theorem directly generalizes into higher dimensions; particularly, it also holds in the three-dimensional case.
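The two calculations in the theorem can be compared numerically on a discrete grid. The following is a minimal sketch, assuming a random test array (the name `f` and the grid size are illustrative, not taken from the paper) and an axis-aligned projection, for which the discrete Fourier transform reproduces the statement exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((32, 32))  # hypothetical 2D "density" sampled on a grid

# Calculation 1: project along axis 0 (sum over rows), then take the
# one-dimensional Fourier transform of the projection.
projection = f.sum(axis=0)
route1 = np.fft.fft(projection)

# Calculation 2: take the two-dimensional Fourier transform, then keep the
# line through the origin parallel to the projection line
# (zero frequency along axis 0).
route2 = np.fft.fft2(f)[0, :]

slice_theorem_holds = np.allclose(route1, route2)
print(slice_theorem_holds)
```

For projection directions that are not aligned with a grid axis, the slice no longer falls on grid points and some form of interpolation would be needed; that case is not covered by this sketch.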
The starting point of this work is the Fourier slice theorem: we can interpret EM data as spectral data given on a finite number of planes passing through the origin, that is, as parts of the Fourier spectrum of the electron density of the molecular structure of interest. The main issue here is that there might be large gaps between these planes in the spectral EM data, because the number of directions in which projection data are acquired in EM can be very limited. This makes reconstruction via the three-dimensional inverse Fourier transform potentially problematic. As indicated above, SAXS adds to the spectral EM data, for here the data can be seen as constant values on concentric spheres centered at the origin of the Fourier space, resulting from square averaging of the spectrum over these spheres. Therefore, both EM and SAXS data can be considered in this context as partial information on the Fourier spectrum of the electron density to be reconstructed.
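To make the SAXS side of this picture concrete, the following sketch computes a two-dimensional analogue of the SAXS data: the squared magnitude of the discrete spectrum averaged over rings of approximately constant radius. The function name `ring_averaged_intensity` and the nearest-integer binning of radii are assumptions made for illustration; the paper works with continuous circles and spheres.

```python
import numpy as np

def ring_averaged_intensity(f):
    """Average |F|^2 over rings of (approximately) constant radius in 2D Fourier space."""
    F = np.fft.fftshift(np.fft.fft2(f))      # zero frequency moved to the center
    power = np.abs(F) ** 2
    n0, n1 = f.shape
    k0 = np.arange(n0) - n0 // 2             # integer frequency index along axis 0
    k1 = np.arange(n1) - n1 // 2             # integer frequency index along axis 1
    radius = np.hypot(*np.meshgrid(k0, k1, indexing="ij"))
    ring = np.rint(radius).astype(int)       # nearest-integer ring index per pixel
    sums = np.bincount(ring.ravel(), weights=power.ravel())
    counts = np.bincount(ring.ravel())
    return sums / np.maximum(counts, 1)

# A single point mass has a flat spectrum, so every ring average equals one.
f = np.zeros((16, 16))
f[0, 0] = 1.0
profile = ring_averaged_intensity(f)
```

Each entry of `profile` plays the role of one SAXS value: a single number per ring that constrains the spectrum everywhere on that ring, including the gaps between the EM lines.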
The remainder of this paper is organized as follows: Section 2 reviews the related literature. In Section 3, we derive our technique for SAXS-EM data fusion, and in Section 4 we explore some properties of the multiple admissible solutions of the SAXS-EM fusion procedure. Section 5 presents numerical results, validating the practical applicability of our method. In Section 6, we discuss our results and give an outlook on future developments.
2 Literature Review
Macromolecular structure determination is of central importance in biology because of the close relationship between shape and function. As the fields of biology and biophysics increasingly focus on the study of cellular phenomena, understanding the structure and motion of large biomolecular complexes is becoming critical. Large biomolecular complexes bridge the length scales of single-molecule studies and cellular phenomena.
Two of the main methods used in this context are small-angle X-ray scattering (SAXS) Stuhrmann [1970]; Feigin et al. [1987]; Blanchet and Svergun [2013]; Svergun and Koch [2003]; Dong et al. [2015b] and cryo-electron microscopy (cryo-EM) Crowther et al. [1970]; Van Heel and Frank [1981]; Penczek et al. [1992]; Frank [2006]; Wang and Sigworth [2006]; Scheres [2012]. SAXS is known for its relatively low resolution and low information content in comparison to cryo-EM. Nevertheless, the information provided by these methods can be used in a complementary way, provided that the structural features in both experimental conditions are compatible Kim et al. [2017].
The data acquisition processes in these methods are also quite different. A SAXS experiment is fast to perform (given access to a synchrotron) and requires relatively little sample preparation. But it provides very low resolution data. In contrast, a cryo-EM experiment can usually be performed locally on microscopes owned by most research-intensive universities, and provides large amounts of data. But EM needs more effort for sample preparation and moreover, post-processing computations are necessary to classify, denoise and/or class-average images Bhamre et al. [2016]; Park and Chirikjian [2014]; Park et al. [2011]; Penczek [2002]; Schatz and Van Heel [1990]; Sigworth [1998]. After this, the images are aligned to yield a 3D density map at high resolution Penczek et al. [1996]; Scheres et al. [2005]; Shkolnisky and Singer [2012]; Singer et al. [2011]; Wang and Sigworth [2006].
Usually either SAXS or cryo-EM is used, but here we investigate the “fusion” of structural information obtained from SAXS and EM experiments. Although the type of data and the data acquisition processes in these methods are quite different, they do share some common features. Both SAXS and EM differ from crystallography in that no crystallization is needed (at the expense of lower resolution). The advantage is that finding appropriate crystallization conditions in many cases is either a very lengthy process or the process may capture the molecules in conditions that are not biologically relevant.
That said, in SAXS experiments, the biomolecular complexes of interest are in solution and are exposed to X-ray beams, which leads to the scattering of X-ray photons. Unlike in X-ray crystallography where the macromolecules are regularly positioned and oriented, in SAXS the molecules can move and rotate randomly in solution. Hence, roughly speaking, the low resolution in SAXS can be attributed to the fact that the SAXS data is the spherical average of the scattering pattern of the complex under study.
In cryo-EM, similar to standard tomography, we obtain numerous two-dimensional (2D) projections of a 3D complex. A specimen grid, with the molecules of interest in it, is prepared and exposed to high-energy electron beams to obtain millions of projected 2D images. However, in contrast to tomography, the projections are at random (unknown) directions; moreover, the projections are extremely noisy. As a result, the 3D volume reconstruction in cryo-EM involves complicated postprocessing and still the resolution is not very high as compared to crystallographic structures (although it is much higher than SAXS).
Both SAXS and EM have been successfully used for the structural study of large biomolecular complexes. However, as explained earlier, they have their own disadvantages for the study of function–structure relationships. To better investigate the relationship between the function and the shape of a biomolecular complex, it may be desirable to fuse the information from these two experimental data modalities, which is the goal of this paper.
3 Theory
In this section, we elaborate on the theory of SAXS-EM data fusion in the two-dimensional and three-dimensional case. We consider the two-dimensional case first, because we will later see that our idea directly generalizes to the three-dimensional case, which simplifies the derivation.
3.1 The two-dimensional case (For conceptualization)
Though proteins are 3D structures, before going into the mathematical details of the 3D case, we first develop a conceptual framework illustrated with a 2D toy model.
In the two-dimensional case, the task is to reconstruct the unknown electron density of a fictitious 2D structure of interest from partial Fourier data. The idea of information fusion is to combine the information from both EM and SAXS to fill in the missing values of the Fourier spectrum in the plane, so that the density can be reconstructed via a two-dimensional inverse Fourier transform. Figure 1 shows the basic two-dimensional setting. In Fourier space, from EM and the Fourier slice theorem, the values of the spectrum on each line passing through the origin are known. On the other hand, the values of the square average of the spectrum on each of the concentric circles are known from SAXS. For simplicity, Fig. 1 shows only one such circle, with radius equal to one. Many such circles will be used to cover the plane in Fourier space.
Switching to polar coordinates, we see that the value of the spectrum at the intersection between each line and the circle is known, because by the Fourier slice theorem the EM data can be seen as the values of the spectrum on lines passing through the origin, perpendicular to the respective projection direction (in 3D the slice would be a plane and the projection direction would be normal to this plane). The sought-after density function has the Fourier transform
[TABLE]
The original function can be reconstructed from its Fourier transform via the inverse transform,
[TABLE]
In polar coordinates, the frequency vector can be written as , where is the Euclidean 2-norm of . The plane is divided into an infinite number of circles with radius . Without loss of generality, let us choose in Fig. 1 the unit circle with radius , so that . In this case, we can use the short hand , so that describes a function on the unit circle in the -plane. Since , we get directly from the definition of the Fourier transform
[TABLE]
From the various projections in EM, only the values of the function along specific lines in is obtained, and these lines are not uniformly spaced with respect to their angle. Our goal is to interpolate the data so as to match the projection data of the experiment, while also getting meaningful “in-between” values of the function on the whole circle. The SAXS experiment provides the square average of the function along the circle, which is the information that we shall incorporate in the interpolation process.
For the unit circle with radius , the 2D version of the SAXS experiment gives the single value
[TABLE]
On the other hand, data and at specific points , , on the unit circle are provided by EM, where is the number of different projection angles. We approximate the function with a truncated Fourier series (i. e., a trigonometric polynomial),
[TABLE]
where
[TABLE]
are the (classical) Fourier coefficients of the function on the unit circle , and is the approximation order. We note that
[TABLE]
due to (2).
Now we substitute (4) into (3), and derive
[TABLE]
where is the Kronecker delta symbol, which is equal to one if and equal to zero otherwise. Equation (7) is an approximation to the well-known Parseval equality. Now let us bring this equation into matrix-vector notation. To this end, we introduce the vectors
[TABLE]
[TABLE]
With this vector, (7) attains the form
[TABLE]
where is the Hermitian transpose of .
Now turning back to (4), we find that
[TABLE]
where
[TABLE]
It can be seen from (5) that the unknown is real. We bring this unknown quantity to the left-hand side of (8),
[TABLE]
where is the vector of length containing only ones,
[TABLE]
[TABLE]
Suppose the matrix is invertible (by imposing the constraint ), then
[TABLE]
Substituting (11) into (7), we finally get
[TABLE]
This is a quadratic equation in the zero-order Fourier coefficient, so we can solve it for this coefficient first. Subsequently, we can solve (11) for the remaining coefficients, so that all the Fourier coefficients are found. This in turn allows for interpolation of the given spectral values on the circle. Note that the symmetry relations (2) and (6) can be exploited here. Alternatively, an adequate algorithm for solving the system (11) is the non-equispaced fast Fourier transform (NFFT) of Potts et al. [2001]. This algorithm and its so-called adjoint could also be used in the first step in an iterative scheme for solving the quadratic equation (12) for the zero-order coefficient. Performing the above-described procedure on a suitable number of concentric circles, we can potentially interpolate the complete spectrum of the density. In any case, from the interpolated spectral data, we can reconstruct the sought-after density from its spectrum via a discretized inverse Fourier transform, cf. (1). This last step can be carried out using either a classical inverse FFT or the inverse NFFT of Potts et al., depending on the points at which the spectrum is interpolated.
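The procedure on a single circle can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `fuse_on_circle` and `evaluate`, the test function `g`, the sample angles, and the assumption of exactly 2n sample angles for approximation order n are all hypothetical, and the intensity constraint is taken to be the sum of the squared magnitudes of the coefficients, as in the Parseval-type relation (7).

```python
import numpy as np

def fuse_on_circle(theta, d, intensity, n):
    """Return both candidate coefficient vectors (c_{-n}, ..., c_n).

    theta: 2n sample angles (EM), d: values at these angles,
    intensity: mean square of the function on the circle (SAXS).
    """
    theta = np.asarray(theta, dtype=float)
    d = np.asarray(d, dtype=complex)
    if theta.size != 2 * n:
        raise ValueError("this sketch assumes exactly 2n sample angles")
    ks = np.concatenate([np.arange(-n, 0), np.arange(1, n + 1)])  # all k != 0
    F = np.exp(1j * np.outer(theta, ks))     # square Fourier matrix without k = 0
    u = np.linalg.solve(F, d)                # F^{-1} d
    v = np.linalg.solve(F, np.ones_like(d))  # F^{-1} 1
    # Remaining coefficients are u - c0 * v, with c0 real, so the constraint
    # c0^2 + ||u - c0 v||^2 = intensity is a quadratic a c0^2 + b c0 + c = 0.
    a = 1.0 + np.vdot(v, v).real
    b = -2.0 * np.vdot(v, u).real
    c = np.vdot(u, u).real - intensity
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real root: intensity too small for these data")
    candidates = []
    for sign in (1.0, -1.0):
        c0 = (-b + sign * np.sqrt(disc)) / (2.0 * a)
        rest = u - c0 * v
        candidates.append(np.concatenate([rest[:n], [c0], rest[n:]]))
    return candidates

def evaluate(coeffs, theta):
    """Evaluate the trigonometric polynomial with coefficients c_{-n}, ..., c_n."""
    n = (len(coeffs) - 1) // 2
    ks = np.arange(-n, n + 1)
    return np.exp(1j * np.outer(theta, ks)) @ coeffs

# Hypothetical positive test function of degree 2; its mean square is 4.625.
g = lambda t: 2.0 + np.cos(t) + 0.5 * np.sin(2 * t)
theta = np.array([0.3, 1.1, 2.9, 4.6])
candidates = fuse_on_circle(theta, g(theta), intensity=4.625, n=2)
grid = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
errors = [np.max(np.abs(evaluate(c, grid) - g(grid))) for c in candidates]
```

Both candidates interpolate the samples and satisfy the intensity constraint; only one of them reproduces `g`. In this sketch the sample angles are deliberately not equally spaced: for 2n equally spaced angles the columns for k = n and k = -n coincide, so the matrix `F` would be singular.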
Before we show how the above method for SAXS-EM data fusion generalizes to the three-dimensional case, an important remark is in order. The above matrix can only be invertible if it is square, i.e., if the approximation order matches the number of EM projection angles, and if the projection angles are pairwise different from each other. If the number of EM projection angles is smaller than the approximation order, then the Moore-Penrose pseudo-inverse of the matrix, or any other method for solving the linear system (9), could be used. However, it clearly makes sense to choose the approximation order such that it matches the number of projection angles, so that the above-described method for SAXS-EM data fusion results in a square and invertible matrix. If some of the EM projection angles are very similar to each other, it could be numerically more stable to exclude the “redundant” angles, and to choose the approximation order accordingly. The over-determined case is of neither practical nor theoretical interest, for here the information of SAXS would theoretically already be included in the EM data.
Due to the geometric properties of the unit circle, which differ from those of the unit sphere in three dimensions, the situation in the three-dimensional case is slightly more complicated than this, as we will see later. But the same conceptual framework applies.
3.2 Numerical illustration - Fusion on a unit circle
In this subsection, the basic principle behind SAXS-EM data fusion is illustrated numerically. To show the general concept, the experiment is performed in real space. But the calculation method can be implemented in Fourier space as well.
The following function on a unit circle is used as a testing model,
[TABLE]
In the experiment, it is supposed that values of the function are known when , which is used as EM data.
The SAXS information can then be calculated as
[TABLE]
Both the EM reconstruction and SAXS-EM fusion reconstruction have been performed. Results are shown in Fig. 2, in which the two dashed lines (in red and green) are results from the SAXS-EM fusion, the black dotted line shows the EM result, and the blue dashed line with black dots is the original function.
As can be seen from the figure, the SAXS-EM fusion technique provides two very different results. One fits the original function well, but the other one is far from the ground truth. The two results arise because (12) is a quadratic equation in the zero-order coefficient, which yields two solutions. However, as shown in the figure, one of the two solutions is clearly wrong.
In this 1D example, the value of the function can be used as the filter. Since the original function represents the density of a model, it must be positive at each point, so the red dashed line can be easily eliminated. In the 2D and 3D cases, the minimum mean square error has been adopted to choose the solution that gives the more stable intensity value. However, this is a computationally intensive procedure; the origin of the two solutions and the filtering steps are explained in detail in Section 4.
3.3 The three-dimensional case
In single-particle cryo EM, we have many copies of the same biomolecular structure (each called a “particle”) oriented in many different ways. If is the density function of the particle as described in its own body-fixed frame, and if is the rotation matrix describing the orientation of the th particle relative to the laboratory frame, then the projection along the laboratory -axis will be
[TABLE]
where , while are respectively the canonical unit vectors in -, -, and -direction.
This means that in the body-fixed frame of the th particle, the projection direction is
[TABLE]
From the Fourier slice theorem, this then means that the two-dimensional Fourier transform of the projection is a slice of the three-dimensional Fourier transform,
[TABLE]
of the density function in the plane through the origin with normal . Explicitly, the Fourier slice theorem here reads
[TABLE]
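The three-dimensional statement can be checked numerically in the simplest setting, where the projection direction coincides with a grid axis. The sketch below also computes the projection direction in the body-fixed frame from a rotation matrix. It assumes the convention that the rotation matrix maps body-frame coordinates to laboratory coordinates, which is an assumption and may differ from the convention used in the paper; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
density = rng.random((16, 16, 16))     # hypothetical 3D density on a grid

# Project along the z-axis, then take the 2D Fourier transform of the image.
projection = density.sum(axis=2)
slice_from_projection = np.fft.fft2(projection)

# Central slice of the 3D spectrum: the plane through the origin whose
# normal is the projection direction (zero frequency along z).
central_slice = np.fft.fftn(density)[:, :, 0]

theorem_holds = np.allclose(slice_from_projection, central_slice)

# Random rotation matrix (orthogonal with determinant +1) from a QR factorization.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]

# Assumed convention: x_lab = R @ x_body, so the laboratory z-axis expressed
# in the body-fixed frame is R^T e_3.
e3 = np.array([0.0, 0.0, 1.0])
n_body = R.T @ e3
```

Each particle orientation therefore contributes one plane through the origin of Fourier space with normal `n_body`; when these normals cluster, large parts of the spectrum remain unsampled.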
The density function can be recovered from its Fourier transform via the inverse Fourier transform,
[TABLE]
Let us assume that an existing EM reconstruction method has determined , where is the total number of particles. Our goal is to fuse the experimental EM information for given in the left-hand-side of (14) with the SAXS experimental measurements. The SAXS data is the square average of the Fourier transform over spheres centered at the origin,
[TABLE]
where is the two-dimensional unit sphere in .
Our approach for SAXS-EM data fusion in the three-dimensional case will be a direct generalization of the two-dimensional case discussed in the previous section. We wish to reconstruct by viewing the Fourier space as consisting of an infinite number of concentric spheres. On each of these spheres will be a certain number of so-called great circles, arising from the intersection of a plane containing the origin with the respective sphere, cf. Fig. 3. The values of the spectrum on these great circles are provided by EM data, cf. (14). The SAXS information (16) will serve to improve the reconstruction in the potentially large “gaps” between the great circles.
Apart from that, the main difference between the three-dimensional and the planar case is that instead of the classical Fourier basis , we will now employ spherical harmonics that we briefly review below.
3.3.1 A brief review of spherical harmonics
On the two-dimensional unit sphere in , let us introduce spherical coordinates and such that for any point on the unit sphere it holds
[TABLE]
The angle is called the polar angle, while is referred to as the azimuthal angle. The integral of any integrable function can be computed via
[TABLE]
If a function is square-integrable over , i. e.,
[TABLE]
then it can be expanded in terms of the spherical harmonics. These are defined as
[TABLE]
where
[TABLE]
is an associated Legendre polynomial. This is possible because the spherical harmonics constitute an orthonormal basis for the space of all square-integrable functions on the sphere. Such an expansion reads as
[TABLE]
where
[TABLE]
are the generalized, spherical Fourier coefficients of the function. Equality in (17) holds in the $L^2$ sense, but in general not pointwise. However, as is commonly done, we can use a truncated version of the Fourier series as a pointwise approximation to the function of interest, which will be smooth in general.
Without going into too much detail, we note that the spherical harmonics have nice properties under rotation. Indeed, for a given rotation matrix we find that
[TABLE]
where are the well-studied Wigner-D functions (see Biedenharn and Louck [1981]; Chirikjian and Kyatkin [2016] for example). In particular, this means that a rotated spherical harmonic can be written as a linear combination of spherical harmonics of the same degree.
3.3.2 Data fusion
As in the two-dimensional case, without loss of generality, let us confine ourselves to the unit sphere , i. e., the sphere with radius in . We can approximate the sought-after density spectrum restricted to the unit sphere with a truncated spherical Fourier series,
[TABLE]
where is the approximation order, and are the spherical Fourier coefficients of , cf. (18).
The SAXS experiment gives
[TABLE]
Let us assume that the EM data are given on the unit sphere at points . Note that these points will lie on a certain number of great circles, cf. Fig. 3. Analogous to (2), in the three-dimensional case we have that
[TABLE]
Note that in spherical coordinates, parity transforms a point with coordinates to . From (20) and with the well-known relation , it can further be derived that (cf. (6))
[TABLE]
Let us introduce the EM-data vector
[TABLE]
as well as the sought-after frequency vector
[TABLE]
The SAXS data (19) thus attain the form
[TABLE]
Now we can state the EM reconstruction problem in matrix-vector notation as
[TABLE]
where
[TABLE]
When , from (21), we have and so is real. Bringing this unknown to the left-hand side while noting that , the system (23) transforms into
[TABLE]
where is the vector of length containing only ones,
[TABLE]
[TABLE]
With the Moore-Penrose pseudo inverse of , the system (24) can be solved as
[TABLE]
Inserting this equation into (22) then yields
[TABLE]
As in the two-dimensional case, this is a quadratic equation with respect to the first unknown Fourier coefficient, i.e., the coefficient of degree and order zero. After solving this quadratic equation, the remaining Fourier coefficients can be determined by solving the linear system (25) subject to (20) and (21), so that all the Fourier coefficients are found by inverting (23), which we do via the pseudo-inverse based on the SVD. An alternative tool for this task is the well-known fast non-equispaced spherical Fourier transform of Kunis and Potts [2003]. This fast algorithm and its adjoint can also be used in the first step, in which the quadratic equation (26) would then be solved for the first coefficient with a suitable iterative technique. Once the spherical Fourier coefficients are determined, the spectral EM data can be interpolated on the sphere. After this is done on a suitable number of concentric spheres, the spectrum of the density can potentially be interpolated in the whole Fourier space. The sought-after density function can now be reconstructed from the interpolated spectral data via a discretized inverse Fourier transform, cf. (15).
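The pseudo-inverse variant of the procedure can be sketched independently of the particular basis. In the following illustration, which is a sketch under stated assumptions and not the authors' code, the basis matrix `Y` is a random stand-in whose first column holds the constant value of the degree-zero spherical harmonic, 1/(2√π); the function name `fuse_with_pinv`, the problem sizes, and the form of the intensity constraint (sum of squared coefficient magnitudes) are assumptions.

```python
import numpy as np

def fuse_with_pinv(Y, d, intensity):
    """Return both candidate coefficient vectors, first entry real.

    Y: N x M basis matrix evaluated at the data points (first column is
    the constant basis function), d: N data values, intensity: c^H c.
    """
    y0, Y_rest = Y[:, 0], Y[:, 1:]
    P = np.linalg.pinv(Y_rest)   # Moore-Penrose pseudo-inverse via the SVD
    u, v = P @ d, P @ y0         # remaining coefficients are u - c0 * v
    a = 1.0 + np.vdot(v, v).real
    b = -2.0 * np.vdot(v, u).real
    c = np.vdot(u, u).real - intensity
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real root: intensity too small for these data")
    roots = [(-b + s * np.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    return [np.concatenate([[c0], u - c0 * v]) for c0 in roots]

# Hypothetical stand-in problem: 6 data points, 9 coefficients (degree <= 2).
rng = np.random.default_rng(2)
Y = rng.normal(size=(6, 9)) + 1j * rng.normal(size=(6, 9))
Y[:, 0] = 0.5 / np.sqrt(np.pi)             # constant degree-zero harmonic
c_true = rng.normal(size=9) + 1j * rng.normal(size=9)
c_true[0] = c_true[0].real                 # first coefficient assumed real
d = Y @ c_true
intensity = np.vdot(c_true, c_true).real
candidates = fuse_with_pinv(Y, d, intensity)
```

Because the stand-in system is under-determined, both candidates reproduce the data and the intensity, but neither needs to coincide with `c_true`: the pseudo-inverse selects the minimum-norm remaining coefficients for each root.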
To close this section, let us comment on the relation between the number of EM data points considered on the unit sphere above and the approximation order. In the two-dimensional case, we saw that this relation was very clear: as long as the different projection angles in EM are not too similar to each other, the approximation order should be chosen to match the number of projection angles, for this will result in a square and invertible Fourier matrix in (9). As mentioned above, the situation in the three-dimensional case is slightly more complicated than that. In particular, it would in general not be sufficient to choose the approximation order such that the number of spherical Fourier coefficients to be reconstructed equals the number of data points. The reason for this is that the EM data will always lie on great circles (again, cf. Fig. 3), which is an inherent problem in EM imaging if reconstruction is performed using concentric spheres. Indeed, it is well known from constructive approximation theory on the sphere that pairwise different sample points on one great circle are not sufficient for reconstructing a spherical polynomial, even if there are as many sample points as Fourier coefficients (see Freeden et al. [1998], for instance). The situation is slightly relaxed by the fact that there will be more than one great circle in practical applications. The number of different projection angles, however, can be quite limited, and there might be large gaps between the projection planes, where information is missing. This makes including information available from SAXS even more desirable.
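The great-circle limitation can be seen already for degree one, where there are four spherical Fourier coefficients. The sketch below, with hypothetical helper names `sph_harm_deg1` and `great_circle`, evaluates the four spherical harmonics of degree at most one (in their standard Cartesian form with the Condon-Shortley phase) at points on great circles and inspects the rank of the resulting matrix.

```python
import numpy as np

def sph_harm_deg1(p):
    """Columns [Y_0^0, Y_1^{-1}, Y_1^0, Y_1^1] at unit vectors p (N x 3)."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    y00 = np.full(len(p), 0.5 / np.sqrt(np.pi), dtype=complex)
    y1m = np.sqrt(3.0 / (8.0 * np.pi)) * (x - 1j * y)
    y10 = np.sqrt(3.0 / (4.0 * np.pi)) * z + 0j
    y1p = -np.sqrt(3.0 / (8.0 * np.pi)) * (x + 1j * y)
    return np.column_stack([y00, y1m, y10, y1p])

def great_circle(normal, n_points):
    """n_points equally spaced unit vectors on the great circle with the given normal."""
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.outer(np.cos(t), u) + np.outer(np.sin(t), v)

circle_a = great_circle([0.0, 0.0, 1.0], 12)
circle_b = great_circle([1.0, 1.0, 0.3], 12)

rank_one_circle = np.linalg.matrix_rank(sph_harm_deg1(circle_a))
rank_two_circles = np.linalg.matrix_rank(sph_harm_deg1(np.vstack([circle_a, circle_b])))
```

Twelve points on a single great circle give a matrix of rank three, so the four coefficients are not determined even though there are more data points than unknowns; adding a second great circle restores full rank in this low-degree example.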
3.3.3 Numerical illustration - Fusion on a unit sphere
This numerical experiment shows the difference between the EM reconstruction and the SAXS-EM reconstruction. The aim is to illustrate that when additional data from SAXS experiments is supplemented, the reconstruction is better for the three-dimensional case. The calculation is performed in Fourier space directly.
The 3D function used in this experiment is
[TABLE]
where , .
The value of the function on the unit sphere is shown in Fig. 4. In the numerical experiment, 20 projections of the function were randomly generated. On each projection, which intersects the unit sphere in a great circle, 50 evenly distributed points are selected with known values. This constitutes the EM data.
The roots that the quadratic equation in (26) is expected to generate can be compared with an analytical solution calculated using (18),
[TABLE]
and hence, .
Figure 6 shows the reconstruction result using only the EM data. Qualitatively the reconstruction creates fictitious spots on the sphere and does not reproduce the peaks of the Gaussian correctly. This is reflected in the absolute error plot adjoining it. The reconstruction result from SAXS and EM fusion and the corresponding absolute error are shown in Fig. 5.
Comparing Fig. 6 and Fig. 5 shows that the reconstruction error is reduced by fusing in the SAXS data. Furthermore, it is worth noting that the coarser the EM data sampling, the greater the reconstruction error. In such instances, the role of SAXS data becomes more valuable.
4 Structure and Selection of the Quadratic Roots
In this section, we describe in greater detail the origin, underlying structure as well as the method of selection employed to choose between the roots of the quadratic equations in (12) and (26).
We consider the coefficient of the zero-order term in the Fourier or spherical harmonic expansion, respectively. The quadratic equation has two roots for this coefficient, and they can be used to derive two different sets of expansion coefficients from (11) and (25). From the intensity constraint ((19) for 2D and (22) for 3D), we observe that for any coefficient vector that satisfies the constraint, the coefficient vector obtained by applying a unitary matrix also satisfies the constraint. Hence, the two solutions can be written as related by a unitary matrix. We now concentrate on the 2D case, although a similar analysis can be performed for the 3D scenario as well.
4.1 Origin of the roots in 2D
We let denote the dimension of the vector (the superscript will henceforth be omitted and the relationship between the two solutions be made clear through the unitary matrix) and let be the dimension of the EM data, . As we proceed to explain, the difference of between the two dimensions leaves the system under-determined and therefore allows the intensity constraint to close the system of equations. If the difference between the two dimensions were larger, one would require the Moore-Penrose pseudo inverse to approximate the solutions (as is performed in the three-dimensional scenario). We proceed by making some observations regarding the under-determined nature of the system of equations. Since and both satisfy equation (8),
[TABLE]
This implies that,
[TABLE]
Hence, the difference between the two solutions is a non-zero vector that lies in the null space of the matrix. This implies that the matrix has linearly independent rows but linearly dependent columns. This is a consequence of finite discretization: EM data that are known at some points can allow for more than one interpolation of the Fourier transform between the data points. The quadratic nature of the intensity constraint narrows the space of admissible solutions (that satisfy the discrete set of equations) to two. Roots generated after gradual refinement of the EM data with more slices can be expected to converge towards the true root. A related method to select the true root without needing refined EM data would be to selectively block data obtained at various angles and evaluate the roots. The spurious root would be expected to fluctuate significantly, whereas the true root would be fairly stable throughout this blocking and calculation procedure. However, such a method is also computationally intensive, especially in two- or three-dimensional cases. In such scenarios other techniques of root selection become important; these are discussed later.
4.2 Origin of the roots in 3D
A similar procedure can be performed for 3D although the situation is more complicated. We let denote the dimension of the vector and let be the dimension of the EM data, . Again the set of equations would be under-determined since due to the interpolation conditions on a sphere, data points are not sufficient to interpolate an degree spherical harmonic on the spherical surface and we would obtain an interpolation of lower “effective” order. Recognizing this fact, we yet again have a system that allows two roots to be connected by a unitary transformation such that the difference between the two roots falls in the null space of the matrix, in (23). Note that the fundamental assumption is that the system of equations is originally under-determined and allows for the (approximate) evaluation of a root. As mentioned earlier, over-determined systems would not be of practical interest.
4.3 Types of roots and relationship with intensity
The quadratic equation (12) in the two-dimensional problem can be expanded as,
[TABLE]
where is a Hermitian matrix that contains information about the distribution of the EM sampled points. The roots of this equation are,
[TABLE]
We also observe that the intensity only appears in the discriminant, which is proportional to the term under the square root in (30). In fact, since using the positive semi-definite property of the Hermitian matrix, , the nature of roots is solely controlled by the sign of .
Now, we define as the following expression,
[TABLE]
Here, can be interpreted as an intensity since it has a similar form as the original intensity constraint in (22). Moreover, it will be shown that is in fact the lowest possible intensity that the EM data can admit, and is associated with the Fourier interpolation with the lowest norm that is used to fit the EM data. All values of will allow for two different roots to exist. If then the roots are complex whereas when the roots are real. In fact, the difference between the two roots would be proportional to . The constant of proportionality would be related to the sampling of angles in the EM data since the scalar is related to the angular spacing between various sampling points (and will have a considerably simple structure if the samples are uniformly spaced). A similar form of exists for the three-dimensional problem where we have,
[TABLE]
where and is now expressed in terms of the pseudo-inverse.
The special case of complex conjugate roots:
Since the matrix is Hermitian, the coefficients of the quadratic equation (12) are real (a similar conclusion can also be drawn for the three-dimensional case in (26)). The three possible solutions of would be repeated real roots, distinct real roots and a complex conjugate pair. Here, we make a remark on the significance of the complex conjugate pair of roots. A complex conjugate pair of roots would imply that . Relating the two vectors by the unitary matrix transformation, we get,
[TABLE]
Multiplying both sides by the inverse of the Hermitian matrix, and taking the conjugate of both sides,
[TABLE]
Equating the right hand side of (33) and (34), we obtain for a non-zero . We also point out that a symmetric unitary matrix has many interesting properties, especially under the decomposition, for symmetric real matrices, and . We observe from that and . Particularly, if we define , then . However, the existence of complex roots violates the important symmetry condition in equation (6) and its corresponding form in the three-dimensional case (21). A complex root suggests that no real value of can satisfy both the intensity as well as the EM data fitting constraint in the given set of EM data. Complex roots (which appear in pairs) are again symptoms of the discretisation introduced by the finite set of EM data and will disappear with finer sampling. A simple numerical technique to deal with complex roots is to approximate a given complex root by the closest real value, which in this case is the real component of the complex root, since it corresponds to the minimum point of the parabola representing the quadratic expression. However, it must be noted that by removing the imaginary component of the root, one also removes the information provided by the SAXS data (since the intensity appears in the discriminant, which enters only the imaginary component of the complex root). In such scenarios, the SAXS-EM fusion degenerates to a reconstruction from EM data alone. The situations in which the SAXS information fusion degenerates to an EM reconstruction are explained in greater detail in the following section.
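The three root types and the real-part fallback can be illustrated with a minimal sketch. It assumes only that the quadratic has real coefficients, as stated above; the function name, the tolerance and the example coefficients are hypothetical and are not the coefficients of (12).

```python
def real_root_candidates(a, b, c, tol=1e-12):
    """Classify the roots of a*x**2 + b*x + c = 0 for real a, b, c (a != 0).

    Returns (kind, roots). For a complex-conjugate pair only the shared
    real part -b/(2a) is returned, which is the vertex of the parabola.
    """
    disc = b * b - 4.0 * a * c
    if disc > tol:
        s = disc ** 0.5
        return "distinct real", ((-b - s) / (2.0 * a), (-b + s) / (2.0 * a))
    vertex = -b / (2.0 * a)
    if disc < -tol:
        # The discriminant only enters the imaginary part, so dropping it
        # discards whatever the discriminant encoded.
        return "complex pair", (vertex,)
    return "repeated real", (vertex,)


kind, roots = real_root_candidates(1.0, -2.0, 5.0)  # roots are 1 +/- 2i
```

In the complex case the returned value is the same as for a repeated root, which matches the remark that the fusion then carries no more information than the EM data alone.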
4.4 Lower bound for SAXS intensity in SAXS-EM fusion
In this subsection, we aim to establish a connection between the results obtained from the SAXS-EM fusion and those obtained from EM data alone. By doing so, it is possible to shed light on the characteristics of the roots obtained in (30).
Let a vector, , be evaluated in the EM experiment such that,
[TABLE]
Here, is a matrix that corresponds to (8) in the two-dimensional case or (23) in the three-dimensional case. It is important to point out that since is usually under-determined, (35) would have an infinite number of solutions and without any other constraint, the Moore-Penrose pseudo inverse would select the solution, with the lowest Euclidean norm. The value of the norm is proportional to the intensity obtained from the EM calculations, i. e.,
[TABLE]
Substituting (35) in (36) and expressing as (which is true for under-determined systems), we obtain,
[TABLE]
using the Hermitian property of . The matrix, , can be related to the matrix defined in equations (10) and (24) as where is a column vector of ones. The inverse of appears in (37) and the formula derived by Sherman and Morrison [1950] allows us to express,
[TABLE]
Here tr represents the trace operator. Substituting in (38) and using the result in (37), we see that,
[TABLE]
where was defined in (32).
The result has important implications in understanding the usefulness of the SAXS intensity information in the fusion procedure. Firstly, we emphasize that sets a minimum bound to the intensity value that can be used to set a constraint on a given set of EM data. When the SAXS intensity information is greater than this value, i. e. , one obtains two real solutions. As shown earlier, these solutions are related by a unitary transformation and geometrically they can be visualized as two points on a constant intensity sphere in space that can be related through reflections or rotations. When the SAXS-EM fusion degenerates to an EM reconstruction alone and the SAXS intensity information adds no further information to the EM data. Geometrically, the solution space collapses to a point and the only unitary transformation that can exist is the identity transformation. When , it is not possible to fuse the SAXS information with the given set of EM data without violating some of the constraints (in this case, the symmetry conditions). In practical usage, the imaginary term is dropped such that we yet again obtain the situation where and the SAXS-EM fusion degenerates to an EM reconstruction. Hence, SAXS intensity information can only add additional information to the results from the EM experiment when .
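The role of the minimum-norm solution as a lower bound can be checked on a toy system. The sketch below assumes a single complex constraint row (a hypothetical stand-in for the EM matrix, not the matrix of (8) or (23)) and uses the squared Euclidean norm as a proxy for the intensity, with the constant of proportionality omitted.

```python
def min_norm_solution(a, d):
    """Minimum-norm c solving sum(a[k] * c[k]) == d for one complex row a.

    For a single row, the Moore-Penrose solution is conj(a) * d / ||a||^2.
    """
    energy = sum(abs(x) ** 2 for x in a)
    return [x.conjugate() * d / energy for x in a]


def squared_norm(c):
    return sum(abs(x) ** 2 for x in c)


a = [1 + 0j, 1j, 2 + 0j]   # hypothetical one-row system (under-determined)
d = 3 + 0j                 # hypothetical measured value
c_min = min_norm_solution(a, d)

# Adding any null-space vector n (a . n == 0) gives another exact solution,
# but always with a larger norm.
n = [2 + 0j, 0j, -1 + 0j]
c_other = [x + y for x, y in zip(c_min, n)]

i_min = squared_norm(c_min)      # smallest norm compatible with the data
i_other = squared_norm(c_other)  # equals i_min + ||n||^2
```

Because the minimum-norm solution is orthogonal to the null space, every other exact fit has a strictly larger norm, which is why an imposed intensity below this value cannot be met without violating a constraint.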
We also make a final remark that when the EM data is sampled over concentric circles or spheres, the lower bound of intensity applies separately for each radius. Overall, it is reasonable to expect that for some, but not necessarily all, radii the SAXS-EM fusion would degenerate to an EM reconstruction. Additionally, this result as yet makes no prediction on whether the SAXS-EM reconstruction is “better” than the EM reconstruction. This would depend on other factors as well, such as the sampling, the form of the function to be reconstructed, the order of the harmonic interpolation, and the conditioning of the relevant matrices.
4.5 Root Selection Procedure
In the development of the numerical algorithm to implement SAXS-EM fusion, the important question of how to select the “correct” root needs to be answered. For the case of complex and repeated roots, this is not a problem since in both cases the real parts of the two roots are the same. However, for the case of real but distinct roots, an additional root selection method needs to be devised. The underlying idea is that the spurious root depends more strongly on discretization parameters such as the sample points and their spacing, whereas the true root tends to be stable as the mesh is refined or the sample points are modified. For instance, consider Fig. 7, which shows the variation of the two roots with increasing refinement of the EM data samples for the problem in Section 3.3.3: roughly, the higher the number of great circles sampled, the smaller the difference in the absolute values of the two roots. Both roots converge to the analytic solution of zero as the number of great circles increases. However, the convergence is not monotonic since the great circles are randomly chosen at each step of sample refinement. Hence, the fluctuations depend on the sensitivity of a given root to the set of sample points; the greater the fluctuation, the greater the likelihood that a given root is “spurious”. The yellow markers indicate the root that is ultimately selected by the algorithm to perform the SAXS-EM fusion for the given set of data. In this specific case, root 1 experiences fluctuations of lower amplitude and is selected in preference to root 2 for the most part. The selection was carried out by a direct summation method, which is explained in the subsequent paragraphs.
The problem of selecting the correct root is complicated by the fact that, with a discrete set of EM data, one does not usually have a priori knowledge of the final reconstructed function. In some cases, it may be possible to assume that the final reconstruction should be fairly smooth, so that spurious Fourier interpolations that fluctuate greatly over the domain of EM data can be eliminated. This can be done by integrating (5) or (18) numerically, using the EM data, to obtain an approximate value for or respectively. This value forms a guess or a “selector”, which is then compared with the roots of the quadratic equation; the root with the lower absolute difference is chosen. The working assumption is that the numerical integration of the original EM data to obtain the zeroth order Fourier coefficient gives a result similar to the numerical integration of the interpolated data, which may be roughly true if the fluctuations have a period greater than the average spacing between the discrete EM data samples.
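The comparison step described above amounts to a nearest-value choice. A minimal sketch follows; the function name and the numerical values are hypothetical.

```python
def select_root(roots, selector):
    """Return the candidate root with the smallest absolute difference
    from the selector (the numerically integrated zeroth-order estimate)."""
    return min(roots, key=lambda r: abs(r - selector))


# Hypothetical values: two real roots and a selector estimate.
chosen = select_root((0.8, -0.3), 0.1)  # picks -0.3
```

If both roots are equally far from the selector, this sketch simply returns the first one, so such a tie carries no real preference.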
There are many ways to carry out this integration. One way would be to interpolate the original EM data (using a polynomial for instance) over the sphere of interest and then evaluate the integrals in (5) and (18) through a quadrature rule. Another method would be to approximate the integral as a summation over the EM data as the following in three-dimensions,
[TABLE]
and as,
[TABLE]
for the two-dimensional case.
Here, there are points in the EM data set and and represent an average grid spacing in the polar and azimuthal directions respectively. The following plot presents the convergence rates of the roots along with that of the selector, again applied to the example in Section 3.3.3.
From the figure it is clear that interpolating the EM data over the sphere and then integrating to obtain the zeroth order Fourier series coefficient gives a worse selector than approximating the root through the direct sum in (40) or (41). The analytic solution for the root is zero for this particular example, and the selector evaluated through interpolation followed by numerical integration converges too slowly with sample refinement. The roots are therefore selected through (40) and plotted as the green circles. The root evaluated directly from the EM data is also plotted; this corresponds to the value of that results in a unique real solution for (26). Since we know from Section 4.4 that this solution is the average of the two roots evaluated from SAXS-EM fusion, the EM root cannot select one root in preference to the other, as both SAXS-EM fusion roots differ from it by the same absolute magnitude; hence, the EM solution forms an unbiased reference against which various root selection strategies can be compared.
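A direct-sum selector in two dimensions might look like the sketch below. The exact form and normalisation of (41) are not reproduced here. The sketch assumes the zeroth-order coefficient is the angular mean (1/2π)∫f dθ and that the average angular spacing is 2π divided by the number of available samples. Both are assumptions, as is the function name.

```python
import math


def zeroth_coefficient_direct_sum(values):
    """Approximate (1 / 2*pi) * integral of f over the circle by a plain sum
    over the available samples, using one average angular spacing."""
    n = len(values)
    d_theta = 2.0 * math.pi / n  # assumed average spacing between samples
    return sum(v * d_theta for v in values) / (2.0 * math.pi)


# Toy data: f(theta) = 2 + cos(theta) on 8 evenly spaced angles.
samples = [2.0 + math.cos(2.0 * math.pi * k / 8) for k in range(8)]
c0 = zeroth_coefficient_direct_sum(samples)  # close to 2
```

With evenly spaced samples this reduces to the sample mean; with a gap in the angles it is only a rough estimate, which is all a selector needs to be.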
Another approach to selecting roots is to repeatedly block angles of EM information, one data point at a time, and calculate the roots for the angles that remain. This is repeated for all angles in the set of EM data and the root whose value is nearly stable in this blocking and calculating procedure is selected. Although this does allow for a good reconstruction, as shown in Section 5.1, it is a computationally intensive procedure and was replaced by (40) and (41) for more complicated shapes such as the “smiley” and “minion” tested later.
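The blocking procedure can be sketched as a leave-one-out loop. Here `roots_fn` is a hypothetical callback that recomputes the two roots from a reduced data set and returns them in a consistent order; how the two roots are matched from one run to the next is an assumption of this sketch, as is the use of the maximum deviation from the mean as the stability measure.

```python
def spread(values):
    """Largest deviation from the mean; works for real or complex values."""
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)


def select_root_by_blocking(samples, roots_fn):
    """Drop one sample at a time, recompute both roots, and return the
    index (0 or 1) of the root that fluctuates least."""
    history = ([], [])
    for k in range(len(samples)):
        reduced = samples[:k] + samples[k + 1:]
        r0, r1 = roots_fn(reduced)
        history[0].append(r0)
        history[1].append(r1)
    return 0 if spread(history[0]) <= spread(history[1]) else 1


# Toy callback: the first root ignores the data, the second tracks its mean.
toy_roots = lambda s: (1.0, sum(s) / len(s))
stable_index = select_root_by_blocking([0.0, 1.0, 5.0, 2.0], toy_roots)
```

The loop solves the root problem once per data point, which is the source of the computational cost noted above.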
5 Numerical Experiments
5.1 Sum of two co-origin Gaussian functions
An experiment involving the reconstruction of a sum of two two-dimensional Gaussian functions is performed in this section. The original function in real space is displayed in Figure 9 and has the following expression,
[TABLE]
where , , and .
The Fourier transform of this function is the sum of another two Gaussian functions, as shown in Fig. 10. The analytical expression for this Fourier transform would then be,
[TABLE]
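The statement that a Gaussian transforms to a Gaussian can be verified numerically. The sketch below checks the one-dimensional case under the convention F(ω) = ∫ f(x) e^{-iωx} dx, which is an assumption and may differ from the convention used in this paper by constant factors. An isotropic 2D Gaussian factorises into two such 1D transforms, and a sum of Gaussians transforms term by term by linearity.

```python
import math


def gaussian_ft_numeric(omega, sigma, half_width=10.0, steps=4000):
    """Trapezoidal estimate of the transform of exp(-x^2 / (2 sigma^2)).

    The integrand is even, so only the cosine part contributes."""
    limit = half_width * sigma
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps + 1):
        x = -limit + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x * x / (2.0 * sigma ** 2)) * math.cos(omega * x)
    return total * h


def gaussian_ft_exact(omega, sigma):
    """Closed form under the same convention: again a Gaussian in omega."""
    return sigma * math.sqrt(2.0 * math.pi) * math.exp(-((sigma * omega) ** 2) / 2.0)


difference = abs(gaussian_ft_numeric(1.5, 0.7) - gaussian_ft_exact(1.5, 0.7))
```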
A comparison of the Fourier transform of the original density with that obtained by SAXS-EM fusion and EM alone with the projection angle gap is shown in Fig. 11. The known EM data is evenly distributed in the meshed range, while the unmeshed region indicates the gap in the data.
Figure 12 shows the result with the projection angle gap [60, 110]. As can be seen from the figure, SAXS-EM fusion gives results close to the original function, while EM alone cannot reconstruct the values well. This result therefore demonstrates the synergy between cryo-EM and SAXS in the 2D case. More tests were run with different original functions, input angles, regions and widths of gaps, showing consistent results.
5.2 2D smiley face
5.2.1 Numerical experiment procedure
A two-dimensional smiley face was constructed using (47) and substituting in the definition of the Gaussian in (45). Here, the mean vector, would represent the two-dimensional coordinates of a point on a discretized smiley face, with each coordinate in the smiley face mapping to a and a corresponding Gaussian, . The weighting coefficient was adjusted for each Gaussian to modify the amplitude of each function and a smiley face was created as shown in Fig. 13. The analytical Fourier transform, was evaluated for each Gaussian using (46) with and summed using (48).
Since the SAXS experiment provides the square average of the function along circles with different radii, we let ; the Fourier transform plotted in polar coordinates is shown in Fig. 15. Then, the SAXS data can be obtained by integrating the square of the amplitude of the analytical Fourier transform along each circle with radius . Similar to (13), the SAXS information is,
[TABLE]
In our numerical experiment, the Fourier transform of the original smiley face is discretized into 65 circles whose radii are evenly spaced. Along each circle, 65 points are distributed evenly. Following this, a gap is introduced in the projection directions in order to simulate the EM experiment. All numerical simulations were performed in MATLAB R2018b on an i5-4690 CPU.
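The per-circle sampling can be sketched in Python (the experiments in this paper were run in MATLAB). The sketch assumes the SAXS value on a circle is the mean of the squared magnitude over 65 evenly spaced angles, and that the EM data are the same samples with a contiguous range of angles removed. The toy transform `toy_ft`, the function names and the gap bounds are hypothetical.

```python
import math


def ring_angles(n_points=65):
    return [2.0 * math.pi * k / n_points for k in range(n_points)]


def saxs_ring_average(ft, radius, n_points=65):
    """Mean of |F|^2 over the full circle of the given radius."""
    values = [abs(ft(radius * math.cos(t), radius * math.sin(t))) ** 2
              for t in ring_angles(n_points)]
    return sum(values) / n_points


def em_ring_samples(ft, radius, gap_deg, n_points=65):
    """(angle, F) pairs on the circle, skipping angles inside the gap."""
    lo, hi = gap_deg
    samples = []
    for t in ring_angles(n_points):
        if lo <= math.degrees(t) <= hi:
            continue  # this direction is missing from the simulated EM data
        samples.append((t, ft(radius * math.cos(t), radius * math.sin(t))))
    return samples


def toy_ft(kx, ky):
    # Hypothetical isotropic transform, used only to exercise the functions.
    return math.exp(-(kx * kx + ky * ky) / 2.0)


intensity = saxs_ring_average(toy_ft, 1.0)
em_data = em_ring_samples(toy_ft, 1.0, (60.0, 110.0))
```

The SAXS average uses the full ring while the EM samples do not, which is the asymmetry the fusion exploits.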
Both the EM reconstruction and the SAXS-EM fusion reconstruction were completed along each circle, where the quadratic (29) was solved. Equation (30) gives the analytical expression of the roots. Instead of selecting the true root by selectively blocking data obtained at various angles and choosing the more stable one, we use the approximate value calculated by (41) as a reference to select the root. The root closer to this approximate value is chosen as the correct root. If the roots are complex, we take only the real part. During the numerical experiment, as the radius in Fourier space gets larger, the roots tend to change from real to complex, introducing relatively large errors when compared with the analytical solution for the true roots. In a real EM experiment, however, no such comparison with an analytical result is possible because only discrete EM data is available. Therefore, the root of the quadratic equation that is closest to the approximate selector is the one that we use to fuse the SAXS and EM information.
5.2.2 Results and discussion
Projection Angle Gap
Figure 15 provides the ground truth of the smiley face reconstruction using complete EM data alone (i. e. there are no missing wedges in the EM data). Figure 16 shows the result with the projection angle gap . As can be seen from the figure, SAXS-EM fusion restores most information in the projection gap that EM alone cannot. EM data alone also gives a false restoration at the edge of the region. These high frequency artefacts in Fourier space correspond to sharp discontinuities or changes in the reconstructed smiley face in real space. The presence of such artefacts can also lower the resolution of the final reconstructed smiley face.
With the results provided by the two methods above, a real-space smiley face can be reconstructed by interpolating the discrete data from polar coordinates to Cartesian coordinates to set up the data for an inverse FFT. To improve the accuracy of the coordinate transformation, oversampling is used to increase the density of the radius and angle distribution. Based on this denser data, two-dimensional cubic spline interpolation is used to evaluate the value at each point of the Cartesian grid. The output domain of the polar to Cartesian transformation and interpolation is a square whose side length equals the diameter of the biggest circle that can be drawn in the polar coordinates; outside this circle, the Fourier transform values are set to zero, consistent with the analytical condition.
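The resampling step can be sketched as follows. For brevity this uses bilinear interpolation in (radius, angle), not the two-dimensional cubic spline described above, and it omits the oversampling and the inverse FFT. The function name and the grid layout (uniform radii from 0 to `r_max`, uniform angles over a full turn) are assumptions.

```python
import math


def polar_to_cartesian(polar, r_max, n):
    """Resample polar[i][j] (radius index i, angle index j) onto an n x n
    Cartesian grid covering [-r_max, r_max]^2; zero outside the disc."""
    n_r, n_t = len(polar), len(polar[0])
    grid = [[0.0] * n for _ in range(n)]
    for iy in range(n):
        y = -r_max + 2.0 * r_max * iy / (n - 1)
        for ix in range(n):
            x = -r_max + 2.0 * r_max * ix / (n - 1)
            r = math.hypot(x, y)
            if r > r_max:
                continue  # outside the largest sampled circle: keep zero
            fr = r / r_max * (n_r - 1)
            ft = (math.atan2(y, x) % (2.0 * math.pi)) / (2.0 * math.pi) * n_t
            i0 = min(int(fr), n_r - 2)
            j0 = int(ft) % n_t
            j1 = (j0 + 1) % n_t           # the angle index wraps around
            a, b = fr - i0, ft - int(ft)  # fractional offsets in r and theta
            grid[iy][ix] = ((1 - a) * (1 - b) * polar[i0][j0]
                            + (1 - a) * b * polar[i0][j1]
                            + a * (1 - b) * polar[i0 + 1][j0]
                            + a * b * polar[i0 + 1][j1])
    return grid


# A constant polar field should map to 1 inside the disc and 0 outside.
cart = polar_to_cartesian([[1.0] * 8 for _ in range(4)], 1.0, 5)
```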
Figure 17 compares the reconstructions of the smiley face obtained by SAXS-EM fusion and by EM alone. In Fourier space, the EM reconstruction adds two more peaks in addition to the central peak, resulting in large errors when filling the missing wedge. The mean error in magnitude of the SAXS-EM result relative to the analytical smiley face Gaussian distribution is 0.82591, while EM alone, without the SAXS fusion, gives an error of 2.0562 relative to the same analytical result. This demonstrates the synergy between cryo-EM and SAXS in the 2D case when reconstructing to real space.
Figure 18 depicts the results and comparison of reconstruction with a projection angle gap when the start angle is . The result shows a higher error in SAXS-EM fusion reconstruction in the case when information in is missing as opposed to that when is missing; nevertheless, the SAXS-EM fused result outperforms the EM-only reconstruction. Figure 19 shows the results and comparison of a reconstruction with a projection angle gap when the start angle is . In such a condition, the SAXS-EM fusion does not show any distinctive advantage and both EM alone and SAXS-EM reconstruct a relatively good smiley face. In conclusion, the reconstruction effect is related to the information contained in the projection gap. Only when a certain amount of key information is contained in the gap will the advantage of SAXS-EM fusion be more pronounced.
Two Projection Angle Gaps
Up to this point, we have investigated the case of one gap in the projection angle. In this subsection, two projection angle gaps of and each are considered. Figure 20 shows the result when the gaps are and . The EM experiment helps to reconstruct a smoother smiley face while SAXS-EM fusion provides additional artefacts that result in a higher mean error. But judging from the color, the SAXS-EM fusion result is closer to the ground truth in the outline and the eyes of the face. Since is so small that each gap does not contain much information, we increased the missing wedge and tested and . Figure 21 shows the reconstruction result of two wedges when each angle gap is . By using SAXS-EM fusion, most of the smiley face is reconstructed except for some diagonal streaks in the centre, which are also present in the EM results. Nevertheless, SAXS-EM fusion has a considerably lower error. This illustrates again that the difference between a SAXS-EM fusion reconstruction and an EM-alone reconstruction is greater if the missing wedges contain important information about the object.
Larger Gaps and Fewer Sampling points
In the cases of a larger projection angle gap such as , Fig. 22 gives the results where both SAXS-EM and EM-alone reconstructions fail to reconstruct a smiley face. In Fourier space, we find that the largest error occurs when the points in the gap are far from the centre. However, the reconstructed smiley face using SAXS-EM information fusion still has some advantages due to the display of the face outline and the lower mean error compared with that given by using EM data alone. In other cases, when the projection angle gap gets smaller (i. e. ), the two methods can both reconstruct the 2D smiley face successfully with low error.
Another remark is that the reconstruction effect is also related to grid size in real space. When we sample 33 different radii and 33 discrete points evenly spaced along each circle, SAXS-EM fusion fails to show its superiority in the smiley face reconstruction. This is partly because ill-conditioning in the matrix, , leads to numerical errors in the computation of in (29) at certain angle gaps and starting angles. Moreover, since the root selection procedure is not entirely foolproof, it is possible that for some radii a “wrong” root is selected on a coarse grid (where the sampling is not fine enough to ensure that both roots have converged), allowing the possibility of the SAXS-EM reconstruction performing worse than the EM reconstruction.
5.3 3D smiley face
5.3.1 Numerical experiment procedure
A three-dimensional smiley face was constructed using (47) and substituting in the definition of the Gaussian in (45). Here, the of each Gaussian distribution is similar to that in the two-dimensional case in Sec. 5.2. The only difference between the two-dimensional and three-dimensional mean vector is in the component of perpendicular to the plane containing the two-dimensional smiley face explored in Sec. 5.2. In this example, we construct a three-dimensional smiley face such that and is the mean vector corresponding to a given Gaussian on the two-dimensional smiley face in Sec. 5.2. Then, is defined as 10 values divided evenly from 0 to 1.4. In effect, the three-dimensional smiley face is constructed by extruding the two-dimensional smiley face and discretizing it into spherical Gaussians; the result would be a cylindrical object as shown in Fig. 24. The Fourier transform of the three-dimensional smiley face is then calculated by (46) with and summed using (48).
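The extrusion can be sketched in a few lines. The sketch assumes the 10 out-of-plane values include both endpoints 0 and 1.4; the function name and the example 2D centres are hypothetical.

```python
def extrude_centers(centers_2d, z_max=1.4, n_layers=10):
    """Copy every 2D Gaussian centre onto n_layers evenly spaced planes."""
    zs = [z_max * k / (n_layers - 1) for k in range(n_layers)]
    return [(x, y, z) for z in zs for (x, y) in centers_2d]


# Two hypothetical 2D centres become 20 centres of spherical Gaussians.
centers_3d = extrude_centers([(0.0, 0.0), (1.0, 0.0)])
```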
The Fourier transform of the smiley face in spherical coordinate for several values of radius can be seen in Fig. 25.
Since it is possible to deduce the Fourier transform of the 3D smiley face analytically, we can generate multiple great circles and obtain the value of the Fourier transform on sampled points on each great circle using the analytical function. This sampling was done at discrete points and great circles to simulate the incompleteness of the data generated by the EM experiment. Figure 26 shows all the sampled points from all great circles considered in Fourier space. At the moment, the number of sampled points for each sphere with different radii in Fourier space is the same but this need not always be the case, and the case where the number of sampled points is a non-constant function of sphere radius will be discussed later.
Now, we proceed to discuss the results obtained in Fourier space by SAXS-EM fusion and an EM-alone experiment. From these results, a real-space smiley face can be reconstructed by interpolating the discrete data from spherical coordinates to Cartesian coordinates in preparation for an inverse FFT. Based on the data in spherical coordinates, a three-dimensional cubic spline interpolation is used to evaluate the value at each point of the Cartesian grid. As in the 2D case (Sec. 5.2), the Cartesian region is a cube whose side length equals the diameter of the largest sphere sampled in spherical coordinates, and outside this sphere the value of the Fourier transform is set to zero, consistent with the analytical condition. After that, we take the inverse Fourier transform to real space. The reconstructions obtained by the two methods are shown in Fig. 28 and Fig. 28. In Fig. 28 and Fig. 28, the first row shows three slices of the reconstruction for easier comparison and the second row shows the 3D model from a different view. To quantify the comparison, we define “bad points” as the points at which the error between the original and the reconstructed function is greater than 20% of the original value.
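The “bad point” count follows directly from the definition above. A minimal sketch, with a hypothetical function name and toy values:

```python
def count_bad_points(original, reconstructed, rel_tol=0.2):
    """Count points whose absolute error exceeds rel_tol times the
    magnitude of the original value (20% by default)."""
    bad = 0
    for o, r in zip(original, reconstructed):
        if abs(r - o) > rel_tol * abs(o):
            bad += 1
    return bad


# Toy values: the second and fourth points are off by more than 20%.
n_bad = count_bad_points([1.0, 2.0, 0.5, 4.0], [1.1, 2.5, 0.5, 3.0])
```

Read literally, the definition flags any non-zero error at a point where the original value is exactly zero; whether such points were treated differently in the experiments is not stated here.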
We can see that both reconstructions are good, since there are many sampled points in Fourier space and the simulation covers almost all the information. However, we may lose some information in a real experiment.
5.3.2 Results and discussion
Random Great Circles with Constant Sampling Number
To simulate what happens in the real experiment, we also set some cones in the sphere in Fourier space and exclude all the great circles whose directions are outside these cones. As shown in Fig. 30, the red points represent the direction perpendicular to the plane containing the great circles and the blue cones represent the collection of those directions whose corresponding great circles are sampled in the EM data. Those red points which are not in the blue cones are excluded. The center axes of these blue cones are generated by several different sets of , where
[TABLE]
The angle around each center axes is .
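The cone-based selection of great circles can be sketched as below. Each great circle is represented by its unit normal, and a normal is kept if it lies within a given half-angle of at least one cone axis. The half-angle, the axis and the seed are hypothetical, and the sketch does not identify a normal with its antipode, which is an assumption.

```python
import math
import random


def random_unit_vector(rng):
    """Normalised Gaussian triple, uniformly distributed on the sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return tuple(c / norm for c in v)


def inside_cone(normal, axis, half_angle_deg):
    """True if the unit normal is within half_angle_deg of the unit axis."""
    dot = sum(a * b for a, b in zip(normal, axis))
    return dot >= math.cos(math.radians(half_angle_deg))


def keep_great_circles(normals, axes, half_angle_deg):
    """Keep only the great circles whose normals fall inside some cone."""
    return [n for n in normals
            if any(inside_cone(n, a, half_angle_deg) for a in axes)]


rng = random.Random(0)
normals = [random_unit_vector(rng) for _ in range(50)]
kept = keep_great_circles(normals, [(0.0, 0.0, 1.0)], 40.0)
```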
The reconstruction results of the two methods are shown in Figure 32 and Figure 32.
The reconstruction by SAXS-EM fusion is more like a smiley face than that obtained by EM alone especially in plane. Though we may not see an obvious difference between two results, the number of bad points can show the advantage of SAXS-EM fusion.
Uniformly Distributed Great Circles with Constant Sampling Number
In the simulation above, since the normal directions of the great circles are generated randomly, the normal vectors tend to aggregate in some areas and be sparsely distributed in other regions. Hence, we generate the great circles evenly on the sphere, as shown in Fig. 34, and remove the information obtained from great circles whose normal vectors, which are the red points, are in the blue cones.
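How the evenly distributed normals were generated is not specified here. One common construction, used below purely as a stand-in, is the Fibonacci lattice. The sketch then removes the normals that fall inside the cones; the half-angle and the axis are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """n roughly evenly spread unit vectors (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        points.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return points


def remove_inside_cones(normals, axes, half_angle_deg):
    """Drop the normals that lie within half_angle_deg of any cone axis."""
    cos_limit = math.cos(math.radians(half_angle_deg))

    def blocked(v):
        return any(sum(a * b for a, b in zip(v, ax)) >= cos_limit for ax in axes)

    return [v for v in normals if not blocked(v)]


normals = fibonacci_sphere(100)
remaining = remove_inside_cones(normals, [(0.0, 0.0, 1.0)], 30.0)
```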
Using the same process as above, we obtain the reconstruction of the smiley face in real space. The results obtained by SAXS-EM fusion and EM alone can be seen in Fig. 36 and Fig. 36.
We can see that the outline of the smiley face is missed in plane and and the helix above the smiley face, which is an artefact, is denser when reconstructing using only EM.
Uniformly Distributed Great Circles with Radius Dependent Sampling Number
In the reconstructions above, the number of sampled points on each great circle in Fourier space is the same for every sphere, regardless of its radius. Since the perimeter of a great circle grows linearly with the radius, sampling in this way may miss much information on spheres of larger radius and oversample spheres of smaller radius, which does not reflect a real experiment.
Hence, we modify the number of sample points in each great circle to be linear to the radius of the sphere. We take the same great circles and missing wedges as showed in Fig. 34. In this case, the number of the sampled points in each great circles is linear to the radius, . We set in our first simulation and the sampled points in sphere with different radius are showed in Figure 37.
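Radius-dependent sampling can be sketched as follows. The sketch assumes the count is simply proportional to the radius; the slope `k`, the floor `minimum` and the function names are hypothetical, and the actual relation used in the experiments may include other terms.

```python
import math


def samples_per_circle(radius, k, minimum=4):
    """Number of sample points on a great circle of the given radius,
    growing linearly with the radius (never fewer than `minimum`)."""
    return max(minimum, int(round(k * radius)))


def arc_spacing(radius, k, minimum=4):
    """Arc length between neighbouring samples on that great circle."""
    return 2.0 * math.pi * radius / samples_per_circle(radius, k, minimum)


# With a proportional rule the arc spacing is roughly independent of radius.
spacing_small = arc_spacing(1.0, 10.0)
spacing_large = arc_spacing(2.0, 10.0)
```

Under a proportional rule the arc length between neighbouring samples stays close to 2π/k, so large spheres are no longer undersampled relative to small ones.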
Using the same process, we obtain the reconstructions from the two methods; the results obtained by SAXS-EM fusion and EM alone are shown in Fig. 39 and Fig. 39.
We can see that the reconstruction obtained by SAXS-EM looks quite similar to the original one while we can hardly say that the reconstruction obtained by EM alone looks like a smiley face. Also, the number of bad points of the reconstruction by using EM alone is much greater than that by using SAXS-EM fusion.
At a higher value of , when , there is a greater improvement in the results. In this case, for the SAXS-EM reconstruction, one can clearly see the smiley face. Using EM alone, we can also get a good reconstruction since we obtain enough information with a higher value of . The results are shown in Fig. 41 and Fig. 41.
A lower value of , where , has also been simulated. The result can be seen in Fig. 43 and Fig. 43. In this case, the EM reconstruction looks messy. Moreover, the SAXS-EM reconstruction also does not look like a smiley face. Nevertheless, we can see the outline of the smiley face in the SAXS-EM reconstruction, and the number of bad points is also much lower in the SAXS-EM reconstruction.
5.4 Minion
Before we apply the technique to a real biological macromolecule, we create a minion as the sum of a set of 3D Gaussian distributions by placing the means of the Gaussian functions at different points in space, similar to the 3D smiley face in Sec. 5.3. The center of each Gaussian function is the position of an atom and we seek to reconstruct the electron density at these atoms. As Fig. 44 shows, the colorbar from yellow to red indicates the magnitude of the Gaussian function at a given point, which represents the density in real space. We also use the magnitude to set the size of each point in this space for better visual effect. We can clearly see the different body parts of this character. In fact, the whole space is full of points indicating the density, but where the density is very low the points are too small to be visible. Hence, most atoms are centered on the outline of the minion.
Figure 45 shows the minion in Fourier space. We pick spheres with radius 0.5, 1 and 2 to illustrate the magnitude of its Fourier transform. As in the first experiment with the 3D smiley face, Fig. 46 shows the reconstruction results of the minion without cones to exclude sampling points in Fourier space.
Cones are then set in the sphere to exclude all the great circles whose normal directions are inside these cones in Fourier space. The center axis direction vector is defined in Sec. 5.3.2 and the great circles are generated uniformly in space. We keep the radius-dependent sampling number on each great circle here. The reconstruction result is shown in Fig. 47.
SAXS-EM fusion successfully restores the main body of the minion and especially the two hands which are the two clouds of black points on both sides of the body. Although we can see fuzzy blue eyes and feet by EM reconstruction alone, the main yellow body is separated in the whole space. It means the places where density should be extremely small are given comparatively higher density.
6 Discussion, Conclusion, and Outlook
In this paper, we presented a novel method of fusing information obtained from Small Angle X-ray Scattering (SAXS) and cryo-EM experiments. Despite the fact that SAXS data is of a lower resolution than EM data, it can be shown that information fusion greatly improves an EM reconstruction especially when the EM data is sparse or misses important regions of high information density. Numerical experiments with shapes in two and three-dimensions demonstrated the advantages of implementing a SAXS-EM fusion as opposed to an EM-only reconstruction. On the other hand, there also exists a theoretical lower bound for the minimum intensity that a given sample of EM data can accommodate and the SAXS information fusion would only add new information if the intensity is above this value for a given radius in Fourier space. Finally, we make the following set of observations that we hope would spur further research in the capabilities of SAXS-EM fusion and particularly, with applications to problems involving the elucidation of macromolecule structure from sparse EM data:
1. Application of SAXS-EM fusion to complex biological macromolecules. In this paper, we tested SAXS-EM fusion on a model of a “minion”, with encouraging results. These results suggest that the technique can recover important information that EM experiments may otherwise miss when the slice planes do not pass through important structural details of a biological macromolecule. Since the SAXS information is a spherical average, prominent structural features are recorded in it and can be used to enrich the EM data.
2. The matrices involved in solving the quadratic equation tend to become ill-conditioned when the sampled angles are too close to each other. To accommodate finely sampled EM data, the method used to obtain the roots of the governing quadratic equation may need to be improved so that ill-conditioned matrices are avoided.
3. Allowing the number of sample points on a given great circle to vary linearly with the radius of the sphere leads to large improvements in the SAXS-EM reconstruction, and this technique can be used in further implementations of SAXS-EM fusion.
4. The root selection procedure proposed in the paper is computationally efficient and accurate. Nevertheless, it may be possible to formulate more robust selection rules through integral quadratures (specifically, stable and accurate integration over scattered data on a sphere) that allow a more accurate identification of spurious roots. It may also be possible to add further terms that “correct” the roots obtained from the quadratic equation and so yield more accurate results.
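The ill-conditioning observation in the list above can be illustrated with a generic stand-in. The sketch below does not use the matrices from the paper, which are not reproduced in this section; it assumes a hypothetical design matrix of circular harmonics evaluated at the sampled angles, and compares its condition number for well-spread and tightly clustered angles.

```python
import numpy as np

def harmonic_matrix(angles, max_order):
    """Hypothetical design matrix: columns 1, cos(k t), sin(k t) for k = 1..max_order."""
    cols = [np.ones_like(angles)]
    for k in range(1, max_order + 1):
        cols.append(np.cos(k * angles))
        cols.append(np.sin(k * angles))
    return np.column_stack(cols)

max_order = 3
n_angles = 2 * max_order + 1  # square system: one sample angle per unknown

# Angles spread evenly around the circle versus angles packed into 0.1 rad.
spread = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
clustered = np.linspace(0.0, 0.1, n_angles)

cond_spread = np.linalg.cond(harmonic_matrix(spread, max_order))
cond_clustered = np.linalg.cond(harmonic_matrix(clustered, max_order))

print(f"condition number, spread angles:    {cond_spread:.3g}")
print(f"condition number, clustered angles: {cond_clustered:.3g}")
```

In this toy setting the clustered angles give a condition number many orders of magnitude larger than the evenly spread ones, which is the kind of behavior that would motivate a regularized or otherwise reformulated solver for finely sampled data.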
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Acknowledgement
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM113240.
Appendix A - The n-dimensional Gaussian
The definition of the n-dimensional Gaussian is reviewed in the following. Shapes in two and three dimensions can be constructed as superpositions of two-dimensional and three-dimensional Gaussians, respectively. The n-dimensional Gaussian function, evaluated at a point in n-dimensional space and parameterized by a mean vector and a covariance matrix, is,
[TABLE]
The Fourier transform of the Gaussian function is also a Gaussian function in Fourier space, and can be written as a function of the frequency vector as,
[TABLE]
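For reference, the standard textbook forms of these two expressions are given below. This is a sketch, not a recovery of the paper's typeset equations: the symbols $\mathbf{x}$, $\boldsymbol{\mu}$, $\Sigma$, $\boldsymbol{\omega}$, and $\rho$ are assumed names, and the Fourier transform is assumed to use the kernel $e^{-i\boldsymbol{\omega}^{T}\mathbf{x}}$ with no prefactor. Other conventions change the constants and the scaling of $\boldsymbol{\omega}$.

```latex
% Normalized n-dimensional Gaussian (assumed notation)
\rho(\mathbf{x};\boldsymbol{\mu},\Sigma)
  = \frac{1}{(2\pi)^{n/2}\,|\det\Sigma|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)

% Its Fourier transform under the assumed convention
\hat{\rho}(\boldsymbol{\omega};\boldsymbol{\mu},\Sigma)
  = \int_{\mathbb{R}^{n}} \rho(\mathbf{x};\boldsymbol{\mu},\Sigma)\,
      e^{-i\boldsymbol{\omega}^{T}\mathbf{x}}\,d\mathbf{x}
  = \exp\!\left(-i\,\boldsymbol{\omega}^{T}\boldsymbol{\mu}
      -\tfrac{1}{2}\,\boldsymbol{\omega}^{T}\Sigma\,\boldsymbol{\omega}\right)
```

With this normalization the density integrates to one, so its transform equals one at $\boldsymbol{\omega}=\mathbf{0}$; the mean appears only as a phase factor, and the covariance controls how quickly the magnitude decays in Fourier space.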
Hence, it is possible to approximate a general shape as a linear combination of Gaussians through,
[TABLE]
where each coefficient is the area of the corresponding Gaussian function and is used to control the weight of that Gaussian, and thereby its contribution to the total sum. Since the Fourier transform is a linear operator, the Fourier transform of this sum would be,
[TABLE]
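The linearity argument above can be checked numerically. The following sketch builds a two-component Gaussian mixture in two dimensions, evaluates its Fourier transform as the weighted sum of the component transforms, and compares that with a direct Riemann-sum approximation of the Fourier integral. The function names, weights, means, and covariances are hypothetical, and the transform is assumed to use the kernel exp(-i ω·x) with no prefactor.

```python
import numpy as np

def gaussian(x, mu, cov):
    """Normalized n-dimensional Gaussian density evaluated at the rows of x."""
    x = np.atleast_2d(x)
    n = mu.shape[0]
    diff = x - mu
    quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def gaussian_ft(omega, mu, cov):
    """Fourier transform of the density above for the kernel exp(-i omega.x)."""
    omega = np.atleast_2d(omega)
    phase = omega @ mu
    quad = np.einsum("ij,jk,ik->i", omega, cov, omega)
    return np.exp(-1j * phase - 0.5 * quad)

def mixture(x, weights, mus, covs):
    """Weighted sum of Gaussians; each weight is the area of its component."""
    return sum(w * gaussian(x, m, c) for w, m, c in zip(weights, mus, covs))

def mixture_ft(omega, weights, mus, covs):
    """By linearity, the transform of the sum is the sum of the transforms."""
    return sum(w * gaussian_ft(omega, m, c) for w, m, c in zip(weights, mus, covs))

# Hypothetical two-component shape in two dimensions.
weights = [1.0, 0.5]
mus = [np.array([0.0, 0.0]), np.array([1.5, -0.5])]
covs = [np.array([[0.5, 0.1], [0.1, 0.3]]), 0.4 * np.eye(2)]

# Riemann-sum approximation of the Fourier integral on a regular grid.
h = 0.05
axis = np.arange(-8.0, 8.0 + h / 2, h)
gx, gy = np.meshgrid(axis, axis, indexing="ij")
pts = np.column_stack([gx.ravel(), gy.ravel()])
omega = np.array([0.7, -1.2])

numeric = np.sum(mixture(pts, weights, mus, covs) * np.exp(-1j * pts @ omega)) * h**2
analytic = mixture_ft(omega, weights, mus, covs)[0]
print(abs(numeric - analytic))
```

Evaluating `mixture_ft` at the zero frequency returns the sum of the weights, which matches the interpretation of each coefficient as the area of its Gaussian.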
