Low-signal limit of X-ray single particle imaging

Kartik Ayyer; Andrew J. Morgan; Andrew A. Aquila; Hasan DeMirci; Brenda G. Hogue; Richard A. Kirian; P. Lourdu Xavier; Chun Hong Yoon; Henry N. Chapman; Anton Barty

arXiv:1905.05008·eess.IV·April 29, 2020

TL;DR

This study demonstrates that X-ray single particle imaging can successfully determine virus structures at extremely low photon counts, approaching the limits of current experimental capabilities, with potential improvements from upcoming high-repetition sources.

Contribution

The paper provides experimental validation that orientation and phase retrieval are feasible at very low signal levels, extending the applicability of Bayesian methods to realistic, low-signal conditions.

Findings

01

Successful imaging at photon counts as low as 1/256 of typical levels

02

High-quality electron density reconstructions from low-signal data

03

Potential for future improvements with high-repetition-rate X-ray sources

Abstract

An outstanding question in X-ray single particle imaging experiments has been the feasibility of imaging sub 10-nm-sized biomolecules under realistic experimental conditions where very few photons are expected to be measured in a single snapshot and instrument background may be significant relative to particle scattering. While analyses of simulated data have shown that the determination of an average image should be feasible using Bayesian methods such as the EMC algorithm, this has yet to be demonstrated using experimental data containing realistic non-isotropic instrument background, sample variability and other experimental factors. In this work, we show that the orientation and phase retrieval steps work at photon counts diluted to the signal levels one expects from smaller molecules or with weaker pulses, using data from experimental measurements of 60-nm PR772 viruses. Even when…

Tables

Table 1. Data statistics as a function of selection fraction. The photons per frame (ph/fr) in the second column refer to photons outside the central speckle. The last three columns give the resolution in nanometers according to the standard cutoff criterion for the respective metric.

Fraction	ph/fr	Frames	CC_1/2 (nm)	PRTF (nm)	FSC (nm)
1	34 783.2	14 772	9.02	10.19	8.75
1/2	17 349.3	14 772	9.16	9.16	8.75
1/4	8674.5	14 772	9.33	9.16	8.75
1/8	4337.3	14 772	9.50	9.33	8.75
1/16	2168.6	14 772	9.69	9.33	8.75
1/32	1084.3	14 772	9.69	9.33	8.75
1/64	542.2	14 772	11.2	9.50	8.75
1/128	271.0	14 772	11.2	9.50	8.75
1/256	135.5	14 772	11.4	10.9	8.75
1/512	67.8	14 772	11.7	10.9	8.75
1/1024	33.9	14 772	20.1	11.2	8.75

Equations

$$\Psi = \{\rho(\mathbf{x}),\, B(\mathbf{q})\}$$

$$I_{\text{calc}}[\Psi](\mathbf{q}) = \left|\mathcal{F}[\rho](\mathbf{q})\right|^{2} + B^{2}(\mathbf{q})$$

$$P_{M}[\Psi] = \left\{\mathcal{F}^{-1}\!\left[\sqrt{\frac{I_{\text{meas}}(\mathbf{q})}{I_{\text{calc}}(\mathbf{q})}}\;\mathcal{F}[\rho](\mathbf{q})\right],\; \sqrt{\frac{I_{\text{meas}}(\mathbf{q})}{I_{\text{calc}}(\mathbf{q})}}\;B(\mathbf{q})\right\}$$

$$\mathrm{FSC}(q) = \mathrm{Re}\,\frac{\sum_{|\mathbf{q}_{i}|=q} F_{1}(\mathbf{q}_{i})\,F_{2}^{*}(\mathbf{q}_{i})}{\sqrt{\sum_{|\mathbf{q}_{i}|=q}\left|F_{1}(\mathbf{q}_{i})\right|^{2}\,\sum_{|\mathbf{q}_{i}|=q}\left|F_{2}(\mathbf{q}_{i})\right|^{2}}}$$

$$\mathrm{CC}_{1/2}(q) = \frac{\sum_{|\mathbf{q}_{i}|=q}\left(I_{1}-\overline{I_{1}}\right)\left(I_{2}-\overline{I_{2}}\right)}{\sqrt{\sum_{|\mathbf{q}_{i}|=q}\left(I_{1}-\overline{I_{1}}\right)^{2}\,\sum_{|\mathbf{q}_{i}|=q}\left(I_{2}-\overline{I_{2}}\right)^{2}}}$$

$$\mathrm{PRTF}(\mathbf{q}) = \frac{1}{N}\sum_{n=1}^{N} e^{i\phi_{n}}$$

Full text

Low-signal limit of X-ray single particle diffractive imaging

Kartik Ayyer (1,2,*), Andrew J. Morgan (2,13), Andrew A. Aquila (3), Hasan DeMirci (4,5,6), Brenda G. Hogue (7,8,9), Richard A. Kirian (10), P. Lourdu Xavier (1,2,3), Chun Hong Yoon (3), Henry N. Chapman (2,11,12) and Anton Barty (2)

1 Max Planck Institute for the Structure and Dynamics of Matter, Luruper Chaussee 149, 22761, Hamburg, Germany

2 Center for Free-Electron Laser Science, Deutsches Elektronen Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany

3 Linac Coherent Light Source, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA, 94025, USA

4 Biosciences Division, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA, 94025, USA

5 Stanford PULSE Institute, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA, 94025, USA

6 Department of Molecular Biology and Genetics, Koc University, Rumelifeneri yolu, Sariyer, Istanbul, 34450 Turkey

7 Biodesign Center for Immunotherapy, Vaccines, and Virotherapy, Biodesign Institute at Arizona State University, Tempe 85288, USA

8 Biodesign Center for Applied Structural Discovery, Biodesign Institute at Arizona State University, Tempe 85287, USA

9 Arizona State University, School of Life Sciences (SOLS), Tempe, Arizona 85287, USA

10 Department of Physics, Arizona State University, Tempe, AZ 85287, USA

11 Department of Physics, Universität Hamburg, Luruper Chaussee 149, Hamburg, Germany

12 The Hamburg Center for Ultrafast Imaging, Universität Hamburg, Luruper Chaussee 149, Hamburg, Germany

13 Currently with the ARC Centre of Excellence for Advanced Molecular Imaging, School of Physics, The University of Melbourne, Parkville, VIC 3010, Australia

* [email protected]

Abstract

An outstanding question in X-ray single particle imaging experiments has been the feasibility of imaging sub 10-nm-sized biomolecules under realistic experimental conditions where very few photons are expected to be measured in a single snapshot and instrument background may be significant relative to particle scattering. While analyses of simulated data have shown that the determination of an average image should be feasible using Bayesian methods such as the EMC algorithm, this has yet to be demonstrated using experimental data containing realistic non-isotropic instrument background, sample variability and other experimental factors. In this work, we show that the orientation and phase retrieval steps work at photon counts diluted to the signal levels one expects from smaller molecules or with weaker pulses, using data from experimental measurements of 60-nm PR772 viruses. Even when the signal is reduced to a fraction as little as 1/256, the virus electron density determined using ab initio phasing is of almost the same quality as the high-signal data. However, we are still limited by the total number of patterns collected, which may soon be mitigated by the advent of high repetition-rate sources like the European XFEL and LCLS-II.

1 Introduction

The potential of X-ray free electron lasers (XFELs) to image biomolecular structures at room temperature without the need for crystallisation has been one of the goals driving their development. For many years, theoretical studies backed by simulated data have suggested that near-atomic resolution of isolated non-crystalline proteins should be possible with currently available XFEL sources [1, 2]. To date, published results have focused on large or symmetric particles such as viruses in the 60-500 nm size range, where the higher signal levels from larger particles are ideal for methods development [3, 4, 5, 6]. Results from the single particle imaging initiative at the Linac Coherent Light Source (LCLS) [7] have been in a similar size range [8, 9].

Imaging individual proteins has so far proven more elusive due to the lower signal-to-background from smaller sized particles and a lower than expected rate of single particle diffraction pattern acquisition [6]. While theoretical studies indicate that molecular imaging should be achievable using Bayesian algorithms such as the EMC algorithm [10] for near-perfect data simulated assuming currently available XFEL parameters [2], this has yet to be demonstrated using experimental data containing realistic instrument background, sample variability and other experimental factors.

This paper addresses the question of whether these above-mentioned experimental effects pose a fundamental roadblock to diffraction-pattern alignment and phasing algorithms in the low signal limit. We achieve this using experimental rather than simulated data. The approach taken is to start with experimentally measured data and progressively reduce the photon count to levels similar to those expected from smaller particles such as individual proteins. This process also mimics data that would be recorded from the same size particles using weaker X-ray pulses such as will soon be available with a high repetition rate from the LCLS-II upgrade.

We start from data collected by the SPI initiative from $60\,\mathrm{nm}$ PR772 viruses [9] to 8.5-nm resolution. Weak data were generated by keeping only a small, random fraction of photons from each experimental snapshot. These reduced, or ‘diluted’, patterns contain just a smattering of photons and often look like pure noise to the eye. In addition to diffraction from the virus particles, each diffraction pattern contains instrument background caused by a range of experimental sources. Any structure in the instrument background does not depend on particle orientation; thus, after orientation determination, this background appears as a spherically symmetric function incoherently added to the 3D Fourier intensities of the object. To account for this background, we develop a modified iterative phasing algorithm which isolates and retrieves this background while reconstructing the electron density, and also show that phase retrieval is robust to statistical noise.

The paper is set out as follows. The reconstruction pipeline and the results of its application to the full data set are described in Section 3, and a set of metrics including the Fourier Shell Correlation (FSC) and Phase Retrieval Transfer Function (PRTF) for quantifying reconstruction resolution and fidelity are defined in Section 4. The experimental data sets are then subsampled by randomly selecting a fraction of photons in every frame, followed by orientation and phasing of the sparsified photon counts in Section 5. The quality of the electron densities obtained using the subsampled data sets is evaluated and compared using the metrics of reconstruction quality defined in Section 4.

We find that the reconstruction quality persists for a significant reduction of data quantity: even when the signal is reduced to as little as 1/256 of its original level, quality metrics show that the virus electron density determined using ab initio phasing is of almost the same quality as that from the high-signal data. This suggests that, given a sufficient number of single particle diffraction patterns from sub-10 nm biomolecules with current XFEL parameters (assuming a proportionate reduction in instrument background), or from 60-nm viruses with a pulse 256 times weaker, one can obtain reliable 3D electron densities with the methods presented here. In order to obtain higher resolution, many more patterns will be required to achieve sufficient statistics. This may soon be within reach with advancements in sample delivery methods as well as with high-repetition-rate XFEL sources such as the European XFEL and LCLS-II.

2 Experiment description

Diffraction snapshots of aerosolized PR772 viruses were collected at the Linac Coherent Light Source (LCLS) as described in [9]. Briefly, diffraction patterns were recorded on a pnCCD detector in the AMO instrument at the LCLS [11] at a photon energy of $1.6\,\mathrm{keV}$ with the detector placed $586\,\mathrm{mm}$ downstream from the X-ray-sample interaction point, giving a resolution of $11.8\,\mathrm{nm}$ at the center-edge of the detector and a maximum resolution of $8.4\,\mathrm{nm}$ in the corner of the detector. This data set is available for download from the Coherent X-ray Imaging database [12] as CXIDB 58.

The data set consists of $14\,772$ frames with an average signal level of $395\,876$ photons/frame. For a $60\,\mathrm{nm}$ virus, the speckles were around 100 pixels wide. The pixels were therefore binned by a factor of 4 in both dimensions after photon conversion to reduce computational costs. Excluding bad pixels and the central speckle, where the detector was often saturated, there were $34\,783$ photons/frame on average. There were on average 22.2 photons/speckle at the detector corner.

Diffraction patterns were recorded at a repetition rate of 120 Hz; however, only a small fraction of the X-ray pulses interacted with an object. These so-called “hits” included not only interactions with PR772 virus particles but also with water droplets, multi-particle clusters, and patterns with detector artifacts. Such spurious patterns need to be excluded from analysis. In [9], Reddy et al. describe the classification of the single particle patterns using various machine learning methods; the data for this study are based on the classification by manifold embedding [13], which yields a data set of $14\,772$ single-virus diffraction patterns.

3 Reconstruction procedure

The PR772 virus electron density was reconstructed in a two-step process, illustrated in Fig. 1 and detailed below. First, the orientations of a set of noisy diffraction patterns of mostly identical objects in random orientations with variable incident fluence were determined to produce a 3D intensity volume using the EMC algorithm [10]. The three dimensional diffraction volume was then phased using a background-aware phase retrieval algorithm to arrive at the real-space electron density using a combination of the Difference Map [14] and Error Reduction [15] algorithms.

3.1 Alignment: Determining the 3D reciprocal space intensity distribution

Orientation determination, alignment and scaling of the diffraction patterns into a 3D diffraction volume were performed using the Dragonfly software [2]. Data were provided to Dragonfly in photon counts since the pnCCD detector used in this experiment could resolve individual $1.6\,\mathrm{keV}$ photons. A Poisson noise model was therefore used in Dragonfly. Both the orientation and a relative scale factor were estimated for each pattern to account for incident fluence fluctuations and variations in impact parameter of the virus relative to the beam. The predicted intensities on the detector for a given orientation were multiplied by this scale factor before calculating the probability distribution over orientations (PDO). These scale factors were updated every iteration using the current estimate for the PDO for each pattern. In order to avoid convergence issues due to the high signal per pattern, the PDO was raised to the power of the deterministic annealing parameter, $\beta$. This parameter was increased from 0.001 by a factor of $\sqrt{2}$ every 10 iterations. The detailed procedure used for this reconstruction is described in Appendix A.
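The annealing schedule and the tempering of the orientation distribution can be illustrated with a minimal sketch. This is not Dragonfly code: the function names `beta_schedule` and `tempered_pdo` are hypothetical, the log-likelihood values are made up, and no upper cap on $\beta$ is applied because the text above does not state one.

```python
import math

def beta_schedule(iteration, beta_start=0.001, factor=math.sqrt(2), period=10):
    """Annealing parameter beta at a 0-based iteration index.

    Starts at beta_start and is multiplied by `factor` every `period` iterations.
    """
    return beta_start * factor ** (iteration // period)

def tempered_pdo(log_likelihoods, beta):
    """Raise an orientation distribution to the power beta and renormalise.

    Works in log space: P**beta is proportional to exp(beta * log P).
    """
    scaled = [beta * value for value in log_likelihoods]
    peak = max(scaled)  # subtract the peak so the exponentials cannot overflow
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Two candidate orientations whose (made-up) log-likelihoods differ by 1000.
sharp = tempered_pdo([0.0, -1000.0], beta=1.0)               # all weight on one orientation
broad = tempered_pdo([0.0, -1000.0], beta=beta_schedule(0))  # weight shared between both
```

With a small $\beta$ the second orientation keeps a non-negligible probability, which is the broadening effect that helps convergence when the signal per pattern is high.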

3.2 Phasing: Iterative phase retrieval with background estimation

The three-dimensional diffraction volume from Dragonfly was phased to arrive at the real-space electron density using a background-aware iterative projection phase retrieval algorithm as described in Algorithm 1. The update rule for this algorithm consists of a modulus projection defined to incorporate a spherically symmetric background intensity which is incoherently added to the diffraction signal (“Background aware”), in addition to a support constraint on the electron density consisting of a fixed number of voxels rather than a static mask (“Voxel number support”).

The iterate $\Psi$ comprises both the real-space density $\rho(\mathbf{x})$ and the background $B(\mathbf{q})$,

$$\Psi = \{\rho(\mathbf{x}),\, B(\mathbf{q})\}.$$

In practice this consists of two 3D volumes, one for the real-space electron density and the other for the square root of the background intensity. The calculated intensity is the sum of the intensity from the particle plus the background,

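$$I_{\text{calc}}[\Psi](\mathbf{q}) = \left|\mathcal{F}[\rho](\mathbf{q})\right|^{2} + B^{2}(\mathbf{q}),$$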

where $\mathcal{F}[\rho]$ is the discrete Fourier transform of the electron density $\rho$ . The modulus projection rescales both terms by the ratio to the measured Fourier magnitude,

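$$P_{M}[\Psi] = \left\{\mathcal{F}^{-1}\!\left[\sqrt{\frac{I_{\text{meas}}(\mathbf{q})}{I_{\text{calc}}(\mathbf{q})}}\;\mathcal{F}[\rho](\mathbf{q})\right],\; \sqrt{\frac{I_{\text{meas}}(\mathbf{q})}{I_{\text{calc}}(\mathbf{q})}}\;B(\mathbf{q})\right\},$$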

where $I_{\text{meas}}(\mathbf{q})$ is the measured intensity.
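A minimal NumPy sketch of this background-aware modulus projection follows. It assumes that both halves of the iterate are rescaled by $\sqrt{I_{\text{meas}}/I_{\text{calc}}}$, which is the scaling that makes the projected iterate reproduce the measured intensity exactly. The function name `modulus_projection` and the guard `eps` are hypothetical, and the handling of unmeasured voxels is left out.

```python
import numpy as np

def modulus_projection(rho, bg, i_meas, eps=1e-12):
    """Rescale the density and background so that |F[rho]|^2 + bg^2 equals i_meas.

    rho    : real-space density (3D array)
    bg     : square root of the background intensity (3D array, same shape)
    i_meas : measured intensity (3D array, same shape, FFT ordering assumed)
    """
    f_rho = np.fft.fftn(rho)
    i_calc = np.abs(f_rho) ** 2 + bg ** 2
    # eps only guards against division by zero where i_calc vanishes
    scale = np.sqrt(i_meas / np.maximum(i_calc, eps))
    return np.fft.ifftn(scale * f_rho), scale * bg

# Small synthetic check with random volumes (not experimental data).
rng = np.random.default_rng(1)
rho = rng.random((8, 8, 8))
bg = rng.random((8, 8, 8)) + 0.1
i_meas = rng.random((8, 8, 8)) * 50 + 1
rho_new, bg_new = modulus_projection(rho, bg, i_meas)
i_after = np.abs(np.fft.fftn(rho_new)) ** 2 + bg_new ** 2
```

After the projection, `i_after` matches `i_meas` to numerical precision, while the relative split between particle scattering and background at each voxel is left unchanged.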

The support projection imposes two different constraints on the two halves of the iterate, $\rho$ and $B$. A constant $N$ is chosen at the beginning, representing the number of voxels inside the particle for which the density is allowed to be non-zero. In this case we chose $N=2000$. The modulus-squared electron density values are sorted and the highest $N$ are left unchanged while the rest are set to zero. The background intensities, $B(\mathbf{q})$, are replaced by the spherically symmetric version, i.e. the intensities in each radial bin are replaced by their average. The derivation that both these operations are projections is given in Appendix B. Further details regarding masking and alignment of reconstructions from different random starting models are discussed in Appendix C.
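The two parts of the support projection can be sketched as follows. This is an illustrative version under stated assumptions: the names `radial_bins` and `support_projection` are hypothetical, radial bins are one voxel wide with the reciprocal-space origin at array index 0 (FFT ordering), and the average is taken over the stored background array itself.

```python
import numpy as np

def radial_bins(shape):
    """Integer shell index of every voxel, origin at index 0 (FFT ordering)."""
    axes = [np.fft.fftfreq(n) * n for n in shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    return np.rint(radius).astype(int)

def support_projection(rho, bg, n_voxels, bins):
    """Keep the n_voxels largest |rho|^2 values; radially average the background."""
    flat = rho.ravel()
    # indices of the n_voxels largest modulus-squared density values
    keep = np.argpartition(np.abs(flat) ** 2, -n_voxels)[-n_voxels:]
    rho_out = np.zeros_like(flat)
    rho_out[keep] = flat[keep]
    # replace each background value by the mean of its radial shell
    shell = bins.ravel()
    means = np.bincount(shell, weights=bg.ravel()) / np.maximum(np.bincount(shell), 1)
    return rho_out.reshape(rho.shape), means[shell].reshape(bg.shape)

# Small synthetic check with random volumes (not experimental data).
rng = np.random.default_rng(2)
rho = rng.normal(size=(8, 8, 8))
bg = rng.random((8, 8, 8))
bins = radial_bins(rho.shape)
rho_s, bg_s = support_projection(rho, bg, 20, bins)
```

Applying `support_projection` a second time to its own output leaves it unchanged, as expected of a projection.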

3.3 Reconstruction from the full data set

The results of applying the above two-step reconstruction method to all $14\,772$ patterns are shown in Fig. 1. The 3D intensity shows strong icosahedral symmetry even though this constraint was not enforced during the reconstruction. The resolution corresponding to the edge of the spherical volume of intensities is $8.4\,\mathrm{nm}$. After iterative phasing, the electron density shown in the bottom row was obtained. The contour plot shows an icosahedron with bulges at each vertex, while a slice through the object centre shows the presence of a double-walled shell with a slight reduction in density just inside the outer shell, consistent with other treatments of the data [16, 17].

4 Quantifying reconstruction quality

A set of quantitative metrics is required in order to compare reconstructions and assess overall reconstruction quality for both the full and diluted data sets. We used two metrics established in the literature, which we define in this section for clarity, and applied them to the reconstruction performed with the full data set described above.

4.1 “Gold-standard” cross correlations

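The first of these metrics, inspired by cryo-electron microscopy, involves a slight change in the analysis pipeline itself. The ‘gold-standard’ Fourier shell correlation from CryoEM [18] calls for the separation of the dataset into two equal halves. Each half is analyzed independently, the final volumes rotationally aligned, and the relative agreement is calculated as a function of resolution using the Fourier Shell Correlation (FSC) metric:

$$\mathrm{FSC}(q) = \mathrm{Re}\,\frac{\sum_{|\mathbf{q}_{i}|=q} F_{1}(\mathbf{q}_{i})\,F_{2}^{*}(\mathbf{q}_{i})}{\sqrt{\sum_{|\mathbf{q}_{i}|=q}\left|F_{1}(\mathbf{q}_{i})\right|^{2}\,\sum_{|\mathbf{q}_{i}|=q}\left|F_{2}(\mathbf{q}_{i})\right|^{2}}},$$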

where $F(\mathbf{q})=\mathcal{F}[\rho](\mathbf{q})$ . In practice, the FSC is calculated in $q$ bins which are shells of a certain thickness.
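As an illustration of the shell-wise calculation, the following NumPy sketch computes the FSC between two density volumes. It assumes the standard square-root normalisation, shells one voxel thick, and volumes that have already been aligned; the function name `fourier_shell_correlation` is hypothetical.

```python
import numpy as np

def fourier_shell_correlation(rho1, rho2):
    """FSC between two aligned densities of equal shape, one value per integer shell."""
    f1 = np.fft.fftn(rho1)
    f2 = np.fft.fftn(rho2)
    # integer shell index of every reciprocal-space voxel (FFT ordering)
    axes = [np.fft.fftfreq(n) * n for n in rho1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    shell = np.rint(np.sqrt(sum(g ** 2 for g in grids))).astype(int).ravel()
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    power1 = np.bincount(shell, weights=np.abs(f1.ravel()) ** 2)
    power2 = np.bincount(shell, weights=np.abs(f2.ravel()) ** 2)
    return cross / np.sqrt(power1 * power2)

# Sanity checks on a random volume (not experimental data).
rng = np.random.default_rng(3)
rho = rng.normal(size=(8, 8, 8))
fsc_same = fourier_shell_correlation(rho, rho)       # identical volumes
fsc_flipped = fourier_shell_correlation(rho, -rho)   # sign-inverted volume
```

Identical volumes give an FSC of 1 in every shell and a sign-inverted copy gives -1, which bounds the range of the metric.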

A similar correlation can also be calculated between the two half-dataset intensities. In order to increase the sensitivity of the correlation, the mean is subtracted in each resolution shell before calculating the cross-correlation i.e. a Pearson correlation coefficient is calculated in each shell independently.

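$$\mathrm{CC}_{1/2}(q) = \frac{\sum_{|\mathbf{q}_{i}|=q}\left(I_{1}-\overline{I_{1}}\right)\left(I_{2}-\overline{I_{2}}\right)}{\sqrt{\sum_{|\mathbf{q}_{i}|=q}\left(I_{1}-\overline{I_{1}}\right)^{2}\,\sum_{|\mathbf{q}_{i}|=q}\left(I_{2}-\overline{I_{2}}\right)^{2}}},$$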

where $\mathrm{I}_{k}$ is shorthand for $\mathrm{I}_{k}(\mathbf{q}_{i})$ and $\overline{\mathrm{I}_{k}}$ is the mean intensity in the resolution shell $\overline{\mathrm{I}_{k}(q)}$ . The increased sensitivity due to subtracting the mean is most apparent when there is spherically symmetric background in the intensity reconstruction, as is the case here.
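The effect of subtracting the shell mean can be seen in a small sketch. It computes a Pearson coefficient per shell and checks that adding the same spherically symmetric background to both half-data-set intensities leaves the result unchanged. The function name `shell_pearson` and the 2D synthetic arrays are assumptions made for brevity.

```python
import numpy as np

def shell_pearson(i1, i2, shell):
    """Pearson correlation of two intensity arrays within each integer shell."""
    s = shell.ravel()
    a = i1.ravel().astype(float)
    b = i2.ravel().astype(float)
    with np.errstate(invalid="ignore", divide="ignore"):
        counts = np.bincount(s)
        da = a - (np.bincount(s, weights=a) / counts)[s]  # subtract shell mean
        db = b - (np.bincount(s, weights=b) / counts)[s]
        num = np.bincount(s, weights=da * db)
        den = np.sqrt(np.bincount(s, weights=da ** 2) * np.bincount(s, weights=db ** 2))
        return num / den  # NaN for shells with zero variance (e.g. a single voxel)

# Synthetic 2D "half data sets": common signal plus independent noise.
rng = np.random.default_rng(0)
y, x = np.indices((32, 32)) - 16
shell = np.rint(np.hypot(x, y)).astype(int)
signal = rng.random((32, 32))
half1 = signal + 0.1 * rng.random((32, 32))
half2 = signal + 0.1 * rng.random((32, 32))
background = 100.0 / (1 + shell)  # spherically symmetric by construction

cc_plain = shell_pearson(half1, half2, shell)
cc_with_bg = shell_pearson(half1 + background, half2 + background, shell)
```

Because the background is constant within each shell, it cancels in the mean subtraction, so `cc_with_bg` equals `cc_plain`. A correlation without mean subtraction would instead be dominated by the common background.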

4.2 Phase retrieval transfer function (PRTF)

The other metric is the phase retrieval transfer function (PRTF) [19]. This metric measures the reliability of iterative phasing by (in effect) averaging complex values over many instances of the phasing process.

The first step in the calculation of this metric is to reconstruct a large number of independent density volumes from different random starting guesses. At any given reciprocal-space voxel, $\mathbf{q}$ , the argument of the complex Fourier transform of the density (the phase) can be slightly different in each random start. The value of the PRTF at that voxel is the complex sum of the unit complex numbers whose argument is the phase, $\phi$ :

$$\mathrm{PRTF}(\mathbf{q}) = \frac{1}{N}\sum_{n=1}^{N} e^{i\phi_{n}},$$

where there are $N$ independent density volumes. By convention, the azimuthal average of the PRTF is reported as a function of the radial coordinate $\left|\mathbf{q}\right|$ . As described in Sec. 3.2, the different reconstructions must be aligned in real-space before calculating the average. A shift in real space is equivalent to a phase ramp which will significantly lower the PRTF. An uncorrected central inversion will negate the phase, leading to a similar reduction [20].

One weakness of the PRTF is that it can be unjustifiably high if the support volume is chosen to be too small. As an extreme case, if the support consists of only one voxel, the PRTF (after alignment) will be unity everywhere even though the reconstruction is very poor. One should therefore have a slightly larger support mask which includes some voxels with low density. In the reconstructions performed here, the support volume (2000 voxels) is significantly larger than the nominal volume of a regular icosahedron with a size corresponding to the fringe spacing (which would be 1497 voxels).

We calculate the PRTF from 400 independent reconstructions. This number is important because it needs to be large enough for the PRTF to converge and for the voxels with irreproducible phases to average down. Consider for example the case where the phases are completely random, in which case the sum is a 2D random walk in the complex plane with a fixed step size, which has a typical (root-mean-square) distance from the origin of $\sqrt{N}$ after $N$ steps. Thus, the expected lower bound on the PRTF if $N$ reconstructions are averaged is $1/\sqrt{N}$, which is 0.05 for the 400 reconstructions used here. In keeping with convention, the threshold value to determine the reproducible resolution is considered to be $1/e=0.37$.
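The random-walk argument can be checked numerically with a short sketch. It takes the magnitude of the averaged unit phasor at each voxel (an assumption of this sketch, made so that values can be compared with the $1/e$ threshold) and uses synthetic, uniformly random phases rather than phases from real reconstructions. The function name `prtf` is hypothetical.

```python
import numpy as np

def prtf(phases):
    """Magnitude of the mean unit phasor over reconstructions.

    phases : array of shape (N, n_voxels), phases in radians from N reconstructions
    """
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(0)
n_recon = 400
random_phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_recon, 5000))

values = prtf(random_phases)
rms_floor = np.sqrt(np.mean(values ** 2))      # close to 1/sqrt(400) = 0.05
perfect = prtf(np.zeros((n_recon, 3)))         # identical phases give exactly 1
```

For completely irreproducible phases the root-mean-square PRTF settles near $1/\sqrt{N}$, well below the $1/e$ threshold, while perfectly reproducible phases give a PRTF of 1.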

4.3 Metrics applied to full data reconstruction

We applied the metrics defined above to the reconstructed intensity and electron density calculated using the procedure described in Sec. 3. For the FSC and CC1/2 calculations, frames were split into odd and even halves containing the 1st, 3rd, 5th… and 2nd, 4th, 6th… patterns respectively. This splitting procedure is chosen so that both halves are similarly affected by slowly varying drifts in the experiment. It is also sufficiently random because the “hits” themselves are a random subset of all the patterns collected.

The FSC and CC1/2 plots are shown in Fig. 2. The crystallographic definition of $q$ is used with the full-period resolution, $d=1/q$. Each of the metrics gives a slightly different estimate of the resolution of the reconstruction. From the half-bit FSC criterion common in cryo-electron microscopy [21], the resolution is $8.75\,\mathrm{nm}$, while using the CC$_{1/2}=0.5$ cutoff, the intensities are reproducibly reconstructed to a resolution of $9.02\,\mathrm{nm}$. The purely phasing metric, PRTF, suggests that the resolution is $10.9\,\mathrm{nm}$ for both the even and odd data sets. The oscillations apparent in the PRTF plot, which arise from fringe intensities in the data, further reveal how the resolution determined by the PRTF metric can be dramatically affected by whether values in one of the local minima happen to lie above or below the 0.37 threshold value. That the resolution estimates differ is not surprising given that different quantities are being measured, and suggests that one should be cautious when reporting a single resolution number. The difference between values further suggests being very conservative with the precision to which resolution is quoted in publication: the mean resolution estimated above is $9.5\,\mathrm{nm}$ with a standard deviation of $1.1\,\mathrm{nm}$, in which case quoting resolution to three significant figures is certainly not appropriate. One should further be careful when comparing resolution between publications to make sure that the same quantities are being compared.

5 Results

We now turn our attention to the effect of reducing the amount of data on reconstruction quality using the analysis pipeline described in Section 3. Data quantity is reduced in one of two ways. Diffraction patterns can be made weaker to simulate the effect of imaging smaller particles or the effect of a lower intensity X-ray beam. This has two effects: firstly, orientation determination is expected to become harder as there is less information in each pattern from which to determine the orientation, and secondly, the signal-to-noise ratio of the reconstructed 3D intensities is reduced, making phase retrieval more challenging. Alternatively, the number of diffraction patterns can be reduced to simulate the effect of a smaller data set consisting of fewer diffraction patterns of the same signal strength. Computationally reducing the data in this way avoids confounding factors from working with different data sets collected at different times under potentially different experimental conditions.

5.1 Reducing diffraction pattern intensity

To simulate measurement of weaker diffraction patterns we computationally reduced the number of photons in each image to produce diffraction patterns with fewer photons drawn from the same experimental data sets. Reducing the number of photons in each diffraction pattern was done by applying a Bernoulli process to each photon, keeping it with a certain probability and discarding it otherwise. These selection fractions, $p$, were reduced from $2^{-1}$ to $2^{-10}$ in steps of powers of two. Due to the Poisson nature of the photon counting statistics, this simulates the effect of a factor $p$ weaker incident pulse. The effect of applying this process to a particular diffraction pattern is shown in Figure 3. The average number of photons per frame after photon dilution is shown in Table 1, from which it can be seen that the photon count per frame decreases from nearly 35,000 photons at full strength to only about 34 photons when diluted to 1/1024 strength.
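The photon dilution step can be sketched in a few lines. Keeping each photon independently with probability $p$ is equivalent to drawing, for every pixel, a binomial sample with the recorded count as the number of trials; applied to Poisson-distributed counts this yields Poisson counts with mean scaled by $p$. The function name `dilute_photons` and the synthetic frame below are assumptions for illustration, not the experimental data.

```python
import numpy as np

def dilute_photons(frame, p, rng):
    """Keep each recorded photon independently with probability p.

    frame : integer array of photon counts per pixel
    """
    return rng.binomial(frame, p)

rng = np.random.default_rng(0)
# Synthetic frame with Poisson counts, mean 2 photons per pixel.
frame = rng.poisson(2.0, size=(256, 256))
diluted = dilute_photons(frame, 1 / 256, rng)

total_before = int(frame.sum())
total_after = int(diluted.sum())  # about total_before / 256 on average
```

Because the thinning acts on the recorded counts, any background photons present in the original frame are diluted in the same proportion as the particle scattering.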

Reconstruction of the 3D intensity from weakened data was performed in the same manner as previously described, using identical Dragonfly reconstruction parameters for all data sets except for the schedule of the deterministic annealing parameter $\beta$. A low value of $\beta$ was not necessary when the signal level was low since this parameter acts to solve convergence issues for very high signals by broadening the PDOs. Appendix A contains details of the parameters for each subset. The 3D intensities from Dragonfly were phased with identical parameters in every case to generate electron densities. Each reduced data set was split into two halves and independently reconstructed in order to calculate the “gold-standard” FSC and CC1/2, and this whole process was repeated 10 times to obtain error bars on the metrics.

The results of reducing signal strength are summarized in Fig. 4. In Fig. 4(a) we plot one metric, CC1/2, as a function of $q$ for both the full data set and a selection fraction of $p=2^{-8}=1/256$ . Fig. 4(a) shows that the reconstruction from the reduced data shows a slightly decreased quality metric compared to the full data set.

In order to summarise the results as a function of resolution for many different photon dilution levels, in Figs. 4(b)–4(d) we plot each metric in grayscale versus both selection fraction and $q$, where the shade represents the metric value. The green dashed line in Fig. 4(b) marks the somewhat arbitrarily chosen CC$_{1/2}=0.5$ cutoff, and shows how the resolution of the intensity reconstruction becomes progressively worse as $p$ is reduced. One cause of this reduction is simply the graininess of the reconstruction due to insufficient total signal. Similarly, the green line in Fig. 4(c) represents the typical PRTF $=1/e$ cutoff. The step decrease in resolution shown by the PRTF in Fig. 4(c) occurs when the overall PRTF decreases to the point where the next local minimum falls below the cutoff threshold (see Fig. 2). The resolution estimated by each metric is tabulated in Table 1.

From the metrics alone one immediately notices that the electron densities do not suffer from such a drastic falloff in resolution at very low signal. In effect, the support constraint during phasing restores the smoothness of the speckles even when the total number of photons per 3D speckle (Shannon voxel) is low, partially negating the effect of insufficient total signal. For the highest photon dilution ( $p=1/1024$ ), the average signal level used to determine the orientations is just 33.9 photons/frame.

We also studied the effect of reducing data on the histogram of electron density values retrieved in real space. Figure 5 shows the histogram of electron densities inside the support mask for three different selection fractions. The plots are averaged over the 20 phasing runs for each fraction (10 random subsets and two halves per subset). The histograms clearly show the degradation in quality as signals are reduced, with the average reconstructed particle tending towards a uniform icosahedral blob with no internal structure. Additionally, the presence of the low density voxels is reassurance that the support was not too tight and the calculated PRTF not artificially high. For selection fractions above $1/32$ , the histograms and densities were nearly identical, and are hence not shown for clarity. The difference in electron density histograms suggests that differences in the real space electron density may not be entirely reflected in all of the reconstruction metrics, and that metric cutoff values used to assess resolution may on their own paint a partial picture of reconstruction quality.

5.2 Reducing number of patterns

An alternative method of reducing the total number of measured photons is to select a random subset of full-intensity diffraction patterns. By this method one approaches the limit of a few bright patterns.

From the total of $14\,772$ patterns, 10 random subsets were generated at each of the sizes 8192, 4096, 2048, 1024 and 512 patterns. Each of these subsets was split into two halves (the even and odd patterns) and independently reconstructed. The CC1/2 plots for the intensity reconstructions for each of the subsets are shown in Fig. 6. Using this approach the metrics remain largely unaffected provided more than 2048 patterns in total are used (1024 in each half data set), indicating that the reconstruction was very stable and supporting the hypothesis that there was more than enough data for this resolution. However, with 1024 frames (512 frames in each half), the reconstruction failed 4 out of the 20 times. If the number of patterns is reduced too much, they do not fill the 3D reciprocal space volume, leading to artifacts in orientation determination. Since a unique assignment of orientation for just 512 patterns would be insufficient to fully populate reciprocal space, the reconstruction only succeeds due to the PDOs being broad when $\beta$ is low. Even so, there are times when the 3D intensity collapses into a single plane or a few planes: orientation determination effectively fails and all frames are assigned to one or a few orientations. Fortunately, this failure mode is easy to identify and exclude from averaging. The failed reconstructions have been retained in this work for the sake of completeness. Other algorithms which use additional constraints on the intensity, from a restricted real-space support, or from additional point-group symmetries, may have better performance in this limit of a few very bright patterns.

6 Discussion

By sub-sampling the experimental data from PR772 viruses measured in [9], we show that the reconstruction quality is essentially the same as from the full data set with as few as 135 relevant photons/pattern, corresponding to 0.087 photons/speckle at the detector corner. This approaches the limits of prior work using simulated data [1, 10, 2] or proof-of-principle experiments under highly controlled conditions not realistic for single particle imaging [22, 23]. By way of contrast, the results here are based on data derived from experimental measurements on PR772 viruses incorporating particle variability and instrument background, demonstrating that the signal required for X-ray single particle imaging under realistic conditions is much lower than previously demonstrated, especially in terms of the number of scattered photons required per frame.

From this numerical experiment we conclude that current SPI algorithms should be capable of processing experimental single particle diffraction patterns when the photon flux in the X-ray focus is 256 times smaller than currently available at LCLS for particles of the same size as PR772. Furthermore, algorithms appear to be more robust for the case of many weak hits than a small number of very strong hits. The extension of this method to smaller particles is not so direct. In order for this analysis to also hold for the case where the particle volume is reduced by the same factor, one requires that the parasitic scatter is also proportionately reduced. At higher photon energies, significantly lower background has already been achieved [8] than present in this data set. Thus, one strategy for the future direction of the field may be to move to hard X-ray instruments where one has reduced scattering cross section (factor 20 lower for 7 keV vs 1.6 keV, as was the case here) but possibly much lower background.

From this analysis we also conclude that analysis algorithms on their own are not the current limiting factor for SPI imaging. Low background data collection has already been demonstrated in the data set of [8] to 6Å resolution. Unfortunately there were insufficient hits from the entire beamtime for a reconstruction to be feasible. The work here suggests that signal levels may have been adequate had sufficient single-particle diffraction patterns been collected. This points to the need to further develop methods for introducing single particles into the X-ray focus in sufficient density to make sufficient measurements at high resolution. Indeed, this could currently be one of the main factors limiting further progress in SPI imaging. Another key conclusion is that further work is needed in the area of single particle diffraction pattern classification to achieve similar noise tolerance as orientation determination, for which the efficacy of machine learning techniques in the limit of low signal still needs to be explored. This result bodes well for the prospects of single particle flash X-ray imaging to near-atomic resolution at high repetition rate XFELs like the European XFEL and LCLS-II and may help guide future XFEL and instrument design.

Appendix A: Intensity reconstruction details

This appendix gives the detailed steps applied to reconstruct the intensity volume from the full dataset with $14\,772$ frames shown in Sec. 4.3. A similar procedure was used for the reduced data set reconstructions whose results are described in Sec. 5. All intensities were reconstructed using Version 1.0.4 of the Dragonfly software. The virtual powder sum from all the patterns is shown in Fig. 7(a). Figure 7(b) shows the mask used when reconstructing the intensities. The innermost pixels inside the central speckle were not used to determine the orientations because of saturation. Some other regions were completely excluded, both from orientation determination and from the calculation of the average 3D intensity.

First, the photon converted patterns were downloaded as HDF5 files from the CXIDB. Each file contains patterns from a single experimental run. The photons were first converted to the sparse .emc format using the script h5toemc.py. The configuration file used for this reconstruction is shown in Fig. 8. The file specified by in_mask_file is provided along with the Dragonfly source code and is shown in Fig. 7(b). The make_detector.py utility was used to generate the detector file detailing which voxel was sampled by every pixel. The ewald_rad parameter sets the $q$ -space size of a voxel which is defined to be 1/lambda/ewald_rad. amo86615_PR772.txt is a text file containing the names of the converted emc files from every run. 100 iterations of the EMC algorithm were performed starting from a random starting model (uniform random numbers at each voxel).
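As a worked illustration of the voxel size relation 1/lambda/ewald_rad, the short sketch below evaluates it for the 1.6 keV photon energy quoted in the Discussion. The `ewald_rad` value of 500 is a hypothetical placeholder, not the value from the configuration file in Fig. 8, and the helper name is ours.

```python
HC_KEV_ANGSTROM = 12.398419843  # h*c in keV * Angstrom

def voxel_size_inv_angstrom(photon_energy_kev, ewald_rad):
    """q-space voxel size, 1/lambda/ewald_rad, in inverse Angstrom."""
    wavelength = HC_KEV_ANGSTROM / photon_energy_kev  # Angstrom
    return 1.0 / wavelength / ewald_rad

# ewald_rad=500 is a placeholder for illustration only
dq = voxel_size_inv_angstrom(1.6, ewald_rad=500.0)
```

With these inputs the wavelength is about 7.75 Å, so each voxel spans roughly 2.6e-4 inverse Ångström; a larger `ewald_rad` gives a finer sampling of reciprocal space.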

For all the cases where the data set was split into two halves, the selection option was added in the [emc] section and set to odd_only and even_only for the two halves respectively. Since the intensity reconstruction is invariant to an overall rotation, the two half-data set volumes were rotationally aligned with each other using the compare utility in Dragonfly. This program maximizes the overall CC1/2 between the two models within a radius range and also calculates the value of CC1/2 as a function of $q$ (as shown in Fig. 4(a)).
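The shell-wise correlation can be illustrated with the sketch below, which computes a Pearson correlation between two already aligned cubic volumes in radial bins. It is not the Dragonfly compare utility: the rotational alignment step is omitted, and the function name and binning scheme are assumptions made for illustration.

```python
import numpy as np

def cc_by_shell(vol_a, vol_b, num_bins):
    """Pearson correlation between two cubic volumes in radial shells."""
    size = vol_a.shape[0]
    center = size // 2
    axis = np.arange(size) - center
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    radius = np.sqrt(x**2 + y**2 + z**2)

    # Equal-width shells from the origin out to the inscribed sphere
    edges = np.linspace(0, center, num_bins + 1)
    cc = np.full(num_bins, np.nan)
    for i in range(num_bins):
        shell = (radius >= edges[i]) & (radius < edges[i + 1])
        a, b = vol_a[shell], vol_b[shell]
        # Correlation is undefined for empty or constant shells
        if a.size > 1 and a.std() > 0 and b.std() > 0:
            cc[i] = np.corrcoef(a, b)[0, 1]
    return edges, cc

# Toy check: a volume correlates perfectly with itself in every shell
rng = np.random.default_rng(0)
half_a = rng.random((17, 17, 17))
edges, cc_same = cc_by_shell(half_a, half_a, num_bins=4)
```

In practice `vol_a` and `vol_b` would be the two half-data-set intensity volumes, and the shell radii would be converted to $q$ using the voxel size.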

Appendix B: $P_{M}$ and $P_{S}$ are projections

Equation 3 for $P_{M}$ describes the rescaling of both the background and signal Fourier magnitudes by the square root of the ratio of measured to calculated intensities. The Fourier space modulus constraint requires that the square root of the calculated intensity defined in Eq. 2 equals the measured modulus $\sqrt{I_{\text{meas}}}$ . $I_{\text{calc}}$ has three components at each voxel, namely the real and imaginary parts of the Fourier transform of the electron density and the background, which is allowed to vary independently. The constraint set, therefore, represents the surface of a sphere with radius equalling the measured modulus. The projection of a general point, $\{\operatorname{Re}(\mathcal{F}[\rho]),\operatorname{Im}(\mathcal{F}[\rho]),B\}$ to this sphere is just a rescaling of this 3-vector by the ratio of the magnitudes.
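The rescaling can be sketched in a few lines. The code assumes that Eq. 2 has the form $I_{\text{calc}} = |\mathcal{F}[\rho]|^2 + B^2$, which is what the sphere description above implies; the function name and the small `eps` guard against division by zero are ours, and the handling of masked voxels is left out.

```python
import numpy as np

def project_modulus(f_rho, background, i_meas, eps=1e-12):
    """Rescale (Re F, Im F, B) onto the sphere of radius sqrt(i_meas)."""
    # Assumed form of Eq. 2: squared length of the 3-vector at each voxel
    i_calc = np.abs(f_rho) ** 2 + background ** 2
    # Ratio of measured to calculated magnitudes (eps avoids 0/0)
    scale = np.sqrt(i_meas / np.maximum(i_calc, eps))
    return f_rho * scale, background * scale

# Toy example with two voxels
f_rho = np.array([3 + 4j, 1j])
background = np.array([0.0, 1.0])
i_meas = np.array([100.0, 8.0])
f_new, b_new = project_modulus(f_rho, background, i_meas)
```

After the projection, `np.abs(f_new)**2 + b_new**2` equals `i_meas`, and applying the function a second time leaves the result unchanged, as expected for a projection.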

The support projection applies different operations to the two halves of the iterate. For the electron density $\rho(\mathbf{x})$ , the “voxel number” constraint states that at most $N$ voxels have non-zero density. The projection to this constraint set under a Euclidean metric is just to let these $N$ voxels be the ones with the highest absolute value. Note, however, that unlike the conventional fixed support constraint, this “voxel number” constraint on $\rho$ is non-convex. For the background volume, $B(\mathbf{q})$ , the constraint requires that the background be spherically symmetric. Stated another way, the voxels within the same radial bin should have the same value. The projection to this set is to replace the background magnitude by its azimuthally averaged value.
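Both halves of the support projection can be sketched as below. This is an illustration under assumptions rather than the code used in this work: the function names are ours, `radial_bin` is a hypothetical precomputed integer array giving the radial bin of each voxel, and the azimuthal average is taken as the plain mean over each bin.

```python
import numpy as np

def project_voxel_number(rho, num_voxels):
    """Keep the num_voxels entries of largest |rho|; set the rest to zero."""
    flat = rho.ravel()
    out = np.zeros_like(flat)
    if num_voxels > 0:
        # Indices of the num_voxels largest absolute values (unordered)
        keep = np.argpartition(np.abs(flat), -num_voxels)[-num_voxels:]
        out[keep] = flat[keep]
    return out.reshape(rho.shape)

def project_radial_average(background, radial_bin):
    """Replace each voxel by the mean of its radial bin."""
    bins = radial_bin.ravel()
    sums = np.bincount(bins, weights=background.ravel())
    counts = np.bincount(bins)
    means = sums / np.maximum(counts, 1)  # empty bins stay at zero
    return means[radial_bin]

# Toy examples
rho = np.array([[0.1, -5.0], [2.0, 0.3]])
rho_proj = project_voxel_number(rho, 2)

background = np.array([1.0, 3.0, 5.0, 7.0])
radial_bin = np.array([0, 0, 1, 1])
bg_proj = project_radial_average(background, radial_bin)
```

Note that the retained voxels are chosen afresh at every iteration, which is what distinguishes this constraint from a fixed support mask.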

Appendix C: Iterative phasing details

This appendix contains some additional implementation details about the phase retrieval procedure described in Section 3. The code used to perform the reconstructions in this work can be found here: https://github.com/andyofmelbourne/3D-Phasing. The configuration file used is described in Fig. 9.

As in the intensity reconstructions, the central speckle intensities were not found to be trustworthy and were masked out up to a radius of 6 voxels from the center. This means that during the modulus projection $P_{M}$ , these voxels were left unmodified. In addition to this central region, a 7-voxel thick shell at the edge of the sphere of reconstructed intensities was also masked out in order to avoid ringing artifacts due to truncating half a speckle.

As mentioned in Section 3.2, the reconstructions from the different random starting guesses need to be aligned with respect to each other before averaging and calculating the PRTF. This is done in three steps. First, the volumes are translated such that the center of mass of each of them is at the origin. Second, since the objects are assumed to be complex-valued in general, a global phase is removed by subtracting the mean phase over all voxels. Finally, in order to remove a central inversion uncertainty, one solution (for convenience, the first) is taken as the reference. For each of the other solutions, the error with respect to the reference for both the original and the center-inverted version is calculated and the one with lower error is retained.
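The three alignment steps can be sketched as follows. This is a simplified illustration, not the code from the 3D-Phasing repository: all function names are ours, translations are restricted to whole voxels, the "origin" is taken to be the central voxel of the array, the mean phase is computed over non-zero voxels only, and the error is a plain Euclidean norm. Each of these choices is an assumption.

```python
import numpy as np

def center_on_mass(vol):
    """Shift so the center of mass of |vol| lies on the central voxel."""
    weight = np.abs(vol)
    total = weight.sum()
    shifts = []
    for axis, n in enumerate(vol.shape):
        other = tuple(a for a in range(vol.ndim) if a != axis)
        com = (weight.sum(axis=other) * np.arange(n)).sum() / total
        shifts.append(n // 2 - int(round(com)))  # integer-voxel shift only
    return np.roll(vol, tuple(shifts), axis=tuple(range(vol.ndim)))

def remove_global_phase(vol):
    """Subtract the mean phase, taken over non-zero voxels (an assumption)."""
    nonzero = np.abs(vol) > 0
    mean_phase = np.angle(vol[nonzero]).mean()
    return vol * np.exp(-1j * mean_phase)

def invert_about_center(vol):
    """Point inversion about the central voxel (index n // 2 on each axis)."""
    flipped = vol[(slice(None, None, -1),) * vol.ndim]
    # Even-sized axes need a one-voxel roll to keep the center fixed
    shifts = tuple((n + 1) % 2 for n in vol.shape)
    return np.roll(flipped, shifts, axis=tuple(range(vol.ndim)))

def align_to_reference(vol, reference):
    """Center, dephase, then keep whichever inversion is closer to reference."""
    vol = remove_global_phase(center_on_mass(vol))
    candidates = (vol, invert_about_center(vol))
    errors = [np.linalg.norm(c - reference) for c in candidates]
    return candidates[int(np.argmin(errors))]
```

A typical use would be to pass the first solution through `center_on_mass` and `remove_global_phase` to obtain the reference, and then call `align_to_reference` on every other solution before averaging.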

Funding

US Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences (OBES), under contract DE-AC02-76SF00515; U.S. National Science Foundation (NSF) Science and Technology Center BioXFEL Award 1231306; Australian Research Council Centre of Excellence in Advanced Molecular Imaging (AMI); European Research Council, “Frontiers in Attosecond X-ray Science: Imaging and Spectroscopy (AXSIS)”, ERC-2013-SyG 609920 (2014-2018); The Human Frontier Science Program (RGP0010/2017); Fellowship from the Joachim Herz Stiftung; Cluster of Excellence “The Hamburg Center for Ultrafast Imaging” of the Deutsche Forschungsgemeinschaft (DFG) - EXC 1074 - project ID 194651731; Helmholtz Association through project-oriented funds.

Acknowledgments

We wish to thank the members of the Single Particle Imaging initiative at LCLS who provided valuable feedback regarding this work, such as Ivan Vartanyants, John Spence and Max Rose.

Disclosures

The authors declare no conflict of interest.

Bibliography

[1] R. Neutze, R. Wouts, D. van der Spoel, E. Weckert, and J. Hajdu, “Potential for biomolecular imaging with femtosecond X-ray pulses,” Nature (2000).
[2] K. Ayyer, T.-Y. Lan, V. Elser, and N. D. Loh, “Dragonfly: an implementation of the expand–maximize–compress algorithm for single-particle imaging,” Journal of applied crystallography 49, 1320–1335 (2016).
[3] N. D. Loh, M. J. Bogan, V. Elser, A. Barty, S. Boutet, S. Bajt, J. Hajdu, T. Ekeberg, F. R. N. C. Maia, J. Schulz, M. M. Seibert, B. Iwan, N. Timneanu, S. Marchesini, I. Schlichting, R. L. Shoeman, L. Lomb, M. Frank, M. Liang, and H. N. Chapman, “Cryptotomography: Reconstructing 3d fourier intensities from randomly oriented single-shot diffraction patterns,” Phys. Rev. Lett. 104, 225501 (2010).
[4] S. Kassemeyer, A. Jafarpour, L. Lomb, J. Steinbrener, A. V. Martin, and I. Schlichting, “Optimal mapping of x-ray laser diffraction patterns into three dimensions using routing algorithms,” Physical Review E 88, 042710 (2013).
[5] T. Ekeberg, M. Svenda, C. Abergel, F. R. N. C. Maia, V. Seltzer, J.-M. Claverie, M. Hantke, O. Jönsson, C. Nettelblad, G. van der Schot, M. Liang, D. P. De Ponte, A. Barty, M. M. Seibert, B. Iwan, I. Andersson, N. D. Loh, A. V. Martin, H. Chapman, C. Bostedt, J. D. Bozek, K. R. Ferguson, J. Krzywinski, S. W. Epp, D. Rolles, A. Rudenko, R. Hartmann, N. Kimmel, and J. Hajdu, “Three-dimensional reconstruction of the giant mimivirus particle with an x-ray free-electron laser,” P
[6] A. Aquila and A. Barty, “Single Molecule Imaging Using X-ray Free Electron Lasers,” in X-ray Free Electron Lasers, (Springer, 2018).
[7] A. Aquila, A. Barty, C. Bostedt, S. Boutet, G. Carini, D. De Ponte, P. Drell, S. Doniach, K. Downing, T. Earnest, H. Elmlund, V. Elser, M. Gühr, J. Hajdu, J. Hastings, S. Hau-Riege, Z. Huang, E. Lattman, F. Maia, S. Marchesini, A. Ourmazd, C. Pellegrini, R. Santra, I. Schlichting, C. Schroer, J. Spence, I. Vartanyants, S. Wakatsuki, W. Weis, and G. Williams, “The linac coherent light source single particle imaging road map,” Structural Dynamics 2, 041701 (2015).
[8] A. Munke, J. Andreasson, A. Aquila, S. Awel, K. Ayyer, A. Barty, R. J. Bean, P. Berntsen, J. Bielecki, S. Boutet, M. Bucher, H. N. Chapman, B. J. Daurer, H. De Mirci, V. Elser, P. Fromme, J. Hajdu, M. F. Hantke, A. Higashiura, B. G. Hogue, A. Hosseinizadeh, Y. Kim, R. A. Kirian, H. K. N. Reddy, T.-Y. Lan, D. S. D. Larsson, H. Liu, N. D. Loh, F. R. N. C. Maia, A. P. Mancuso, K. Mühlig, A. Nakagawa, D. Nam, G. Nelson, C. Nettelblad, K. Okamoto, A. Ourmazd, M. Rose, G. van der Schot, P. Schwa
