Purifying electron spectra from noisy pulses with machine learning using synthetic Hamilton matrices
Sajal Kumar Giri, Ulf Saalmann, Jan M. Rost

TL;DR
This paper introduces a deep neural network trained on synthetic data to effectively purify noisy photo-electron spectra from free-electron laser pulses, enabling accurate analysis of atomic and molecular ionization processes.
Contribution
The study presents a novel machine learning approach using synthetic Hamilton matrices to purify electron spectra without system-specific training.
Findings
Neural network successfully purifies noisy spectra
Method generalizes to atomic and molecular systems
Efficient Schrödinger equation propagation enables large training datasets
Abstract
Photo-electron spectra obtained with intense pulses generated by free-electron lasers through self-amplified spontaneous emission are intrinsically noisy and vary from shot to shot. We extract the purified spectrum, corresponding to a Fourier-limited pulse, with the help of a deep neural network. It is trained on a huge number of spectra, which was made possible by an extremely efficient propagation of the Schrödinger equation with synthetic Hamilton matrices and random realizations of fluctuating pulses. We show that the trained network is sufficiently generic such that it can purify atomic or molecular spectra, dominated by resonant two- or three-photon ionization, non-linear processes which are particularly sensitive to pulse fluctuations. This is possible without training on those systems.
Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Str. 38, 01187 Dresden, Germany
pacs:
32.80.Rm, 41.60.Cr
Recent years have seen an avalanche-like increase of machine-learning applications in physics dubr18 ; mebu+19 ; caci+19 , which roughly fall into three categories: (a) applications within theory, e.g., for quantum information dubr18 or to elucidate intricate many-body properties catr17 , (b) within experiment to optimize experimental conditions, e.g., to characterize a free-electron laser (FEL) pulse sami+17 , and (c) applications that condition learning algorithms theoretically with the goal to apply the trained model to experimental data. Our work falls in category (c). Although in principle far more general, we choose to be specific and apply the approach we develop to the purification of noisy photo-electron spectra as routinely obtained with self-amplified spontaneous emission (SASE) FELs operating in the desired frequency range.
Our goal is to train a deep neural network with sufficiently many noisy spectra and their pure counterparts, such that the trained network is able to purify a noisy spectrum that is not contained in the training data, in particular an experimental one. By purification we mean that, when fed a noisy photo-electron spectrum, the network returns the reference spectrum that would be obtained if the target system were driven by an ideal Gaussian laser pulse, which we call the reference pulse, cf. Fig. 1. This may seem straightforward. Yet, it is anything but trivial to generate a sufficient amount of suitable training data with acceptable effort. This is in general the bottleneck for machine-learning applications in theory, and overcoming it requires new ways of thinking. In this vein, we introduce synthetic Hamilton matrices (SHMs). Synthetic means that we vary the matrix elements (here in a random fashion) about base values such that the trained network is later able to purify real spectra from either experiment or theory. The SHMs are constructed to speed up the generation of training data, and we expect them to become useful for other dynamical problems for which neural networks must be trained. Since the SHMs cover a large range of possible systems, we can afford to use as the base explicitly calculated photo-ionization dynamics in one dimension, which is fast to compute and provides a suitable anchor point for the SHMs.
Setting up networks with SHMs. To put our approach to a credible test we need (i) a physical process that is sensitive to the pulse profile and (ii) a realistic way to model fluctuating pulses. In addition, we need to prepare a large set of spectra suitable for training the network. This involves (iii) a scheme to efficiently propagate millions of time-dependent Schrödinger equations, (iv) a broad and uniform sampling of the generated spectra, and (v) a trainable parametrization.
(i) As a physical process which is non-linear in the driving light and therefore very sensitive to the intensity of the light pulse and hence its profile in time, we have chosen quasi-resonant few-photon ionization. It can lead to multi-peak structures in the photo-electron spectrum auto55 ; role86 ; meen94 ; dece12a ; basa+17 .
(ii) Fluctuating pulses from SASE FELs can be modeled by the so-called partial-coherence method pfji+10 ; mopf+11 , an experimentally verified method, which allows one to create ensembles of pulses which differ through fluctuations while the ensemble average converges to a well defined pulse shape suppl . Those pulses have a characteristic duration and a coherence time , we use fs and fs here. Apart from the intrinsic noise the pulses additionally jitter in their pulse energy. We normalize all pulses to unit pulse energy. This is also possible in the experiment as pulse energies can be easily measured shot-to-shot with gas monitor detectors tife+08 .
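The pulse model described in (ii) can be illustrated with a short sketch. It follows the usual recipe associated with the partial-coherence method (random spectral phases, followed by a temporal window), but the helper name `partial_coherence_pulse`, the Gaussian shapes, the grid, and the numerical values of duration and coherence time are assumptions chosen for illustration, not the settings used in the paper.

```python
import numpy as np

def partial_coherence_pulse(t, duration, coherence_time, rng):
    """One noisy pulse envelope on the time grid t (hypothetical helper)."""
    dt = t[1] - t[0]
    omega = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)
    # Gaussian spectral amplitude; its width is set by the coherence time
    spectrum = np.exp(-(omega * coherence_time) ** 2 / 4)
    # independent random phase for every frequency component
    phases = rng.uniform(0.0, 2 * np.pi, size=t.size)
    noisy = np.fft.ifft(spectrum * np.exp(1j * phases))
    # temporal Gaussian window restricts the noise to the pulse duration
    envelope = noisy * np.exp(-(t / duration) ** 2)
    # normalize to unit pulse energy, as done for all pulses in the text
    energy = np.sum(np.abs(envelope) ** 2) * dt
    return envelope / np.sqrt(energy)

rng = np.random.default_rng(0)
t = np.linspace(-20.0, 20.0, 2048)      # arbitrary time units
# duration and coherence time below are placeholder values
pulses = [partial_coherence_pulse(t, 5.0, 0.5, rng) for _ in range(5)]
```

Each call yields a different noise realization, while every pulse carries the same (unit) energy, which mirrors the normalization step mentioned above.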
(iii) In principle numerical codes are available to propagate the time-dependent Schrödinger equation (TDSE) for one active electron in a strong laser field and calculate the resulting photo-electron spectrum tasc12sc12 ; moba16 ; pamu16 . However, the creation of a training data set from millions of pulses is prohibitively expensive, yet essential for successful deep-learning.
To overcome this obstacle we work with Hamilton matrices whose construction is detailed in the supplement suppl . The new element, particularly formulated for the present context is the generation of Hamilton matrices with random energies , coupling matrix elements , and field strengths , corresponding to intensities (referring to the Fourier-limited pulse) in the range of W/cm2. Furthermore, for each Hamilton matrix the coupling to the light is augmented by noise realizations with a central frequency of 21 eV to arrive at
[TABLE]
whereby and . Boldface symbols in Eqs. (1) describe matrices in terms of field-free states. It is only through these synthetic Hamilton matrices that we are able to create a sufficient number of non-trivial training data. The matrices have been derived varying a 1D Hamilton operator (our base system), but since the energies and the coupling matrix elements are chosen randomly, these SHMs can purify real (3D) spectra, as we will see subsequently.
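The idea of varying matrix elements randomly about base values can be sketched as follows. This is only an illustration of the randomization step: the function `synthetic_hamiltonian`, the uniform relative spread, and the toy base values are assumptions, and the code does not reproduce the specific form of Eq. (1).

```python
import numpy as np

def synthetic_hamiltonian(base_energies, base_dipole, spread, rng):
    """Randomly vary a base system (hypothetical helper, not the paper's Eq. (1))."""
    n = base_energies.size
    # scatter each field-free energy around its base value
    energies = base_energies * (1 + spread * rng.uniform(-1, 1, size=n))
    # scatter the couplings, then symmetrize so the matrix stays Hermitian
    factors = 1 + spread * rng.uniform(-1, 1, size=(n, n))
    factors = (factors + factors.T) / 2
    dipole = base_dipole * factors
    return np.diag(energies), dipole

rng = np.random.default_rng(2)
base_energies = np.array([-0.90, -0.13, 0.10, 0.20])   # toy base values
base_dipole = np.array([[0.0, 0.4, 0.1, 0.1],
                        [0.4, 0.0, 0.3, 0.2],
                        [0.1, 0.3, 0.0, 0.0],
                        [0.1, 0.2, 0.0, 0.0]])          # toy couplings
h0, dipole = synthetic_hamiltonian(base_energies, base_dipole, 0.2, rng)
```

Drawing many such pairs from one base system gives a family of related but distinct model systems, which is the role the SHMs play in the text.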
(iv) We have to create a set of spectra for training, validation and testing, which should cover to a large extent the domain of realizable spectra. This step is crucial and most expensive numerically, particularly when compared to the (modest) resources needed to set up and train the network.
To cover the domain of realizable spectra uniformly, we calculate first reference spectra spec . Among those we select the spectra with the largest mutual difference
[TABLE]
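One way to pick spectra with large mutual difference is greedy farthest-point selection, sketched below. Both the selection strategy and the use of the integrated absolute difference as the distance are assumptions made for illustration; the function name `select_diverse` and the toy Gaussian spectra are hypothetical.

```python
import numpy as np

def select_diverse(spectra, n_select, dE):
    """Greedy farthest-point selection (assumed strategy, for illustration)."""
    chosen = [0]
    # distance of every spectrum to its nearest already-chosen spectrum
    min_dist = np.sum(np.abs(spectra - spectra[0]), axis=1) * dE
    while len(chosen) < n_select:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist = np.sum(np.abs(spectra - spectra[nxt]), axis=1) * dE
        min_dist = np.minimum(min_dist, dist)
    return chosen

E = np.linspace(0.0, 10.0, 501)
dE = E[1] - E[0]
centers = [1.0, 1.0, 1.0, 5.0, 9.0]     # three identical toy spectra, two distinct ones
spectra = np.array([np.exp(-(E - c) ** 2 / (2 * 0.3 ** 2)) for c in centers])
spectra /= spectra.sum(axis=1, keepdims=True) * dE   # unit area
picked = select_diverse(spectra, 3, dE)
```

In this toy set the duplicates of the first spectrum are skipped and the two well-separated spectra are picked, which is the behavior one wants when covering the domain of spectra uniformly.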
For each member of this subset of reference spectra, we calculate fluctuating spectra from noisy pulses generated with the partial-coherence method pfji+10 ; suppl with a different noise realization for each (synthetic) Hamilton matrix. Hence, we must propagate about TDSEs, which takes, however, only a few seconds for a single TDSE thanks to our highly-optimized propagation scheme suppl . It includes pre-diagonalization of the Hamilton matrices which saves computing time since one and the same system is propagated for different pulse realizations. Finally, we have for each Hamilton matrix (1) one reference spectrum and noisy spectra , i. e., a total of spectra.
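The benefit of pre-diagonalization can be seen in a minimal sketch. It assumes a Hamiltonian of the form H(t) = H0 + f(t) D with diagonal H0 and uses a split-operator step; this stepping scheme, the helper `make_propagator`, and the three-level toy system are assumptions, since the actual propagation scheme is described only in the supplement.

```python
import numpy as np

def make_propagator(h0_diag, dipole, dt):
    """Pre-diagonalize once per system; reuse for every pulse realization."""
    d_vals, d_vecs = np.linalg.eigh(dipole)        # done once, not per time step
    half_free = np.exp(-0.5j * h0_diag * dt)       # field-free half step

    def propagate(field, psi0):
        psi = psi0.astype(complex)
        for f in field:                            # field value for each time step
            psi = half_free * psi
            # field-coupling step is diagonal in the eigenbasis of the dipole matrix
            psi = d_vecs @ (np.exp(-1j * f * d_vals * dt) * (d_vecs.conj().T @ psi))
            psi = half_free * psi
        return psi

    return propagate

rng = np.random.default_rng(1)
h0_diag = np.array([0.0, 0.77, 1.10])              # toy field-free energies
dipole = np.array([[0.0, 0.3, 0.0],
                   [0.3, 0.0, 0.2],
                   [0.0, 0.2, 0.0]])               # toy real symmetric couplings
propagate = make_propagator(h0_diag, dipole, dt=0.05)
psi0 = np.array([1.0, 0.0, 0.0])
# three different random field realizations driving the same system
finals = [propagate(0.1 * rng.standard_normal(400), psi0) for _ in range(3)]
```

The diagonalization cost is paid once per system, so each additional pulse realization only requires matrix-vector products and elementwise phase factors.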
Instead of the individual we use averaged spectra for efficient training. To this end we draw a random subset containing spectra from the fluctuating spectra for each SHM and repeat this procedure times to create averaged spectra. For our application and is a good compromise between rugged spectra for smaller and an increased training effort for larger . All spectra are normalized, i. e., .
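The averaging step can be sketched as below. The subset size of 200 and the 10 repetitions are taken from the values quoted later in the text for the physical systems ("As in the training procedure"); the function name `averaged_spectra`, the random toy data, and the energy spacing are assumptions.

```python
import numpy as np

def averaged_spectra(noisy, n_subset, n_avg, dE, rng):
    """Average random subsets of the noisy spectra of one system (hypothetical helper)."""
    out = np.empty((n_avg, noisy.shape[1]))
    for k in range(n_avg):
        # draw a random subset without repetition
        idx = rng.choice(noisy.shape[0], size=n_subset, replace=False)
        mean = noisy[idx].mean(axis=0)
        out[k] = mean / (np.sum(mean) * dE)        # normalize to unit area
    return out

rng = np.random.default_rng(3)
dE = 0.1
noisy = rng.random((1000, 64)) + 0.1               # toy stand-in for fluctuating spectra
avg = averaged_spectra(noisy, n_subset=200, n_avg=10, dE=dE, rng=rng)
```

Larger subsets give smoother averages, at the price of more propagated spectra per training sample, which is the trade-off the paragraph above refers to.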
(v) To complete the final step, the parametrization of the spectra for training, we represent the resulting averaged spectra in terms of harmonic oscillator eigenfunctions as
[TABLE]
with the set of coefficients. A basis size of was necessary for the averaged fluctuating spectra, while using a similar expression for the noise-free spectra was sufficient suppl . The network consists of mapping the coefficients . The training aims at minimizing the difference between the predicted for the noise-free spectrum and the expected reference spectra .
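A sketch of such a basis representation is given below, using the standard three-term recurrence for harmonic-oscillator eigenfunctions on a dimensionless grid. The grid, the basis size of 40, and the toy two-peak spectrum are assumptions for illustration; any scaling or shift of the energy axis used in the paper is not reproduced here.

```python
import numpy as np

def ho_basis(x, n_basis):
    """Harmonic-oscillator eigenfunctions via the stable three-term recurrence."""
    phi = np.zeros((n_basis, x.size))
    phi[0] = np.pi ** -0.25 * np.exp(-x ** 2 / 2)
    if n_basis > 1:
        phi[1] = np.sqrt(2.0) * x * phi[0]
    for k in range(1, n_basis - 1):
        phi[k + 1] = (np.sqrt(2.0 / (k + 1)) * x * phi[k]
                      - np.sqrt(k / (k + 1)) * phi[k - 1])
    return phi

x = np.linspace(-12.0, 12.0, 2001)                 # dimensionless energy grid
dx = x[1] - x[0]
phi = ho_basis(x, 40)
# toy two-peak spectrum
spectrum = np.exp(-(x - 1.5) ** 2 / 2) + 0.5 * np.exp(-(x + 1.0) ** 2 / 2)
coeffs = phi @ spectrum * dx                       # projection onto the basis
reconstructed = coeffs @ phi                       # spectrum rebuilt from coefficients
```

The network then only has to map one coefficient vector to another, rather than full spectra sampled on an energy grid.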
The connection between Hamilton matrices, pulses and electron spectra just outlined is summarized schematically in Fig. 2.
Building and training the network. With reference spectra and averaged noisy “copies” of each reference spectrum, we have pairs available for building the network model. Each pair consists of an averaged noisy spectrum with its respective reference spectrum. Note that the network operates exclusively on the electron spectra, cf. Fig. 2. Once trained, it is therefore directly applicable to the experiment which has only access to spectra.
The full data set with pairs of spectra is split in the ratio 8:1:1 between training (), validation () and test () data. Implemented with the deep-learning library Keras ch15 , a fully connected feed-forward neural network is used suppl . The training success and resulting performance of the network as a function of the size of the training data is quantified with the cost function , using the basis representation (3) of the spectra, and a more intuitive error
[TABLE]
for training (), validation () and test () data sets, respectively. The error with an upper limit measures the difference , see Eq. (2), between a spectrum and its reference spectrum . The maximal error occurs if the two normalized (i.e., unit-area) spectra are completely disjoint. Both errors (4) decay logarithmically as a function of the SHM data size suppl .
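The error measure can be made concrete with a small sketch. It assumes that the difference between two spectra is the integrated absolute difference, which is one form consistent with the statement that the maximum is reached for disjoint unit-area spectra; under that assumption the upper limit is 2. The function names and the toy Gaussian spectra are hypothetical.

```python
import numpy as np

def spectral_difference(p, q, dE):
    """Integrated absolute difference of two spectra (assumed form of Eq. (2))."""
    return np.sum(np.abs(p - q)) * dE

def mean_error(predicted, reference, dE):
    """Average difference over a data set (assumed form of Eq. (4))."""
    return np.mean([spectral_difference(p, q, dE)
                    for p, q in zip(predicted, reference)])

E = np.linspace(0.0, 10.0, 2001)
dE = E[1] - E[0]

def unit_gaussian(center):
    g = np.exp(-(E - center) ** 2 / (2 * 0.2 ** 2))
    return g / (np.sum(g) * dE)                    # unit area

left, right = unit_gaussian(2.0), unit_gaussian(8.0)
disjoint = spectral_difference(left, right, dE)    # non-overlapping peaks
identical = spectral_difference(left, left, dE)    # perfect prediction
err = mean_error([left, right], [left, left], dE)  # average over two pairs
```

With this definition a perfect prediction gives zero, and two non-overlapping unit-area spectra give the largest possible value.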
Purification of spectra from SHMs. We are finally in a position to purify noisy spectra and do this first with the SHM-generated spectra the network was not trained on. Typical snapshots of these spectra are shown in Fig. 3d. To get a realistic picture we have selected spectra, cf. Fig. 3 (a–c), which belong to three groups purified with different residual errors, in increasing order. Only 1% of the spectra have a purification error smaller than the one shown in Fig. 3a. The prediction in Fig. 3b has the median error, so that half of the spectra have a smaller and half a larger prediction error. Finally, only 1% of the purified spectra have a larger error than the one shown in Fig. 3c. The gray-shaded curves provide the reference spectrum in each case. The simple average (from the test-data set for a specific SHM and field intensity) is shown as a dashed line.
One sees that the purification works quite well, even for a typical “worst case” as in Fig. 3c, where all peaks, including the fine structure, appear at the correct energies, despite the fact that none of these features is contained in the averaged spectra. We also note that spectra of rather different shapes and structural detail, from a smooth single peak (Fig. 3a) via a triple peak (Fig. 3b) to a fine-structured multi-peak shape (Fig. 3c), can be purified successfully. The rather diverse spectra from single fluctuating pulses , as shown in Fig. 3d, indicate the strong sensitivity to the pulse profile, which is due to Stark shifts and Autler-Townes splittings. The complete failure of the averaged spectra to reveal the reference spectrum is striking. This happens despite the fact that the reference pulse is retrieved by averaging a sufficient number of fluctuating pulses if created by the partial-coherence method pfji+10 . The corresponding reference spectrum, however, is never obtained by averaging the fluctuating spectra, since the underlying ionization dynamics is non-linear. The consequence is an intricate mapping between fluctuating spectra and the reference spectrum, which is constructed with the deep neural network.
Purification of spectra from physical systems. So far the successful purification referred to spectra not known to the network, but generated through SHMs of the same kind as those used to train the network, only with different parameters. In the following we apply the network to photo-electron spectra for three cases of two different physical systems: ($\alpha$) He atoms dominated by 2-photon absorption sagi+18 and the hydrogen molecule ion H$_2^+$ ionized by ($\beta$) 2- and ($\gamma$) 3-photon processes. These spectra have been obtained in full 3D; for technical details see suppl . In case $\alpha$ (Fig. 4a–c) the spectra consist of contributions from the s- and d-manifolds, which can be reached by a 2-photon process, whereby the d-channel clearly dominates (cf. Fig. 4d). For H$_2^+$, aligned along the laser polarization, either the gerade continuum for case $\beta$ (Fig. 4h–j), or the ungerade continuum for case $\gamma$ (Fig. 4e–g), is considered. The central frequencies for the laser pulse are chosen according to the transition energies $\omega_\alpha = E_{\rm 2p} - E_{\rm 1s} = 20.95$ eV, $\omega_\beta = E_{2\sigma_{\rm u}} - E_{1\sigma_{\rm g}} = 23.05$ eV and $\omega_\gamma = E_{1\sigma_{\rm u}} - E_{1\sigma_{\rm g}} = 11.83$ eV, respectively. Fluctuating pulses are created as before, but we use new random realizations. As in the training procedure, we have created 10 averaged spectra, which are fed into the trained network. Each one is composed of 200 fluctuating spectra spec . The 10 resulting purified spectra from the network are again averaged to arrive at the network’s estimate of the reference spectrum.
We show in Fig. 4 results for three different intensities in the range where few-photon ionization is non-perturbative. As expected from the SHM-generated spectra in Fig. 3, the averaged spectra (green-dashed lines) do not provide sensible information about the reference spectra. The mapping with the network (blue-solid lines), however, reveals the respective peak structure of the photo-electron spectra.
Note that the network was trained neither on the 3D helium atom nor on the hydrogen molecule ion, whose spectra are purified successfully with the network mapping in Fig. 4. The training of the network was performed with synthetic data derived from a representative 1D photo-ionization dynamics only, which allowed us to keep the Hamilton matrices small enough to compute the TDSEs for a sufficient amount of training data. Apparently, although derived from a 1D system, the SHMs represent dynamical systems that are sufficiently generic that realistic 3D spectra from the three rather different processes $\alpha$, $\beta$, and $\gamma$ could also be purified with the same network. Hence, it should also work on experimental spectra, which will differ slightly, to the extent that many-electron effects show up in photo-electron spectra as compared to the present 3D single-active-electron calculations. To measure reference spectra in a proof-of-principle experiment one could either use seeded FEL pulses ambe+12 ; alap+12 ; alca+13 ; riab+19 or set up an experiment at a coherent (high-harmonic) source and generate noisy pulses artificially.
To summarize, we have devised a strategy to purify noisy photo-electron spectra, typical for SASE FELs, with the help of a deep neural network. While this example was chosen on purpose to be specific, through its design our approach is far more general. Firstly, we have checked suppl that other noise models rosa07 ; nila12 can be used. Secondly, purification could be conditioned on any arbitrary reference pulse. Thirdly, and most importantly, the systematic introduction of synthetic Hamilton matrices permits generating a training data set of ample size with reasonable computational effort and renders the trained network applicable to scenarios it was not trained for. In the present example, we applied the network trained on synthetic dynamics to purify realistic 3D spectra. For future work, we would like to point out that noisy pulses driving non-linear processes are actually advantageous, since they allow one to obtain the target response over a wide spectral and dynamic range in a single shot, provided one has tools to analyze the resulting spectra.
Acknowledgements. This work has been supported by the Deutsche Forschungsgemeinschaft (DFG) through the priority program 1840 “Quantum Dynamics in Tailored Intense Fields”.
- (1) V. Dunjko and H. J. Briegel, Machine learning & artificial intelligence in the quantum domain: A review of recent progress. Rep. Prog. Phys. 81, 074001 (2018).
- (2) P. Mehta, M. Bukov, C.-H. Wang, A. G. R. Day, C. Richardson, C. K. Fisher, and D. J. Schwab, A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 810, 1 (2019).
- (3) G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences. arXiv:1903.10563 (2019).
- (4) G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks. Science 355, 602 (2017).
- (5) A. Sanchez-Gonzalez et al., Accurate prediction of X-ray pulse properties from a free-electron laser using machine learning. Nat. Commun. 8, 15461 (2017).
- (6) D. Rogus and M. Lewenstein, Resonant ionisation by smooth laser pulses. J. Phys. B 19, 3051 (1986).
- (7) S. H. Autler and C. H. Townes, Stark effect in rapidly varying fields. Phys. Rev. 100, 703 (1955).
- (8) C. Meier and V. Engel, Interference structure in the photoelectron spectra obtained from multiphoton ionization of Na2 with a strong femtosecond laser pulse. Phys. Rev. Lett. 73, 3207 (1994).
