Predicting quantum many-body dynamics with transferable neural networks
Zewang Zhang, Shuo Yang, Yi-hang Wu, Chenxi Liu, Yimin Han, Ching Hua Lee, Zheng Sun, Guangjie Li, Xiao Zhang

TL;DR
This paper introduces a recurrent neural network framework that efficiently predicts the dynamics of 1D quantum many-body systems, demonstrating transferability and accuracy without detailed Hamiltonian knowledge.
Contribution
It presents a simple SRU-based transfer learning approach capable of predicting quantum dynamics from a single initial state, reducing computational costs and requiring minimal system information.
Findings
Accurately predicts 1D Ising model dynamics
Demonstrates transferability to larger systems
Achieves predictions with constant computational complexity
| System | ED | Ours | ED | Ours | ED | Ours | ED | Ours |
|---|---|---|---|---|---|---|---|---|
| 2-spin | 0.015 | 1.1 | 2.3 | |||||
| 3-spin | 0.035 | 2.2 | 4.4 | |||||
| 4-spin | 0.059 | 3.8 | 7.6 | |||||
| 5-spin | 0.271 | 17.5 | 34.9 | |||||
| 6-spin | 0.556 | 35.2 | 70.5 | |||||
| 7-spin | 1.15 | 73.1 | 146.3 | |||||
Predicting quantum many-body dynamics with transferable neural networks
Ze-Wang Zhang
School of Physics, Sun Yat-sen University, Guangzhou 510275, China. Telephone: 15602298593
Shuo Yang
State Key Laboratory of Low-Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
Yi-Hang Wu
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Chen-Xi Liu
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Yi-Min Han
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Ching-Hua Lee
Department of Physics, National University of Singapore, 117542, Singapore
Institute of High Performance Computing, 138632, Singapore
Zheng Sun
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Guang-Jie Li
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Xiao Zhang
School of Physics, Sun Yat-sen University, Guangzhou 510275, China
Abstract
Machine learning (ML) architectures such as convolutional neural networks (CNNs) have garnered considerable recent attention in the study of quantum many-body systems. However, advanced ML approaches such as transfer learning have seldom been applied to such contexts. Here we demonstrate that an efficient and transferable sequence learning framework based on the simple recurrent unit (SRU) is capable of learning and accurately predicting the time evolution of a one-dimensional (1D) Ising model with simultaneous transverse and parallel magnetic fields, as quantitatively corroborated by relative entropy measurements and magnetization between the predicted and exact state distributions. At a cost of constant computational complexity, the evolution of a larger many-body state was predicted in an autoregressive way from just one initial state, without any guidance or knowledge of any Hamiltonian. Our work paves the way for future applications of advanced ML methods in quantum many-body dynamics with knowledge only from a smaller system.
I Introduction
Machine learning (ML) approaches, particularly neural networks (NNs), have achieved great success in solving real-world industrial and social problems [1], such as image recognition [2], high-level image synthesis and style transfer [3], human-like raw speech generation [4], composition of original melodious MIDI notes [5], and neural machine translation [6]. Inspired by its widespread applicability, ML was soon adopted by condensed matter physicists in the modeling of quantum many-body behavior and phase transition discovery [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. Given the many advances in computer vision [18], speech processing [19], and natural language processing [20], it is natural to ask if recent progress in these more sophisticated ML architectures can benefit or even revolutionize the modeling of quantum systems. For instance, can quantum many-body dynamics be “learned” through transferable learning [21, 22]?
Thus, the main objective of this work is to demonstrate the novel application of NNs to the transferable learning and prediction of the evolution of a many-body wavefunction, an otherwise computationally intensive task that has not been solved by generative models. For static problems, it has been proven that deep NNs such as the restricted Boltzmann machine (RBM) can represent most physical states [24], and a recent work based on very deep and large CNNs shows the ability to circumvent the need for Markov chain sampling in two-dimensional interacting spin models of larger systems [25]. More recently, physical properties of spin Hamiltonians have been reproduced by a deep Boltzmann machine (DBM), as an alternative to the standard path integral [26]. Our approach contrasts fundamentally with conventional approaches to computing many-body dynamics: instead of evolving the wavefunction explicitly with the Hamiltonian, which becomes prohibitively slow and impractical as the number of spin variables increases, we directly predict the dynamical wavefunction from the initial state by propagating it with an efficient and transferable framework based on unified spin encoding, chain encoding and an SRU [27] module. With the same level of parallelism as feed-forward CNNs and the scalable context-dependent capacity of recurrent connections, our proposed framework is naturally suited for learning many-body systems with unified parameters, although such networks have never been harnessed for exact quantum state evolution. In our scenario, the system is a 1D Ising model with both parallel and transverse magnetic fields.
Inspired by end-to-end training [28] and domain adaptation [29, 30], we specialize to the many-body dynamics of a 1D Ising chain with transverse and parallel magnetic fields. Comparison with exact, conventionally computed results for up to seven spins reveals high predictive accuracy, as quantified by the relative entropy as well as the magnetization. Indeed, our SRU-propagated wavefunction shows a strong grasp of the periodicity in the time evolution, despite being unaware of the Hamiltonian that sets the energy (inverse periodicity) scale. Having circumvented the problem of exponential computational complexity through unified encoding mechanisms and parallel recurrent connections, we hope that these encouraging results from our pioneering transferable learning approach will inspire further applications of transferable learning methods to build a shared model suited for quantum systems with a vast number of spin variables.
II Dynamics on a 1D Ising chain
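We consider a 1D Ising spin chain composed of $N$ spin variables with local transverse ($h_\perp$) and parallel ($h_\parallel$) magnetic fields, described by the Hamiltonian
$$H = -J \sum_{i} \sigma^z_i \sigma^z_{i+1} - h_\perp \sum_{i} \sigma^x_i - h_\parallel \sum_{i} \sigma^z_i ,$$
where $\sigma^x$ and $\sigma^z$ denote the Pauli matrices, and $i$ labels the spin variable. When the magnetic field is purely parallel ($h_\perp = 0$) or purely transverse ($h_\parallel = 0$), the Hamiltonian is exactly solvable. However, when $h_\perp \neq 0$ and $h_\parallel \neq 0$, the dynamics of the $N$ spins must be numerically computed in the $2^N$-dimensional many-body Hilbert space spanned by direct product states of single-spin wavefunctions $|s_i\rangle$:
$$|\psi\rangle = \sum_{s_1, \dots, s_N} c_{s_1 \dots s_N} \, |s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_N\rangle .$$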
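Wavefunction dynamics can be exactly computed through unitary time evolution under the Hamiltonian
$$|\psi(t)\rangle = e^{-iHt} |\psi(0)\rangle = U e^{-i\Lambda t} U^\dagger |\psi(0)\rangle ,$$
where $\Lambda$ is the diagonal eigenenergy matrix and the columns of $U$ are the corresponding eigenvectors of $H$.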
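The exact evolution just described can be sketched numerically. The block below is a minimal NumPy illustration, not the authors' code. It assumes an open chain and the common sign convention $H = -J\sum_i \sigma^z_i\sigma^z_{i+1} - h_\perp\sum_i\sigma^x_i - h_\parallel\sum_i\sigma^z_i$; the function names (`site_op`, `ising_hamiltonian`, `evolve`) and all coupling values are hypothetical placeholders, since the source values are not available here.

```python
import numpy as np

# Pauli matrices and the 2x2 identity.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-spin chain."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == i else I2)
    return out

def ising_hamiltonian(n, J=1.0, h_perp=0.5, h_par=0.5):
    """Open-chain Ising Hamiltonian; the default couplings are placeholders."""
    dim = 2 ** n
    H = np.zeros((dim, dim), dtype=complex)
    for i in range(n - 1):
        H -= J * (site_op(SZ, i, n) @ site_op(SZ, i + 1, n))
    for i in range(n):
        H -= h_perp * site_op(SX, i, n) + h_par * site_op(SZ, i, n)
    return H

def evolve(H, psi0, dt, steps):
    """Return psi(k * dt) for k = 0..steps via U exp(-i Lambda t) U^dagger psi0."""
    energies, U = np.linalg.eigh(H)           # H = U diag(energies) U^dagger
    coeffs = U.conj().T @ psi0                # psi0 expanded in the eigenbasis
    times = dt * np.arange(steps + 1)
    phases = np.exp(-1j * np.outer(times, energies))
    return (phases * coeffs) @ U.T            # shape (steps + 1, 2**n)
```

The diagonalization is done once, so each additional timestep only costs a matrix-vector product; the cubic cost of `eigh` in the Hilbert-space dimension is what grows quickly with the number of spins.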
The $2^N$-dimensional $N$-body wave function quickly becomes expensive to compute as $N$ increases. We propose an ML approach with a spin encoding layer, a chain encoding layer, SRU layers, and a spin decoding layer, which instead attempts to predict its time evolution based on prior knowledge of the time evolution behavior of known training states. This training (learning) only has to be performed once for the relatively inexpensive prediction of any number of initial states. Importantly, the training and prediction process captures solely the intrinsic evolution patterns of the wavefunctions, and does not involve any explicit knowledge about the Hamiltonian. From the ML perspective, this dynamical state evolution problem can be regarded as a straightforward sequence generation problem [31]. Moreover, as we shall explain, our SRU-based framework is transferable.
III The transferable NN approach
We next outline the broad principles behind our NN approach of predicting quantum state evolution, with details in [32]. Here we choose an NN composed of a spin encoding layer, a chain encoding layer, SRU layers and a spin decoding layer (Fig. 1b). The vanilla SRU NN with peephole connections (Fig. 1a) substitutes the inherent matrix multiplication with parallelizable element-wise multiplication operations ($\odot$ in Fig. 1a) associated with the previous memory cell $c_{t-1}$, hence the calculation of $c_t$ does not have to wait until the whole $c_{t-1}$ is updated. With the help of the spin encoding and decoding layers, the number of trained parameters is fixed, and thus the complexity has an upper bound instead of increasing exponentially.
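To make the element-wise recurrence concrete, here is a minimal NumPy sketch of a single SRU layer. It assumes the standard published SRU formulation (forget gate, highway-style skip gate, and element-wise peephole vectors); the exact variant used in the paper is specified in its supplementary material, so the names `sru_step`, `sru_layer`, `init_params` and the parameter layout are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, rng):
    """Random parameters (W, W_f, W_r, v_f, v_r, b_f, b_r) for width d."""
    mats = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
    vecs = [rng.normal(scale=0.1, size=d) for _ in range(2)]
    return (*mats, *vecs, np.zeros(d), np.zeros(d))

def sru_step(x_t, c_prev, params):
    """One SRU step: the recurrence touches c_prev only element-wise."""
    W, W_f, W_r, v_f, v_r, b_f, b_r = params
    f_t = sigmoid(W_f @ x_t + v_f * c_prev + b_f)    # forget gate
    r_t = sigmoid(W_r @ x_t + v_r * c_prev + b_r)    # skip gate
    c_t = f_t * c_prev + (1.0 - f_t) * (W @ x_t)     # memory cell
    h_t = r_t * c_t + (1.0 - r_t) * x_t              # hidden output
    return h_t, c_t

def sru_layer(xs, c0, params):
    """Run a whole sequence; all matrix products are done up front."""
    W, W_f, W_r, v_f, v_r, b_f, b_r = params
    # These products do not depend on the recurrence, so they can be
    # computed for every timestep at once (the parallelizable part).
    U, U_f, U_r = xs @ W.T, xs @ W_f.T + b_f, xs @ W_r.T + b_r
    c, hs = c0, []
    for x_t, u, u_f, u_r in zip(xs, U, U_f, U_r):
        f = sigmoid(u_f + v_f * c)
        r = sigmoid(u_r + v_r * c)
        c = f * c + (1.0 - f) * u
        hs.append(r * c + (1.0 - r) * x_t)
    return np.stack(hs), c
```

The remaining sequential loop contains only element-wise operations, which is why each component of the memory cell can be updated independently of the others.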
Our procedure occurs in two main stages: the training stage and the inference stage. In the training stage, we first “train” or optimize the weight parameters of our SRU-based framework by feeding it with a large number of training sequences, which are the time-evolved wavefunction data of randomly chosen initial 2-spin to 7-spin state sequences sampled over 500 timesteps, obtained via conventional exact diagonalization (ED). The SRU-based framework is optimized with the Adam algorithm [33] to minimize the mean squared error between the ED-evolved and SRU-evolved states at all timesteps in a mini-batch [32].
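The two ingredients of this training objective, random initial states and a mean squared error over a mini-batch, can be sketched as follows. This is only an illustration under stated assumptions: complex amplitudes are represented by concatenating real and imaginary parts, which may differ from the paper's spin encoding, and the helper names `random_states`, `to_real` and `mse_loss` are hypothetical. The optimizer step itself (Adam in the paper) is omitted.

```python
import numpy as np

def random_states(batch, n_spins, rng):
    """Draw `batch` random normalized states of an n_spins chain."""
    dim = 2 ** n_spins
    psi = rng.normal(size=(batch, dim)) + 1j * rng.normal(size=(batch, dim))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)

def to_real(psi):
    """Assumed real-valued encoding: [Re(psi), Im(psi)] along the last axis."""
    return np.concatenate([psi.real, psi.imag], axis=-1)

def mse_loss(pred, target):
    """Mean squared error between predicted and reference amplitudes."""
    return float(np.mean((to_real(pred) - to_real(target)) ** 2))
```

In a full training loop, `target` would hold the ED-evolved states for every timestep of the mini-batch and `pred` the corresponding network outputs.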
Following the training stage is the inference stage, when the SRU-based framework is ready for predicting the evolution of arbitrarily given initial states. As sketched in Fig. 1d, the initial many-body state enters the leftmost block at $t = 0$, is then processed by a spin encoding layer, a chain encoding layer, and two fully-connected layers, and its output is propagated as the input state to the next block with hidden layers. The output of each block denotes a new quantum state at a certain timestep. The combination of memory cell and hidden output serves to implement effective context-dependent behaviors. As illustrated in Fig. 1(a) and further elaborated in [32], the context-dependent information kept in the memory cell is modified by its previous value, the new input interacting with the forget gate and skip gate at that timestep, as well as “hidden” information from the previous SRU cell. Based on the already optimized SRU-based framework, the final predicted quantum state as a function of time is generated from one fully-connected layer and the spin decoding layer, as shown in Fig. 1b.
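The autoregressive inference loop described above can be sketched as below. This is a schematic under stated assumptions: `step_fn` stands in for the trained network (encoding layers, SRU layers and decoding layer) and `memory` for its recurrent state; both names are hypothetical. To keep the example runnable, the stand-in `toy_step` applies one exact step of a single spin precessing in a transverse field rather than a learned model.

```python
import numpy as np

def rollout(step_fn, psi0, steps, memory=None):
    """Generate a trajectory by feeding each predicted state back as input."""
    states = [np.asarray(psi0, dtype=complex)]
    for _ in range(steps):
        nxt, memory = step_fn(states[-1], memory)
        states.append(nxt)
    return np.stack(states)          # shape (steps + 1, dim)

# Stand-in one-step model: exp(-i * theta * sigma_x) for a single spin.
THETA = 0.01
STEP = np.array([[np.cos(THETA), -1j * np.sin(THETA)],
                 [-1j * np.sin(THETA), np.cos(THETA)]])

def toy_step(psi, memory):
    """Ignores the memory argument; a trained network would update it."""
    return STEP @ psi, memory

traj = rollout(toy_step, np.array([1.0, 0.0]), steps=100)
```

Because only the previous output is fed forward, any per-step prediction error is carried into all later steps, which is the error accumulation mechanism discussed in the comparison section.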
IV Comparison between exact and SRU-based evolutions
We report very encouraging agreement between wavefunctions evolved by the Hamiltonian as computed by ED, and wavefunction evolutions as predicted by our SRU-based framework. As for the 1D Ising model, we set the local transverse magnetic field to be , parallel magnetic field to be and , the time interval to be , and keep this setting constant for all computations. We find that the maximum energy eigenvalue is about , indicating that the time interval we choose is small enough. The number of spin variables studied ($N = 2$ to $7$) decides the cost of exactly computing the different time evolutions over 0.2 second (100 timesteps) prior to training the network, since the time complexity of ED method is . The training and inference loss of different systems is shown in Fig. 2.
As a concrete demonstration, we visually illustrate the comparison for the evolution of a typical state from 2-spin to 7-spin in Fig. 3. These states are evolved from arbitrarily chosen initial states from the test set. Saliently, the evolution predicted by the SRU-based model accurately reproduces that from exact computations at the beginning timesteps. To confirm that this agreement is not just due to a fortuitous choice of component, we look at the evolution across all components of the same states in Fig. 4.
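To further quantify the agreement of SRU and ED wavefunction evolutions, we compute the relative entropy (Kullback–Leibler divergence) [34] of their distributions over 1000 test wavefunction sequences. For discrete probability distributions $P$ and $Q$, the relative entropy is defined as
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} .$$
Given ED-computed wavefunction coefficient vectors $c^{\mathrm{ED}}$ and SRU-predicted coefficient vectors $c^{\mathrm{SRU}}$, the $P$ and $Q$ variables take values
$$P_j(t, i) = \left| c^{\mathrm{ED}}_j(t, i) \right|^2 , \qquad Q_j(t, i) = \left| c^{\mathrm{SRU}}_j(t, i) \right|^2$$
at time $t$ and basis vector $i$, where $j$ labels the test sequence. Hence the mean relative entropy (MRE) at each timestep is
$$\mathrm{MRE}(t) = \frac{1}{M} \sum_{j=1}^{M} \sum_{i} P_j(t, i) \log \frac{P_j(t, i)}{Q_j(t, i)} ,$$
where $M$ is the number of test sequences. It measures the amount of information lost when the distribution from SRU predictions is used to represent the distribution from ED results. The smaller the value of the MRE, the more accurate is their agreement.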
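A small NumPy sketch of the MRE computation is given below, under stated assumptions: the probabilities are renormalized before the comparison (network outputs need not be exactly normalized), and a small `eps` guards the logarithm against zero probabilities. Neither detail is confirmed by the source, and the function name `mean_relative_entropy` is illustrative.

```python
import numpy as np

def mean_relative_entropy(c_ed, c_sru, eps=1e-12):
    """MRE per timestep.

    c_ed, c_sru: complex arrays of shape (sequences, timesteps, dim)
    holding ED and predicted wavefunction coefficients.
    """
    p = np.abs(c_ed) ** 2
    q = np.abs(c_sru) ** 2
    # Assumed renormalization so both are proper probability distributions.
    p = p / p.sum(axis=-1, keepdims=True)
    q = q / q.sum(axis=-1, keepdims=True)
    # KL(P || Q) for every sequence and timestep, then average over sequences.
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return kl.mean(axis=0)           # shape (timesteps,)
```

The divergence is taken with the ED distribution as the reference, matching the reading that it quantifies the information lost when the predicted distribution stands in for the exact one.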
In Fig. 5, we show how the MRE varies with time during the generation of test sequences. We find that in all six systems, the order of relative entropy is always within . Evidently, the relative entropy generally shows an upward trend and increases linearly with the number of timesteps (see [32]), which is caused by the accumulation of errors in the process of conditional generation without any external guidance, though this is already suppressed by dropout layers. To quantify our model’s performance by a physical variable, we plot the magnetization intensity calculated from both predicted (SRU) and simulated (ED) wavefunctions in Fig. 6, which agree well. Specifically, for smaller systems, such as 2-spin and 3-spin, the predicted magnetization intensity agrees very well with the simulated one. As the number of spin variables increases, the performance of our SRU-based framework drops due to the exponentially increased computational complexity. Meanwhile, as the number of timesteps increases, the difference between predicted and simulated magnetization intensity also grows, which is due to error accumulation during autoregressive generation without any external guidance.
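As an illustration of how a magnetization can be read off a wavefunction, the sketch below computes the average $z$-magnetization per spin, $\frac{1}{N}\sum_i \langle \sigma^z_i \rangle$, in the computational basis. This definition is an assumption: the source does not spell out which component or normalization its "magnetization intensity" uses, and the function name `magnetization_z` is hypothetical.

```python
import numpy as np

def magnetization_z(psi, n_spins):
    """Average <sigma^z> per spin for states psi of shape (..., 2**n_spins)."""
    probs = np.abs(np.asarray(psi)) ** 2
    probs = probs / probs.sum(axis=-1, keepdims=True)
    # In the computational basis sigma^z is diagonal: a 0 bit contributes +1
    # and a 1 bit contributes -1 to the total for that basis state.
    ones = np.array([bin(b).count("1") for b in range(2 ** n_spins)])
    sz_total = n_spins - 2 * ones
    return probs @ sz_total / n_spins
```

Applied to a trajectory of shape `(timesteps, 2**n_spins)`, this returns one value per timestep, so predicted and exact curves can be compared directly.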
Owing to its unified encoding and parallelism, our SRU-based NN becomes increasingly superior to the ED method in terms of efficiency as the number of spins and the batch size increase. Table 1 summarizes the results. When the number of spins gets larger, e.g. and , the advantage of our SRU-based framework in inference speed becomes more and more obvious, which is attributed to its constant computational complexity. In addition, when we enlarge the batch size to for spins, our model demonstrates a speed times faster than the ED-based method.
After obtaining the base model trained with datasets of 2 to 7 spins by epochs, we may continue to finetune it with the dataset of the 8-spin system. To make a comparison, we also train it from scratch. The results in Fig. 7(a) show that the validation loss obtained by finetuning the base model is much lower than that obtained by training from scratch, demonstrating that our NN has already learned transferable features from smaller systems. The MRE of the 8-spin system is shown in Fig. 7(b).
V Conclusion
In this work, we have successfully applied a transferable NN approach based on SRU networks to approximate the state evolution of dynamic quantum many-body systems with high accuracy and superior scalability. Our work encourages future applications of advanced ML methods in quantum many-body dynamics in a Hamiltonian-agnostic manner. One possibility is to predict the behavior of large and inhomogeneous systems that lack training data by just learning from a smaller-sized system [33]. Applications of these advancements in ML to quantum many-body problems are left to future work.
Acknowledgements
Xiao Zhang thanks Yingfei Gu, Meng Cheng, Yi Zhang for discussions. Xiao Zhang is supported by the National Natural Science Foundation of China (Grant No. 11874431), the National Key R & D Program of China (Grant No. 2018YFA0306800) and the Guangdong Science and Technology Innovation Youth Talent Program (Grant No. 2016TQ03X688). Shuo Yang is supported by NSFC (Grant No. 11804181), the National Key R & D Program of China (Grant No. 2018YFA0306504) and the Research Fund Program of the State Key Laboratory of Low-Dimensional Quantum Physics (Grant No. ZZ201803).
References
- [1] BENGIO Y, COURVILLE A, VINCENT P. Representation learning: A review and new perspectives[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(8): 1798–1828.
- [2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. [S.l.: s.n.], 2012: 1097–1105.
- [3] GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. [S.l.: s.n.], 2016: 2414–2423.
- [4] VAN DEN OORD A, DIELEMAN S, ZEN H, et al. Wavenet: A generative model for raw audio[J]. CoRR abs/1609.03499, 2016.
- [5] SUN Z, LIU J, ZHANG Z, et al. Composing music with grammar argumented neural networks and note-level encoding[C]//2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). [S.l.]: IEEE, 2018: 1864–1867.
- [6] WU Y, SCHUSTER M, CHEN Z, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation[J]. arXiv preprint arXiv:1609.08144, 2016.
- [7] VAN NIEUWENBURG E P, LIU Y H, HUBER S D. Learning phase transitions by confusion[J]. Nature Physics, 2017, 13(5): 435.
- [8] CAI Z, LIU J. Approximating quantum many-body wave functions using artificial neural networks[J]. Physical Review B, 2018, 97(3): 035116.
