Emergent Quantum Mechanics in an Introspective Machine Learning Architecture
Ce Wang, Hui Zhai, Yi-Zhuang You

TL;DR
This paper presents an introspective neural network architecture that autonomously learns quantum concepts like the wave function and Schrödinger equation from simulated data, demonstrating potential for discovering new physics.
Contribution
It introduces a novel machine learning framework that automatically derives quantum laws from data, bridging neural networks and fundamental physics discovery.
Findings
Successfully learned the quantum wave function from data
Discovered the Schrödinger equation through the architecture
Demonstrated potential for autonomous physics discovery
Emergent Quantum Mechanics in an Introspective Machine Learning Architecture
Ce Wang
Hui Zhai
Institute for Advanced Study, Tsinghua University, Beijing 100084, China
Yi-Zhuang You
Department of Physics, University of California, San Diego, CA 92093, USA
Abstract
Can physical concepts and laws emerge in a neural network as it learns to predict the observation data of physical systems? As a benchmark and a proof-of-principle study of this possibility, here we show an introspective learning architecture that can automatically develop the concept of the quantum wave function and discover the Schrödinger equation from simulated experimental data of the potential-to-density mappings of a quantum particle. This introspective learning architecture contains a machine translator to perform the potential-to-density mapping, and a knowledge distiller auto-encoder to extract the essential information and its update law from the hidden states of the translator, which turn out to be the quantum wave function and the Schrödinger equation. We envision that our introspective learning architecture can enable machine learning to discover new physics in the future.
The ongoing third wave of artificial intelligence has made great achievements in employing neural-network-based machine learning for industrial and social applications. Inspired by this success, machine learning algorithms have also been rapidly applied to various directions of physics research, ranging from high-energy physics and string theory to condensed matter, atomic, molecular and optical physics [Carifio et al. (2017); Koch-Janusz and Ringel (2018); You et al. (2018); Hashimoto et al. (2018); Torlai and Melko (2016); Wang (2016); Carrasquilla and Melko (2017); van Nieuwenburg et al. (2017); Zhang and Kim (2017); Wang and Zhai (2017, 2018); Zhang et al. (2018)]. While there have been many successful examples of machine-assisted physics research, it remains an ambitious goal to explore the potential of machine learning for the unsupervised discovery of physical concepts and laws from observation data [Iten et al. (2018); Wu and Tegmark (2018)]. A major challenge is to understand how the machine “thinks”, that is, what approaches have developed inside its mind. This typically requires us to open the black box of the neural network and identify the most relevant emergent features in its neural activity. Can the analysis of the neural activity also be automated by the machine itself? Can knowledge emerge as the machine examines its own information flow introspectively? To demonstrate these possibilities, here we report an introspective learning architecture, illustrated in Fig. 1, that allows the machine to distill knowledge about quantum mechanics from observations of the density distributions of a quantum particle in potentials of different shapes.
As a proof-of-concept study, we consider a single quantum particle moving in a one-dimensional space with a given potential. Suppose we can measure the particle density for each given potential. We supply the machine with the potential profile as the input and the density profile as the target, and challenge the machine to discover the underlying rule governing the potential-to-density mapping. We discretize the potential and density profiles along the one-dimensional space and treat them as sequences of real numbers, $V = (V_1, V_2, \dots)$ and $n = (n_1, n_2, \dots)$, with $V_i = V(x_i)$ and $n_i = n(x_i)$, where $x_i$ ($i = 1, 2, \dots$) are discrete coordinates evenly distributed along the one-dimensional space with a fixed separation $a$. We assume that the potential is always measured with respect to the energy of the particle, such that the particle energy is effectively fixed at zero. We only consider the case $V_i < 0$, such that the particle remains in extended states.
By treating both the potential and density profiles as sequential data, the potential-to-density problem belongs to a broader class of sequence-to-sequence mapping problems [Kalchbrenner and Blunsom (2013); Sutskever et al. (2014); Cho et al. (2014); Bahdanau et al. (2014)], which can be handled by the recurrent neural network (RNN) [Goodfellow et al. (2016)]. The RNN has been widely used in natural language processing to translate sequences of words from a source language to a target language [Neubig (2017)]. We apply the RNN architecture to perform the potential-to-density mapping as a translation task. In each step, the RNN takes an input from the source sequence, modifies its internal hidden state accordingly, and generates an output based on the hidden state, as illustrated in Fig. 2(a). We adopt the following update equations
$$h_i = T(V_i)\,h_{i-1}, \qquad n_i = P(h_i), \tag{1}$$
where both the input $V_i$ and the output $n_i$ are scalars and the hidden state $h_i$ is a $d_h$-dimensional vector. The hidden state is updated by an input-dependent linear transformation, represented by a matrix $T(V_i)$ multiplying the vector $h_{i-1}$. The output $n_i$ is generated from the hidden state by a projection map $P$. The data flow is graphically represented in Fig. 2(b). The output sequence is then compared with the target sequence over a window of steps to evaluate the loss function $\mathcal{L}_{\mathrm{RNN}}$
[TABLE]
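The forward pass of Eq. (1) and a windowed loss can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' code: the function names are hypothetical, the transfer map is passed in as an arbitrary callable, and the mean-squared form assumed here for Eq. (2) may differ from the loss actually used.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_translate(
    potential: Sequence[float],
    transfer: Callable[[float], Matrix],
    w: Vector,
    h0: Vector,
) -> List[float]:
    """Run h_i = T(V_i) h_{i-1} and emit n_i = w . h_i at every step."""
    h = list(h0)
    densities: List[float] = []
    for v in potential:
        h = matvec(transfer(v), h)  # input-dependent linear update of the hidden state
        densities.append(sum(wj * hj for wj, hj in zip(w, h)))  # linear projection P
    return densities


def window_loss(pred: Sequence[float], target: Sequence[float], start: int, stop: int) -> float:
    """Mean squared error over steps start <= i < stop (assumed form of the loss)."""
    errors = [(pred[i] - target[i]) ** 2 for i in range(start, stop)]
    return sum(errors) / len(errors)


# Toy transfer matrix T(V) = [[1, V], [0, 1]], chosen only to keep the arithmetic easy to follow.
toy_out = rnn_translate([1.0, 2.0], lambda v: [[1.0, v], [0.0, 1.0]], w=[1.0, 0.0], h0=[0.0, 1.0])
print(toy_out)  # [1.0, 3.0]
```

Excluding the first few outputs from the loss window, as done in the text, amounts to choosing `start > 0`.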
How the RNN updates its hidden state and generates its output is determined by the functions $T$ and $P$. In general, $T$ and $P$ could be non-linear functions, modeled for instance by feedforward neural networks. For our problem, however, we find it sufficient to model $T$ by a Taylor expansion (to the $K$th order in $V_i$) and $P$ by a linear projection,
$$T(V_i) = \sum_{k=0}^{K} T_k\,V_i^{k}, \qquad P(h_i) = w^{\mathsf{T}} h_i, \tag{3}$$
where $T_k$ is the $k$th order Taylor expansion coefficient matrix (each of dimension $d_h \times d_h$) and $w$ is a $d_h$-dimensional vector. The elements of $T_k$ and $w$ are model parameters to be trained to minimize the loss function $\mathcal{L}_{\mathrm{RNN}}$. The training dataset contains pairs of potential and density sequences that serve as parallel corpora to train the RNN translator. They are currently obtained from numerical simulation (see Supplementary Material for details about data acquisition), but can be collected from experiments in future applications. For instance, the quantum gas microscope can detect in situ the density of ultracold atoms nearly in their ground state in the presence of different kinds of potentials generated by optical speckles [Lye et al. (2005)]. After minimizing the translator loss $\mathcal{L}_{\mathrm{RNN}}$, the RNN can predict the density profile from the potential profile (see Supplementary Material for the training method).
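With the Taylor expansion of Eq. (3), $T(V)$ becomes a matrix polynomial in the scalar $V_i$. A minimal sketch, assuming the coefficient matrices $T_0, \dots, T_K$ are stored as nested lists (the function names are hypothetical):

```python
from typing import List, Sequence

Matrix = List[List[float]]


def taylor_transfer(coeffs: Sequence[Matrix], v: float) -> Matrix:
    """Return T(V) = sum_{k=0}^{K} T_k V^k for coefficient matrices T_0, ..., T_K."""
    dim = len(coeffs[0])
    result = [[0.0] * dim for _ in range(dim)]
    power = 1.0  # V^k, updated incrementally instead of recomputing v ** k
    for tk in coeffs:
        for r in range(dim):
            for c in range(dim):
                result[r][c] += tk[r][c] * power
        power *= v
    return result


def count_parameters(dim: int, order: int) -> int:
    """(K + 1) matrices of size d_h x d_h plus the d_h-dimensional readout vector w."""
    return (order + 1) * dim * dim + dim


t = taylor_transfer([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 0.0]]], 3.0)
print(t)  # [[19.0, 3.0], [3.0, 1.0]]
print(count_parameters(dim=3, order=2))  # 30
```

The parameter count depends only on $K$ and $d_h$, not on the length of the sequence.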
We build the RNN with a Taylor expansion order $K$ and hidden state dimensions $d_h$ up to 6. We observe that the loss drops significantly as long as $d_h \geq 3$ (see Supplementary Material for a detailed analysis). Using an RNN for the one-dimensional potential-to-density mapping is physically grounded because it respects the translational symmetry of the physical law that governs this mapping: the same update rule applies at every site. As a result, an immediate advantage of the RNN is spatial scalability, that is, what has been learned on a small system can be readily generalized and applied to larger systems. For instance, as shown in Fig. 2(c-e), the RNN is trained over a small window of sites (the initial 5 outputs are excluded to reduce the sensitivity to initial conditions). After training, the RNN can perform the potential-to-density mapping for a much larger system. Fig. 2(c-e) shows that the RNN output matches the target density profile well (with about 10% relative error) on the test dataset for different classes of potential profiles, whether shallow or deep, smooth or rough. This result demonstrates the predictive power of the RNN model.
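The spatial scalability follows from weight sharing: the same $T_k$ and $w$ act at every site, so nothing in the model refers to the sequence length. The sketch below uses random near-identity coefficients as stand-ins for trained ones (an assumption made only for illustration), runs one parameter set on a short and a long potential, and checks that the outputs on the shared sites coincide.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def transfer(coeffs: Sequence[Matrix], v: float) -> Matrix:
    """T(V) = sum_k T_k V^k."""
    dim = len(coeffs[0])
    return [
        [sum(tk[r][c] * v ** k for k, tk in enumerate(coeffs)) for c in range(dim)]
        for r in range(dim)
    ]


def run(potential: Sequence[float], coeffs: Sequence[Matrix], w: List[float], h0: List[float]) -> List[float]:
    """Apply the shared update h_i = T(V_i) h_{i-1}, n_i = w . h_i along the whole sequence."""
    h, out = list(h0), []
    for v in potential:
        t = transfer(coeffs, v)
        h = [sum(t[r][c] * h[c] for c in range(len(h))) for r in range(len(h))]
        out.append(sum(wi * hi for wi, hi in zip(w, h)))
    return out


rng = random.Random(0)
dim = 3
# Stand-in "trained" parameters: T_0 close to the identity plus a small first-order term.
coeffs = [
    [[(1.0 if r == c else 0.0) + 0.01 * rng.gauss(0.0, 1.0) for c in range(dim)] for r in range(dim)],
    [[0.01 * rng.gauss(0.0, 1.0) for c in range(dim)] for r in range(dim)],
]
w, h0 = [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]

long_potential = [-rng.uniform(0.5, 1.5) for _ in range(200)]
short_out = run(long_potential[:30], coeffs, w, h0)
long_out = run(long_potential, coeffs, w, h0)
print(len(short_out), len(long_out), long_out[:30] == short_out)  # 30 200 True
```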
By learning to perform the potential-to-density mapping, the RNN translator must have developed some intuition about the underlying physics. Historically, advances in physics have often been marked by formulating physical phenomena in terms of differential equations, such as Newton’s laws of motion, Maxwell’s equations of electromagnetism, and the Schrödinger equation of quantum mechanics. The RNN provides a universal representation of recurrent equations as discretized versions of differential equations, and therefore the update rules of its hidden state can be interpreted as the machine’s understanding of the physical laws [Ma et al. (2018); Banchi et al. (2018)]. As the RNN performs the translation, it generates a sequence of hidden states containing the essential variables governing the physics of the potential-to-density mapping, mixed with other redundant or irrelevant information. To extract the knowledge from these hidden-state data, we design a higher-level machine, called the knowledge distiller, that learns from the neural activity (the hidden-state sequence) of the lower-level translator. It works on the RNN hidden states to compress the information and extract the underlying rule. The auto-encoder architecture is widely used for information compression [Bengio et al. (2013); Kingma and Welling (2013)]. Here we embed the auto-encoder in another recurrent neural network to form a recurrent auto-encoder (RAE), because we need not only to identify the essential variables in the hidden states but also to determine the update rules of these essential variables.
The architecture of the RAE knowledge distiller is illustrated in Fig. 3. The RAE distiller first encodes the hidden state $h_i$ of the RNN translator at a given step $i$ into the latent variable $z_i$, and then tries to reconstruct the hidden states $h_{i+j}$ of the subsequent steps ($j = 1, 2, \dots$) by evolving and decoding the latent variable. The update equations are given by
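$$z_i = E(h_i), \qquad z_{i+j} = \tilde{T}(V_{i+j})\,z_{i+j-1}, \qquad \hat{h}_{i+j} = D(z_{i+j}), \tag{4}$$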
where $E$ and $D$ represent the encoder and decoder maps, respectively. Here the RAE hidden state $z$ is updated by a linear transformation $\tilde{T}(V)$ that still depends on the input potential sequence, as illustrated in Fig. 3(b). The encoder and the decoder are implemented by feedforward networks, as shown in Fig. 3(c). The RAE is trained to minimize the reconstruction loss $\mathcal{L}_{\mathrm{RAE}}$
[TABLE]
It is important that the RAE (knowledge distiller) hidden state $z$ has a smaller dimension $d_z$ than the dimension $d_h$ of the RNN (translator) hidden state $h$, so that it enforces an information bottleneck that only allows the vital information to be passed down in $z$. Furthermore, instead of using a single auto-encoder to compress the hidden state at each step independently, the RAE connects a series of decoders together by a recurrent neural network. This design ensures that the latent representation remains coherent over a series of steps and contains the key variables that should be passed down along the sequence. A similar RAE architecture was proposed in Mirowski et al. (2010) and recently redesigned in Iten et al. (2018) to enable AI-driven scientific discovery on sequential data. In this way, the RAE compresses the original RNN into a more compact RNN that captures the most essential information and its induced update rules.
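A minimal sketch of the RAE's encode-evolve-decode loop in Eq. (4), with a summed squared reconstruction error assumed as the form of $\mathcal{L}_{\mathrm{RAE}}$. The encoder and decoder are arbitrary callables here; the paper uses feedforward networks, whereas the toy check below uses hand-written linear maps. All names are hypothetical. The toy "translator" states are three-dimensional but secretly carry only two numbers, so a two-dimensional latent reconstructs them exactly, while a decoder that drops the redundant component does not.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rae_loss(
    hidden: Sequence[Vector],
    potential: Sequence[float],
    start: int,
    horizon: int,
    encode: Callable[[Vector], Vector],
    latent_transfer: Callable[[float], Matrix],
    decode: Callable[[Vector], Vector],
) -> float:
    """Encode h_start, evolve z for `horizon` steps with T~(V), decode, and sum squared errors."""
    z = encode(hidden[start])
    loss = 0.0
    for j in range(1, horizon + 1):
        i = start + j
        z = matvec(latent_transfer(potential[i]), z)  # z_i = T~(V_i) z_{i-1}
        recon = decode(z)                              # reconstructed h_i = D(z_i)
        loss += sum((r - h) ** 2 for r, h in zip(recon, hidden[i]))
    return loss


def lift(z: Vector) -> Vector:
    """Toy 3-d translator state with a redundant third component."""
    return [z[0], z[1], z[0] + z[1]]


def toy_transfer(v: float) -> Matrix:
    return [[1.0, 0.1], [0.1 * v, 1.0]]


potential = [-1.0, -0.5, -2.0, -1.5, -1.0, -0.2]
z = [1.0, 0.0]
hidden = [lift(z)]
for v in potential[1:]:
    z = matvec(toy_transfer(v), z)
    hidden.append(lift(z))

exact = rae_loss(hidden, potential, 0, 5, lambda h: h[:2], toy_transfer, lift)
lossy = rae_loss(hidden, potential, 0, 5, lambda h: h[:2], toy_transfer, lambda z: [z[0], z[1], 0.0])
print(exact, lossy > 0.0)  # 0.0 True
```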
As shown in Fig. 4(a), we find that the reconstruction loss $\mathcal{L}_{\mathrm{RAE}}$ increases dramatically only when the RAE hidden state dimension is squeezed below two (i.e. $d_z < 2$), implying that the key feature can be stored, in the most parsimonious manner, in a two-component real vector $z = (z_1, z_2)$ (i.e. $d_z = 2$). Here we show that $z_1$ and $z_2$ in fact represent the quantum wave function and its first-order derivative. The evidence is twofold:
First, we use the trained RNN to predict the density for a constant potential $V_i \equiv V_0 < 0$, for which the result should be a standing wave $n(x) \propto \cos^2(kx + \varphi)$, with $k$ being the momentum. If $z_1$ and $z_2$ are the wave function and its derivative, they should behave as $\cos(kx + \varphi)$ and $-k\sin(kx + \varphi)$, respectively, whose periods are twice the period of $n(x)$ and whose phases are shifted by $\pi/2$ relative to each other. As shown in Fig. 4(c), $(z_1, z_2)$ indeed displays the period doubling and the relative phase shift.
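The expected period doubling can be checked numerically. This sketch assumes units with $\hbar^2/2m = 1$ and zero particle energy, so that $\partial_x^2\psi = V\psi$, and integrates it for a constant $V_0 = -k^2$ with a semi-implicit (symplectic) Euler step chosen to keep the amplitude bounded; these are our assumptions, not the paper's data pipeline. Shifting by one density period $\pi/k$ leaves $n = \psi^2$ unchanged but flips the sign of $\psi$, and at a quarter of the wave-function period $\psi$ crosses zero while $\partial_x\psi$ sits at its extremum $-k$.

```python
import math
from typing import List, Tuple


def integrate_constant_potential(k: float, a: float, steps: int) -> List[Tuple[float, float]]:
    """Semi-implicit Euler steps for psi'' = V0 psi with V0 = -k^2, starting from psi = 1, psi' = 0."""
    v0 = -k * k
    psi, dpsi = 1.0, 0.0
    samples = [(psi, dpsi)]
    for _ in range(steps):
        psi = psi + a * dpsi        # advance the wave function with the current derivative
        dpsi = dpsi + a * v0 * psi  # then update the derivative with the new wave function
        samples.append((psi, dpsi))
    return samples


k, a = 2.0, 1e-4
samples = integrate_constant_potential(k, a, 20000)  # covers 0 <= x <= 2
half = round(math.pi / k / a)            # one density period pi/k = half a wave-function period
quarter = round(math.pi / (2 * k) / a)   # a quarter of the wave-function period

psi_half = samples[half][0]
print(round(psi_half, 3), round(psi_half ** 2, 3))  # -1.0 1.0
print(samples[quarter])  # psi close to 0, psi' close to -k
```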
Second, we open up the recurrent block of the RAE to extract the update rules for $z$, which constitute the machine’s formulation of the physical rules. The update rules are encoded in the transformation matrix $\tilde{T}(V)$, which is parameterized by the Taylor expansion coefficient matrices $\tilde{T}_k$. To connect this formulation to the familiar Schrödinger equation, we note that the mapping is invariant under a linear transformation $z \to S z$, $\tilde{T}_k \to S\tilde{T}_k S^{-1}$ applied to all $\tilde{T}_k$ (with the encoder and decoder transformed accordingly). We find that it is always possible to choose a linear transformation $S$ that simultaneously brings all $\tilde{T}_k$ to the following form
[TABLE]
Here the numerical matrix elements are those obtained from a particular instance of the trained RAE. To leading order they can be identified with the lattice constant $a$ used in the simulation, and we have also verified that they scale correctly with $a$ as proposed. The result in Eq. (6) points to the following difference equation
[TABLE]
If we interpret $z_1$ as the quantum wave function $\psi$ and $z_2$ as its first-order derivative $\partial_x\psi$, Eq. (7) corresponds to a discrete version of the Schrödinger equation with the particle energy set to zero. Thus the RAE identifies two real numbers as the essential variables in the hidden states; they can be interpreted as the quantum wave function and its first-order derivative, and their update rule is consistent with the Schrödinger equation.
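Why a linear transformation may be used to bring the learned matrices into the form of Eq. (6) without changing the model: transforming the latent space by any invertible $S$, conjugating every $\tilde T_k$ (and hence $\tilde T(V)$), and absorbing $S^{-1}$ into the decoder leaves all decoded outputs unchanged. The sketch assumes 2×2 latent matrices and a linear decoder for simplicity; the paper's decoder is a feedforward network, but the argument only needs $D(S^{-1}\,\cdot)$. The matrices below are arbitrary illustrative values.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))] for i in range(len(x))]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def inverse_2x2(m: Matrix) -> Matrix:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def decoded_rollout(z0: Vector, transfers: Sequence[Matrix], decoder: Matrix) -> List[Vector]:
    """Decoded outputs D z_i along z_i = T~(V_i) z_{i-1}."""
    z, outputs = list(z0), []
    for t in transfers:
        z = matvec(t, z)
        outputs.append(matvec(decoder, z))
    return outputs


def gauge_transform(z0: Vector, transfers: Sequence[Matrix], decoder: Matrix, s: Matrix) -> Tuple[Vector, List[Matrix], Matrix]:
    """z -> S z, T~ -> S T~ S^-1, D -> D S^-1 (the encoder would become S E)."""
    s_inv = inverse_2x2(s)
    return matvec(s, z0), [matmul(matmul(s, t), s_inv) for t in transfers], matmul(decoder, s_inv)


potentials = [-1.0, -2.0, -0.5]
transfers = [[[1.0, 0.1], [0.1 * v, 1.0]] for v in potentials]
decoder = [[1.0, 0.0], [0.3, -1.0], [2.0, 1.0]]
z0 = [1.0, 0.5]
s = [[2.0, 1.0], [0.5, 1.5]]

original = decoded_rollout(z0, transfers, decoder)
rotated = decoded_rollout(*gauge_transform(z0, transfers, decoder, s))
max_diff = max(abs(p - q) for row_p, row_q in zip(original, rotated) for p, q in zip(row_p, row_q))
print(max_diff < 1e-12)  # True
```

Because $S\tilde T(V)S^{-1} = \sum_k S\tilde T_k S^{-1} V^k$, one $S$ acts on all Taylor coefficients at once, which is why a single transformation can bring every $\tilde T_k$ to a common canonical form.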
In this way, without any prior knowledge of quantum mechanics, the introspective learning architecture can develop the concept of the quantum wave function and discover the Schrödinger equation when it is provided only with experimental data of potential and density pairs. As a consistency check, we train the same introspective recurrent neural network on potential and density data of a high-temperature thermal gas following the Boltzmann distribution $n(x) \propto e^{-\beta V(x)}$ at a fixed inverse temperature $\beta$. In this case, we can even reduce the RAE to an auto-encoder (AE) without sacrificing the reconstruction loss. As shown in Fig. 4(b), the reconstruction loss remains vanishingly small for any latent dimension $d_z$, implying that there is no need to pass any variable along the sequence, and hence the Schrödinger equation does not emerge for the thermal gas.
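The contrast with the thermal gas can be made concrete by asking which output sites respond when the potential at one site is changed. In the sketch below, the forms are assumptions: an unnormalized Boltzmann factor for the thermal gas, and a semi-implicit Euler discretization of $\partial_x^2\psi = V\psi$ at zero energy for the quantum particle; the function names are hypothetical. The thermal density reacts only at the perturbed site, so no variable needs to be carried along the sequence, while the quantum density changes at every downstream site.

```python
import math
from typing import Callable, List, Sequence


def thermal_density(potential: Sequence[float], beta: float = 1.0) -> List[float]:
    """Unnormalized Boltzmann density n_i = exp(-beta V_i): a purely local map."""
    return [math.exp(-beta * v) for v in potential]


def quantum_density(potential: Sequence[float], a: float = 0.01) -> List[float]:
    """n_i = psi_i^2 from semi-implicit Euler steps of psi'' = V psi (assumed units, E = 0)."""
    psi, dpsi, out = 1.0, 0.0, []
    for v in potential:
        psi = psi + a * dpsi
        dpsi = dpsi + a * v * psi
        out.append(psi * psi)
    return out


def sites_affected(
    density: Callable[[Sequence[float]], List[float]],
    potential: Sequence[float],
    site: int,
    delta: float = 0.5,
) -> List[int]:
    """Indices whose density changes when V at `site` is shifted by `delta`."""
    base = density(potential)
    bumped = list(potential)
    bumped[site] += delta
    return [i for i, (b, n) in enumerate(zip(base, density(bumped))) if abs(b - n) > 1e-12]


flat = [-1.0] * 50
print(sites_affected(thermal_density, flat, 10))      # [10]
print(sites_affected(quantum_density, flat, 10)[:3])  # [11, 12, 13]
```

An overall normalization of the thermal density would rescale every site by the same factor, but it would still not require any variable to be passed from one site to the next.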
In conclusion, we design an architecture that combines a task machine, which learns directly from the experimental data, with an introspective machine working on the neural activations of the task machine. Separating the task machine from the introspective machine effectively isolates the knowledge distillation from the task performance, so that the whole system can simultaneously improve the task performance and approach the parsimonious limit of knowledge representation without trading one off against the other. Here we show that this architecture can discover the Schrödinger equation from potential-to-density data; we therefore name it the “Schrödinger machine”. We envision that the same architecture can be applied generally to other machine learning problems in physics and enable machine learning to discover new physics in the future.
A few other points in this work are worth highlighting. First, although the use of the Taylor expansion for the non-linear functions in our RNN is not essential and could be replaced by neural network models, it has the advantage of analytical tractability, which makes it easier to understand how the RNN works. Second, the potential-to-density mapping is also an essential component of density functional theory, known as the Kohn-Sham mapping [Kohn and Sham (1965)]. Existing machine learning solutions for this task include the kernel method and the convolutional neural network approach [Snyder et al. (2012); Li et al. (2014, 2016); Brockherde et al. (2017); Khoo et al. (2017)]. The RNN approach introduced here has the advantage of being spatially scalable without retraining, which could find applications in accelerating density functional calculations and material search. Third, we introduce a model that incorporates the auto-encoder into a recurrent neural network and can find a compact representation of an entire RNN model. This algorithm may also find applications in other settings of model compression and knowledge transfer.
Acknowledgement. This work is funded mostly by Grant No. 2016YFA0301600 and NSFC Grant No. 11734010. CW acknowledges the support of the China Scholarship Council. YZY acknowledges the stimulating discussion with Da Xiao, Lei Ma and Mingli Yuan in the 2017 and 2018 Swarma Club Kaifeng Research Camp.
I Supplemental Material
Appendix A Data Acquisition
The data for training the RNN are generated by solving the “simplified” Schrödinger equation in 1D
[TABLE]
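Here $x$ labels the position in 1D. The potential begins at $x = 0$ and is piecewise constant, taking the value $V_i$ on the $i$th segment, whose width is a short-range cut-off. On each segment we define a local wave number $k_i$, which is real because $V_i < 0$; the wave function on that segment then takes the form of a superposition of the plane waves $e^{\pm i k_i x}$ with coefficients $A_i$ and $B_i$. Matching the wave function and its derivative at the segment boundaries gives the relations
[TABLE]
[TABLE]
With these relations, we can solve for all the coefficients $A_i$ and $B_i$ starting from a fixed initial condition, and hence construct the wave function $\psi(x)$. Finally, the density at $x_i$ is given by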
[TABLE]
In summary, each data sample is generated in the following steps:
1. Set the potential values $V_i$ from two kinds of random numbers: $r_i$, drawn independently and uniformly for each site $i$, and $\mu$, drawn uniformly once and shared by all sites of a sequence. We use $\mu$ to randomly shift the energy scale of each sample.
2. Smooth the potential by repeatedly applying a flattening operation, where the number of repetitions is a random integer drawn from a fixed range.
3. Get the density sequence for this potential by solving Eq. (9) and Eq. (10) and using Eq. (11).
In practice, we collect 15000 samples, 10000 of which are used for training and 5000 for validation.
The potential data for the RAE are generated in the same way as for the RNN, and the hidden states are collected by evolving the trained RNN. We again collect 15000 samples, 10000 of which are used for training and 5000 for validation.
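The data-generation steps can be sketched end to end. This is a hypothetical reconstruction: the sampling ranges, the three-point moving average used for the flattening step, the smoothing-count range, the initial condition $\psi = 1$, $\partial_x\psi = 0$, and the units $\hbar^2/2m = 1$ are all assumptions. Instead of solving for the plane-wave coefficients on each segment, it propagates $(\psi, \partial_x\psi)$ with the exact transfer matrix of each constant-potential segment, which enforces the same continuity of $\psi$ and $\partial_x\psi$ at every boundary.

```python
import math
import random
from typing import List, Sequence, Tuple


def random_potential(n_sites: int, rng: random.Random) -> List[float]:
    """Hypothetical recipe V_i = -(mu + r_i); the uniform ranges are assumptions."""
    mu = rng.uniform(0.5, 2.0)  # one energy shift per sequence
    return [-(mu + rng.uniform(0.0, 1.0)) for _ in range(n_sites)]


def smooth(potential: Sequence[float], times: int) -> List[float]:
    """Repeated 3-point moving average (assumed form of the flattening step); edges reuse the end value."""
    v = list(potential)
    last = len(v) - 1
    for _ in range(times):
        v = [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, last)]) / 3.0 for i in range(len(v))]
    return v


def density_from_potential(potential: Sequence[float], a: float, psi0: float = 1.0, dpsi0: float = 0.0) -> List[float]:
    """Exact propagation of (psi, psi') through segments of width a with constant V_i = -k_i^2 < 0."""
    psi, dpsi = psi0, dpsi0
    density = []
    for v in potential:
        k = math.sqrt(-v)  # real wave number, since V_i < 0 (extended states)
        c, s = math.cos(k * a), math.sin(k * a)
        psi, dpsi = c * psi + s / k * dpsi, -k * s * psi + c * dpsi
        density.append(psi * psi)
    return density


def make_dataset(n_samples: int, n_sites: int, a: float, seed: int = 0) -> List[Tuple[List[float], List[float]]]:
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        v = smooth(random_potential(n_sites, rng), rng.randint(0, 5))  # smoothing-count range assumed
        data.append((v, density_from_potential(v, a)))
    return data


dataset = make_dataset(n_samples=30, n_sites=50, a=0.1)
train, valid = dataset[:20], dataset[20:]  # same 2:1 ratio as the 10000:5000 split
print(len(train), len(valid))  # 20 10
```

Averaging negative values keeps them negative, so the smoothing step never pushes the potential out of the extended-state regime $V_i < 0$ assumed by the propagation.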
Appendix B Network Parameters
We elaborate on the details of our training process. For the RNN based on the Taylor expansion, we cut off the expansion at power $K$ and consider hidden state dimensions $d_h$ from 1 to 6. Taking a given hidden dimension as an example, we fix the initial hidden state $h_0$ and the vector $w$ without loss of generality, and set the remaining parameter to a fixed initial value. We initialize the coefficient matrices to
[TABLE]
where $I$ stands for the $d_h$-dimensional identity matrix and $R$ stands for a $d_h \times d_h$ random matrix whose elements follow independent Gaussian distributions (with zero mean and unit variance). We use the ADAM optimizer with a fixed learning rate and mini-batch training over a fixed window of sites.
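A hypothetical version of this initialization, assuming $T_0 = I + \epsilon R_0$ and $T_k = \epsilon R_k$ for $k \geq 1$ with a small scale $\epsilon$ (both the form and the scale are our assumptions, and the function name is illustrative):

```python
import random
from typing import List

Matrix = List[List[float]]


def init_coefficients(dim: int, order: int, eps: float, seed: int = 0) -> List[Matrix]:
    """Hypothetical init: T_0 = I + eps * R_0 and T_k = eps * R_k for k >= 1,
    where every R_k has independent N(0, 1) entries."""
    rng = random.Random(seed)
    coeffs = []
    for k in range(order + 1):
        coeffs.append([
            [(1.0 if (r == c and k == 0) else 0.0) + eps * rng.gauss(0.0, 1.0) for c in range(dim)]
            for r in range(dim)
        ])
    return coeffs


coeffs = init_coefficients(dim=3, order=2, eps=0.01)
# Deviation of T_0 from the identity and size of the higher-order terms at initialization.
dev0 = max(abs(coeffs[0][r][c] - (1.0 if r == c else 0.0)) for r in range(3) for c in range(3))
higher = max(abs(x) for tk in coeffs[1:] for row in tk for x in row)
print(len(coeffs), dev0 < 0.1, higher < 0.1)  # 3 True True
```

Starting near the identity keeps the product of many $T(V_i)$ from exploding or vanishing at the start of training, which is one plausible motivation for this kind of initialization.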
For the RAE network, both the encoder and the decoder are feedforward networks with fixed layer structures. We again use the ADAM optimizer with a fixed learning rate and mini-batch training over a fixed window of sites.
Appendix C Analysis of RNN Translator Loss
The RNN translator may not formulate physical laws in the most parsimonious language, and its hidden state may contain redundant information. In fact, there is an analytically tractable limit where we can explicitly demonstrate this possibility: the RNN may try to capture a differential equation for the density profile directly, instead of one for the quantum wave function. To simplify the analysis, let us work in units with $\hbar^2/2m = 1$ and measure the potential energy with respect to the single-particle energy level; the Schrödinger equation for the BEC wave function then takes the simple form $\partial_x^2\psi = V\psi$. In terms of the density profile $n = |\psi|^2$, however, the Schrödinger equation implies
[TABLE]
where two other real profiles combine with $n$ to form a system of linear differential equations. The recurrent rule for such a system lies within the descriptive power of our RNN architecture. If the RNN chooses to identify its hidden state with $n$ and these two profiles, the following parameters allow it to model Eq. (13) accurately to first order in $a$:
[TABLE]
This theoretical construction at least provides a base RNN that demonstrates why the proposed architecture can work in principle. The performance can be further improved by relaxing the parameters away from this ideal limit or by enlarging the hidden state dimension $d_h$.
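One way to see that such a closed linear system exists, assuming $\partial_x^2\psi = V\psi$ with real $V$ as above: choose $m \equiv \partial_x n$ and $p \equiv |\partial_x\psi|^2$ as the two auxiliary profiles. This choice is ours and may differ from the variables of Eq. (13) by a linear recombination. Differentiating and using the Schrödinger equation closes the system:

```latex
\begin{aligned}
n &= |\psi|^2, \qquad m \equiv \partial_x n = 2\,\mathrm{Re}(\psi^*\partial_x\psi), \qquad p \equiv |\partial_x\psi|^2,\\
\partial_x n &= m,\\
\partial_x m &= 2|\partial_x\psi|^2 + 2\,\mathrm{Re}(\psi^*\partial_x^2\psi) = 2p + 2Vn,\\
\partial_x p &= 2\,\mathrm{Re}(\partial_x\psi^*\,\partial_x^2\psi) = 2V\,\mathrm{Re}(\partial_x\psi^*\,\psi) = Vm,\\
\partial_x \begin{pmatrix} n\\ m\\ p \end{pmatrix} &= (A + VB)\begin{pmatrix} n\\ m\\ p \end{pmatrix},\quad
A = \begin{pmatrix} 0&1&0\\ 0&0&2\\ 0&0&0 \end{pmatrix},\quad
B = \begin{pmatrix} 0&0&0\\ 2&0&0\\ 0&1&0 \end{pmatrix}.
\end{aligned}
```

A first-order step in $a$ then gives $T(V) = I + a(A + VB)$, which is Eq. (3) with $K = 1$, $T_0 = I + aA$ and $T_1 = aB$, and readout $w = (1, 0, 0)$.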
However, what is the minimum hidden state dimension (in terms of real variables) for the RNN to function well in the potential-to-density mapping? Can the RNN discover that the quantum wave function provides a more parsimonious description, which requires only two real variables, $\psi$ and $\partial_x\psi$? To answer these questions, we train the RNN translator with different hidden state dimensions $d_h$. As shown in Fig. 5, the loss drops significantly only if $d_h \geq 3$, implying that the RNN was unable to realize the more efficient ($d_h = 2$) wave function description. For the $d_h = 3$ case, when we read out the hidden states at each step, we find that they indeed correspond to the vector formed by $n$ and the two auxiliary profiles of Eq. (13), up to a specific linear transformation (depending on the random initialization of the model parameters), confirming that the RNN indeed works like the base model of Eq. (14). This example shows that the RNN can develop legitimate and predictive rules of physics, such as Eq. (13), from observation data. It tends to work directly with the variables present in the observation data to get the job done. Sometimes the rules it finds work well enough that the RNN has no incentive to develop higher-level concepts such as quantum wave functions.
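The three-variable base model can be tried directly. With the hidden state $h_i = (n_i, m_i, p_i)$, where $m = \partial_x n$ and $p = |\partial_x\psi|^2$ (our choice of auxiliary variables), $T(V) = I + a(A + VB)$ and readout $w = (1, 0, 0)$, the RNN of Eq. (1) reproduces $n = \psi^2$ for a constant potential without any wave function appearing in the hidden state. The units ($\partial_x^2\psi = V\psi$), the initial state, and the step size are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

A = [[0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]  # V-independent part
B = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # coefficient of V


def base_transfer(v: float, a: float) -> Matrix:
    """First-order step T(V) = I + a (A + V B), i.e. K = 1 with T_0 = I + aA, T_1 = aB."""
    return [[(1.0 if r == c else 0.0) + a * (A[r][c] + v * B[r][c]) for c in range(3)] for r in range(3)]


def base_rnn(potential: List[float], a: float, h0: List[float]) -> List[float]:
    h, out = list(h0), []
    for v in potential:
        t = base_transfer(v, a)
        h = [sum(t[r][c] * h[c] for c in range(3)) for r in range(3)]
        out.append(h[0])  # readout w = (1, 0, 0) picks the density component
    return out


a = 1e-3
# psi(0) = 1, psi'(0) = 0  ->  n = 1, m = 2 Re(psi* psi') = 0, p = |psi'|^2 = 0
density = base_rnn([-1.0] * 1000, a, h0=[1.0, 0.0, 0.0])
# For V = -1 the exact solution is psi = cos(x), so n = cos^2(x) at x = (i + 1) a.
max_err = max(abs(n - math.cos((i + 1) * a) ** 2) for i, n in enumerate(density))
print(max_err < 2e-3)  # True
```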
References
- Carifio et al. (2017): J. Carifio, J. Halverson, D. Krioukov, and B. D. Nelson, Journal of High Energy Physics 9, 157 (2017), eprint 1707.00655.
- Koch-Janusz and Ringel (2018): M. Koch-Janusz and Z. Ringel, Nature Physics 14, 578 (2018), eprint 1704.06279.
- You et al. (2018): Y.-Z. You, Z. Yang, and X.-L. Qi, Phys. Rev. B 97, 045153 (2018), eprint 1709.01223.
- Hashimoto et al. (2018): K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Phys. Rev. D 98, 046019 (2018).
- Torlai and Melko (2016): G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016), eprint 1606.02718.
- Wang (2016): L. Wang, Phys. Rev. B 94, 195105 (2016).
- Carrasquilla and Melko (2017): J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017), eprint 1605.01735.
- van Nieuwenburg et al. (2017): E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017), eprint 1610.02048.
