Emergent Quantum Mechanics in an Introspective Machine Learning Architecture

Ce Wang; Hui Zhai; Yi-Zhuang You

arXiv:1901.11103·cond-mat.dis-nn·December 2, 2019

TL;DR

This paper presents an introspective neural network architecture that autonomously learns quantum concepts like the wave function and Schrödinger equation from simulated data, demonstrating potential for discovering new physics.

Contribution

It introduces a novel machine learning framework that automatically derives quantum laws from data, bridging neural networks and fundamental physics discovery.

Findings

1. Successfully learned the quantum wave function from data
2. Discovered the Schrödinger equation through the architecture
3. Demonstrated potential for autonomous physics discovery

Equations

$$h_{i} = W(V_{i})\cdot h_{i-1},\qquad \rho^{\prime}_{i} = P(h_{i}).$$

$$\mathcal{L}_{\text{RNN}} = \sum_{i\in\text{window}}\left(\rho^{\prime}_{i}-\rho_{i}\right)^{2}.$$

$$W(V_{i}) = \sum_{n=0}^{n_{W}} W^{(n)}V_{i}^{n},\qquad P(h_{i}) = p^{\intercal}\cdot h_{i}.$$

$$\begin{aligned} g_{i_{0}} &= E(h_{i_{0}}),\\ g_{i} &= \tilde{W}(V_{i})\cdot g_{i-1}, && (i=i_{0}+1, i_{0}+2, \dots)\\ h^{\prime}_{i} &= D(g_{i}), && (i=i_{0}, i_{0}+1, i_{0}+2, \dots) \end{aligned}$$

$$\mathcal{L}_{\text{RAE}} = \sum_{i\in\text{window}}\left(h^{\prime}_{i}-h_{i}\right)^{2}.$$

$$M^{-1}\tilde{W}^{(0)}M = \begin{bmatrix}0.9993 & 0.1007\\ 0.0013 & 0.9987\end{bmatrix}\approx\begin{bmatrix}1 & a\\ 0 & 1\end{bmatrix},\qquad M^{-1}\tilde{W}^{(1)}M = \begin{bmatrix}0.0067 & 0.0004\\ 0.1001 & 0.0024\end{bmatrix}\approx\begin{bmatrix}0 & 0\\ a & 0\end{bmatrix}.$$

$$\begin{bmatrix}g_{i+1,1}\\ g_{i+1,2}\end{bmatrix} = \begin{bmatrix}1 & a\\ aV_{i} & 1\end{bmatrix}\begin{bmatrix}g_{i,1}\\ g_{i,2}\end{bmatrix}.$$

$$V(x)\psi(x) = \partial_{x}^{2}\psi(x).$$

$$\begin{aligned} k_{i+1}A_{i+1} = {} & A_{i}\big(k_{i+1}\sin(k_{i}x_{i})\sin(k_{i+1}x_{i}) + k_{i}\cos(k_{i}x_{i})\cos(k_{i+1}x_{i})\big)\\ & + B_{i}\big(k_{i+1}\cos(k_{i}x_{i})\sin(k_{i+1}x_{i}) - k_{i}\sin(k_{i}x_{i})\cos(k_{i+1}x_{i})\big) \end{aligned}$$

$$\begin{aligned} k_{i+1}B_{i+1} = {} & B_{i}\big(k_{i}\sin(k_{i}x_{i})\sin(k_{i+1}x_{i}) + k_{i+1}\cos(k_{i}x_{i})\cos(k_{i+1}x_{i})\big)\\ & + A_{i}\big(k_{i+1}\sin(k_{i}x_{i})\cos(k_{i+1}x_{i}) - k_{i}\cos(k_{i}x_{i})\sin(k_{i+1}x_{i})\big) \end{aligned}$$

$$\rho_{i} = \psi(x_{i})^{2}.$$

$$W^{(n)} = \begin{cases} \mathbf{1}_{d\times d} + \dfrac{0.01}{d}\,\mathsf{randn}_{d\times d}, & n=0,\\ \dfrac{0.01}{d}\,\mathsf{randn}_{d\times d}, & n>0. \end{cases}$$

$$\partial_{x}\begin{bmatrix}\rho(x)\\ \eta(x)\\ \xi(x)\end{bmatrix} = \begin{bmatrix}0 & 2 & 0\\ V(x) & 0 & 1\\ 0 & 2V(x) & 0\end{bmatrix}\begin{bmatrix}\rho(x)\\ \eta(x)\\ \xi(x)\end{bmatrix}.$$

$$W^{(0)} = \begin{bmatrix}1 & 2a & 0\\ 0 & 1 & a\\ 0 & 0 & 1\end{bmatrix},\qquad W^{(1)} = \begin{bmatrix}0 & 0 & 0\\ a & 0 & 0\\ 0 & 2a & 0\end{bmatrix},\qquad p = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}.$$

Full text

Ce Wang

Hui Zhai

Institute for Advanced Study, Tsinghua University, Beijing 100084, China

Yi-Zhuang You

Department of Physics, University of California, San Diego, CA 92093, USA

Abstract

Can physical concepts and laws emerge in a neural network as it learns to predict the observation data of physical systems? As a benchmark and a proof-of-principle study of this possibility, here we show an introspective learning architecture that can automatically develop the concept of the quantum wave function and discover the Schrödinger equation from simulated experimental data of the potential-to-density mappings of a quantum particle. This introspective learning architecture contains a machine translator to perform the potential to density mapping, and a knowledge distiller auto-encoder to extract the essential information and its update law from the hidden states of the translator, which turns out to be the quantum wave function and the Schrödinger equation. We envision that our introspective learning architecture can enable machine learning to discover new physics in the future.

The ongoing third wave of artificial intelligence has made great achievements in employing neural-network-based machine learning for industrial and social applications. Inspired by this success, machine learning algorithms have also been rapidly applied to various directions of physics research, ranging from high-energy physics and string theory to condensed matter, atomic, molecular and optical physics [Carifio et al. (2017); Koch-Janusz and Ringel (2018); You et al. (2018); Hashimoto et al. (2018); Torlai and Melko (2016); Wang (2016); Carrasquilla and Melko (2017); van Nieuwenburg et al. (2017); Zhang and Kim (2017); Wang and Zhai (2017, 2018); Zhang et al. (2018)]. While there have been many successful examples of machine-assisted physics research, it remains an ambitious goal to explore the potential of machine learning for the unsupervised discovery of physical concepts and laws from observation data [Iten et al. (2018); Wu and Tegmark (2018)]. A major challenge is to understand how the machine “thinks”, that is, what approaches it has developed internally. This typically requires us to open up the black box of the neural network and to identify the most relevant emergent features in its neural activity. Can the analysis of the neural activity also be automated by the machine itself? Can knowledge emerge as the machine examines its own information flow introspectively? To demonstrate these possibilities, here we report an introspective learning architecture, as illustrated in Fig. 1, that allows the machine to distill knowledge about quantum mechanics from observations of the density distribution of a quantum particle in potentials of different shapes.

As a proof-of-concept study, we consider a single quantum particle moving in a one-dimensional potential. Supposing we can measure the particle density for each given potential, we supply the machine with the potential profile as the input and the density profile as the target, and challenge it to discover the underlying rule governing the potential-to-density mapping. We discretize the potential $V(x)$ and the density profile $\rho(x)$ along the one-dimensional space and treat them as sequences of real numbers, $V_{i}=V(x_{i})$ and $\rho_{i}=\rho(x_{i})$, where $x_{i}=ai$ ($i=0,1,2,\cdots$) are discrete coordinates evenly spaced with a fixed separation $a=0.1$. We assume that the potential is always measured with respect to the energy of the particle, so that the particle energy is effectively fixed at zero. We only consider the case $V_{i}<0$, so that the particle remains in extended states.

By treating both the potential and density profiles as sequential data, the potential-to-density problem belongs to the broader class of sequence-to-sequence mappings [Kalchbrenner and Blunsom (2013); Sutskever et al. (2014); Cho et al. (2014); Bahdanau et al. (2014)], which can be handled by the recurrent neural network (RNN) [Goodfellow et al. (2016)]. The RNN has been widely used in natural language processing to translate sequences of words from a source language to a target language [Neubig (2017)]. We apply the RNN architecture to perform the potential-to-density mapping as a translation task. In each step, the RNN takes an input $V_{i}$ from the source sequence, modifies its internal hidden state $h_{i}$ accordingly, and generates the output $\rho^{\prime}_{i}$ based on the hidden state, as illustrated in Fig. 2(a). We adopt the following update equations

$$h_{i} = W(V_{i})\cdot h_{i-1},\qquad \rho^{\prime}_{i} = P(h_{i}), \tag{1}$$

where both the input $V_{i}\in\mathbb{R}$ and the output $\rho^{\prime}_{i}\in\mathbb{R}$ are scalars and the hidden state $h_{i}\in\mathbb{R}^{d}$ is a $d$-dimensional vector. The hidden state is updated by an input-dependent linear transformation, represented by a $d\times d$ matrix $W(V_{i})\in\mathbb{R}^{d\times d}$ acting on the previous hidden state $h_{i-1}$. The output $\rho^{\prime}_{i}$ is generated from the hidden state by a projection map $P(h_{i})$. The data flow is graphically represented in Fig. 2(b). The output sequence $\rho^{\prime}_{i}$ is then compared with the target sequence $\rho_{i}$ over a window of steps to evaluate the loss function

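$$\mathcal{L}_{\text{RNN}} = \sum_{i\in\text{window}}\left(\rho^{\prime}_{i}-\rho_{i}\right)^{2}. \tag{2}$$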

How the RNN updates its hidden state and generates its output is determined by the functions $W$ and $P$. In general, $W$ and $P$ could be non-linear functions, modeled for instance by feedforward neural networks. For our problem, however, we find it sufficient to model $W$ by a Taylor expansion (to order $n_{W}$ in $V_{i}$) and $P$ by a linear projection,

$$W(V_{i}) = \sum_{n=0}^{n_{W}} W^{(n)}V_{i}^{n},\qquad P(h_{i}) = p^{\intercal}\cdot h_{i}, \tag{3}$$

where $W^{(n)}$ is the $n$th-order Taylor expansion coefficient matrix (each of dimension $d\times d$) and $p$ is a $d$-dimensional vector. The elements of $W^{(n)}$ and $p$ are model parameters trained to minimize the loss function $\mathcal{L}_{\text{RNN}}$. The training dataset contains pairs of potential and density sequences that serve as parallel corpora for training the RNN translator. These pairs are currently obtained from numerical simulation (see Supplementary Material for details about data acquisition) but could be collected from experiments in future applications. For instance, a quantum gas microscope can detect in situ the density of ultracold atoms nearly in their ground state in the presence of different kinds of potentials generated by optical speckles [Lye et al. (2005)]. After minimizing the translator loss $\mathcal{L}_{\text{RNN}}$, the RNN can predict the density profile from the potential profile (see Supplementary Material for the training method).
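The translator defined by Eqs. (1)-(3) fits in a few lines of plain Python. This is a minimal, untrained forward pass written under our own conventions, not the authors' code. The helper names (`matvec`, `w_of_v`, `rnn_translate`, `rnn_loss`) are hypothetical, and matrices are stored as lists of rows. We also take `h0` to be the hidden state just before the first input, so `outputs[j]` is the prediction at the j-th site of the supplied potential. Training, for example by gradient descent on the loss, is not shown.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def w_of_v(W_coeffs, V):
    """Eq. (3): W(V) = sum_n W^(n) V^n, where W_coeffs = [W^(0), W^(1), ...]."""
    d = len(W_coeffs[0])
    return [[sum(Wn[r][c] * V ** n for n, Wn in enumerate(W_coeffs)) for c in range(d)]
            for r in range(d)]


def rnn_translate(W_coeffs, p, h0, potential):
    """Eq. (1): h_i = W(V_i) h_{i-1} and rho'_i = p^T h_i for every site of `potential`."""
    h, outputs = list(h0), []
    for V in potential:
        h = matvec(w_of_v(W_coeffs, V), h)
        outputs.append(sum(pk * hk for pk, hk in zip(p, h)))
    return outputs


def rnn_loss(pred, target, start=5, stop=55):
    """Eq. (2): squared error summed over the training window start <= i <= stop."""
    return sum((pred[i] - target[i]) ** 2 for i in range(start, stop + 1))


# Sanity check: with W^(0) = identity and W^(1) = W^(2) = 0 the hidden state never
# changes, so the prediction is the constant p^T h0 on a 401-site system.
d = 3
identity = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
zeros = [[0.0] * d for _ in range(d)]
pred = rnn_translate([identity, zeros, zeros], [1.0, 0.0, 0.0], [1.0] * d, [-1.5] * 401)
```

The same `W_coeffs` and `p` are applied at every step. Parameters fitted on the window $5\le i\le 55$ can therefore be run unchanged on a potential with hundreds of sites, which is the spatial scalability the RNN is meant to provide.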

We build the RNN with Taylor expansion order $n_{W}=2$ and hidden state dimension up to $d=6$. We observe that the loss $\mathcal{L}_{\text{RNN}}$ drops significantly as long as $d\geq 3$ (see Supplementary Material for a detailed analysis). Using an RNN for the one-dimensional potential-to-density mapping is physically grounded because it respects the translational symmetry of the physical law that governs this mapping. As a result, an immediate advantage of the RNN is spatial scalability: what has been learned on a small system can be readily generalized and applied to larger systems. For instance, as shown in Fig. 2(c-e), the RNN is trained over a small window from $i=5$ to $i=55$ (the initial 5 outputs are excluded to reduce the sensitivity to initial conditions). After training, the RNN can perform the potential-to-density mapping for a much larger system, from $i=0$ to $i=400$. Fig. 2(c-e) shows that the RNN output matches the target density profile well (with about 10% relative error) on the test dataset for different classes of potential profiles, whether shallow or deep, smooth or rough. This result demonstrates the predictive power of the RNN model.

By learning to perform the potential-to-density mapping, the RNN translator must have developed some intuition about the underlying physics. Historically, advances in physics have often been marked by formulating physical phenomena in terms of differential equations, such as Newton’s laws of motion, Maxwell’s equations of electromagnetism, and the Schrödinger equation of quantum mechanics. The RNN provides a universal representation of recurrent equations as discretized versions of differential equations, so the update rules of its hidden state can be interpreted as the machine’s understanding of the physical laws [Ma et al. (2018); Banchi et al. (2018)]. As the RNN performs the translation, it generates a sequence of hidden states containing the essential variables governing the physics of the potential-to-density mapping, mixed with other redundant or irrelevant information. To extract the knowledge from these hidden-state data, we design a higher-level machine, called the knowledge distiller, that learns from the neural activity (the hidden-state sequence) of the lower-level translator. It works on the RNN hidden states to compress the information and extract the underlying rule. The auto-encoder architecture is widely used for information compression [Bengio et al. (2013); Kingma and Welling (2013)]. Here we embed the auto-encoder in another recurrent neural network, forming a recurrent auto-encoder (RAE), because we need not only to find the essential variables in the hidden states but also to determine the update rules of these essential variables.

The architecture of the RAE knowledge distiller is illustrated in Fig. 3. The RAE distiller first encodes the hidden state $h_{i_{0}}$ of the RNN translator at a given step $i_{0}$ into the latent variable $g_{i_{0}}$, and then tries to reconstruct the hidden states $h_{i}$ for subsequent steps ($i\geq i_{0}$) by evolving and decoding the latent variable. The update equations are given by

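$$\begin{aligned} g_{i_{0}} &= E(h_{i_{0}}),\\ g_{i} &= \tilde{W}(V_{i})\cdot g_{i-1}, && (i=i_{0}+1, i_{0}+2, \dots)\\ h^{\prime}_{i} &= D(g_{i}), && (i=i_{0}, i_{0}+1, i_{0}+2, \dots), \end{aligned} \tag{4}$$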

where $E$ and $D$ represent the encoder and decoder maps, respectively. Here the RAE hidden state $g_{i}\in\mathbb{R}^{\tilde{d}}$ is updated by a linear transformation $\tilde{W}(V_{i})$ that still depends on the input potential sequence $V_{i}$, as illustrated in Fig. 3(b). The encoder and the decoder are implemented by feedforward networks, as shown in Fig. 3(c). The RAE is trained to minimize the reconstruction loss

$$\mathcal{L}_{\text{RAE}} = \sum_{i\in\text{window}}\left(h^{\prime}_{i}-h_{i}\right)^{2}. \tag{5}$$

It is important that the RAE (knowledge distiller) hidden state $g_{i}$ has a smaller dimension $\tilde{d}$ than the dimension $d$ of the RNN (translator) hidden state $h_{i}$. This enforces an information bottleneck that only allows the vital information to be passed down in $g_{i}$. Furthermore, instead of using a single auto-encoder to compress the hidden state at each step independently, the RAE connects a series of decoders together through a recurrent neural network. This design ensures that the latent representation $g_{i}$ remains coherent over a series of steps and contains the key variables that must be passed down along the sequence. A similar RAE architecture was proposed in Ref. Mirowski et al., 2010 and recently redesigned in Ref. Iten et al., 2018 to enable AI scientific discovery on sequential data. In this way, the RAE compresses the original RNN into a more compact RNN that captures the most essential information and its induced update rules.
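Here is a minimal sketch of the RAE forward pass, Eqs. (4)-(5), in plain Python. The encoder and decoder follow the $d\to 100\to\mathsf{ramp}\to\tilde{d}$ shape quoted in the Supplementary Material, with ramp taken to be ReLU. The weights are random and untrained, and the Gaussian initialization scale is our assumption. All helper names are hypothetical, and the hidden-state sequence is random stand-in data rather than real translator output. Only the data flow is illustrated: encode once at $i_0$, evolve the low-dimensional latent state with $\tilde{W}(V_i)$, and decode at every step.

```python
import random


def relu(x):
    return [max(0.0, t) for t in x]


def dense(W, b, x):
    """Affine layer W x + b with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def make_mlp(n_in, n_hidden, n_out, rng):
    """Feedforward map n_in -> n_hidden -> ReLU -> n_out with random (untrained) weights."""
    W1 = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.gauss(0.0, n_hidden ** -0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b1, b2 = [0.0] * n_hidden, [0.0] * n_out
    return lambda x: dense(W2, b2, relu(dense(W1, b1, x)))


def latent_step(Wt_coeffs, V, g):
    """g_i = W~(V_i) g_{i-1}, with W~(V) = sum_n W~^(n) V^n."""
    k = len(g)
    Wt = [[sum(Wn[r][c] * V ** n for n, Wn in enumerate(Wt_coeffs)) for c in range(k)]
          for r in range(k)]
    return [sum(Wt[r][c] * g[c] for c in range(k)) for r in range(k)]


def rae_reconstruct(encoder, decoder, Wt_coeffs, h_seq, V_seq, i0):
    """Eq. (4): g_{i0} = E(h_{i0}); g_i = W~(V_i) g_{i-1}; h'_i = D(g_i) for i >= i0."""
    g = encoder(h_seq[i0])
    recon = [decoder(g)]
    for i in range(i0 + 1, len(h_seq)):
        g = latent_step(Wt_coeffs, V_seq[i], g)
        recon.append(decoder(g))
    return recon


def rae_loss(recon, h_seq, i0):
    """Eq. (5): squared reconstruction error summed over all decoded steps i >= i0."""
    return sum((x - y) ** 2
               for j, h_pred in enumerate(recon)
               for x, y in zip(h_pred, h_seq[i0 + j]))


rng = random.Random(0)
d, d_tilde, i0 = 6, 2, 5
E, D = make_mlp(d, 100, d_tilde, rng), make_mlp(d_tilde, 100, d, rng)
Wt_coeffs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]   # placeholder W~^(0), W~^(1)
h_seq = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(61)]  # stand-in hidden states
V_seq = [-1.0 - rng.random() for _ in range(61)]
recon = rae_reconstruct(E, D, Wt_coeffs, h_seq, V_seq, i0)
loss = rae_loss(recon, h_seq, i0)
```

The bottleneck is the two-component `g`. Every reconstructed $h^{\prime}_i$ must be decoded from it, so anything the translator state carries beyond what `g` and the input potential can regenerate is discarded.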

As shown in Fig. 4(a), we find that the reconstruction loss $\mathcal{L}_{\text{RAE}}$ of the RAE increases dramatically only when its hidden state dimension $\tilde{d}$ is squeezed below two (i.e. $\tilde{d}<2$). This implies that the key feature can be stored most parsimoniously in a two-component real vector $g_{i}=(g_{i,1},g_{i,2})$ (i.e. $\tilde{d}=2$). Here we show that $g_{i}$ in fact represents the quantum wave function and its first-order derivative. The evidence is twofold:

First, we use the trained RNN to predict the density for a constant potential $V$, for which the result should be $\cos^{2}(kx_{i})$ with $k=\sqrt{-V}$ the momentum. If $g_{i,1}$ and $g_{i,2}$ are the wave function and its derivative, they should behave (up to normalization and sign) as $\cos(kx_{i})$ and $\sin(kx_{i})$, respectively. Their periods are then twice the period of $\rho_{i}$, and their phases are shifted by $\pi/2$ relative to each other. As shown in Fig. 4(c), $g_{i}$ indeed displays the period doubling and the relative phase shift.

Second, we open up the recurrent block of the RAE to extract the update rules for $g_{i}$, which constitute the machine’s formulation of the physical rules. The update rules are encoded in the transformation matrix $\tilde{W}(V_{i})=\sum_{n=0}^{n_{W}}\tilde{W}^{(n)}V_{i}^{n}$, which is parameterized by the Taylor expansion coefficient matrices $\tilde{W}^{(n)}$. To connect this formulation to the familiar Schrödinger equation, we note that the mapping is invariant under a change of basis of the latent space by any $M\in\mathrm{GL}(2,\mathbb{R})$: $g_{i}\to M^{-1}g_{i}$ together with $\tilde{W}^{(n)}\to M^{-1}\tilde{W}^{(n)}M$ for all $n$, with the encoder and decoder absorbing $M$. We find that it is always possible to choose a linear transformation that simultaneously brings all $\tilde{W}^{(n)}$ to the following form

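$$M^{-1}\tilde{W}^{(0)}M = \begin{bmatrix}0.9993 & 0.1007\\ 0.0013 & 0.9987\end{bmatrix}\approx\begin{bmatrix}1 & a\\ 0 & 1\end{bmatrix},\qquad M^{-1}\tilde{W}^{(1)}M = \begin{bmatrix}0.0067 & 0.0004\\ 0.1001 & 0.0024\end{bmatrix}\approx\begin{bmatrix}0 & 0\\ a & 0\end{bmatrix}. \tag{6}$$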

Here the numerical matrix elements are those obtained from a particular instance of the trained RAE. To leading order, each element can be identified with 0, 1, or the lattice constant $a=0.1$, and we have also verified that they scale with $a$ as expected. The result in Eq. (6) points to the following difference equation

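$$\begin{bmatrix}g_{i+1,1}\\ g_{i+1,2}\end{bmatrix} = \begin{bmatrix}1 & a\\ aV_{i} & 1\end{bmatrix}\begin{bmatrix}g_{i,1}\\ g_{i,2}\end{bmatrix}. \tag{7}$$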

If we interpret $g_{i,1}$ as the quantum wave function $\psi(x_{i})$ and $g_{i,2}$ as its first-order derivative $\partial_{x}\psi(x_{i})$, Eq. (7) reads $\psi(x_{i+1})=\psi(x_{i})+a\,\partial_{x}\psi(x_{i})$ and $\partial_{x}\psi(x_{i+1})=\partial_{x}\psi(x_{i})+aV_{i}\psi(x_{i})$. This is a discrete version of the Schrödinger equation $\partial_{x}^{2}\psi(x)=V(x)\psi(x)$, where the particle energy has been set to zero. The RAE thus identifies two real numbers as the essential variables in the hidden states. They can be interpreted as the quantum wave function and its first-order derivative, and their update rule is consistent with the Schrödinger equation.
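To see in what sense Eq. (7) is a discretization, we can iterate it for a constant potential $V=-1$. We take $g_{i}=(\psi(x_{i}),\partial_{x}\psi(x_{i}))$ with $\psi(0)=1$ and $\partial_x\psi(0)=0$, and compare with the exact solution $\cos(kx)$. The plain-Python sketch below does this; its helper names are hypothetical. If the update is a first-order scheme, the error at a fixed position should shrink roughly in proportion to $a$.

```python
import math


def evolve_eq7(V_seq, a, psi0=1.0, dpsi0=0.0):
    """Iterate Eq. (7) with g_i = (psi(x_i), dpsi/dx(x_i)):
    psi_{i+1} = psi_i + a * dpsi_i,  dpsi_{i+1} = a * V_i * psi_i + dpsi_i."""
    psi, dpsi = psi0, dpsi0
    trajectory = [psi]
    for V in V_seq:
        psi, dpsi = psi + a * dpsi, a * V * psi + dpsi   # right-hand side uses the old values
        trajectory.append(psi)
    return trajectory


def eq7_error(a, V=-1.0, length=2.0):
    """|psi(length) from Eq. (7) - cos(k * length)| for a constant V < 0, k = sqrt(-V)."""
    n_steps = int(round(length / a))
    psi = evolve_eq7([V] * n_steps, a)
    return abs(psi[-1] - math.cos(math.sqrt(-V) * length))


for a in (0.1, 0.05, 0.025):
    print(f"a = {a:<6} error at x = 2: {eq7_error(a):.4f}")
```

Halving $a$ roughly halves the error. This is consistent with Eq. (7) being a first-order (forward-difference) discretization of $\partial_x^2\psi = V\psi$, not an exact propagator.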

In this way, without any prior knowledge of quantum mechanics, the introspective learning architecture can develop the concept of the quantum wave function and discover the Schrödinger equation when it is provided only with experimental data of potential and density pairs. As a consistency check, we train the same introspective recurrent neural network on potential and density data of a high-temperature thermal gas following $\rho_{i}\propto e^{-\beta V_{i}}$ at a fixed inverse temperature $\beta$. In this case, we can even reduce the RAE to an auto-encoder (AE) without sacrificing the reconstruction loss $\mathcal{L}_{\text{AE}}$. As shown in Fig. 4(b), $\mathcal{L}_{\text{AE}}$ remains essentially zero for any $\tilde{d}$. No variable needs to be passed along the sequence, and hence the Schrödinger equation does not emerge for the thermal gas.
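A toy check makes the contrast between the two datasets concrete: perturb the potential at one site and see which densities change. This is a sketch under our own assumptions, with hypothetical names. For the thermal gas, $\rho_i\propto e^{-\beta V_i}$ depends only on the local $V_i$, so nothing has to be carried along the sequence. For the quantum particle, the density at later sites changes. The quantum densities below come from a simple fine-step integrator of $\partial_x^2\psi=V\psi$, standing in for the exact solver described in the Supplementary Material. We use the initial condition $\psi(0)=\partial_x\psi(0)=1$ and hold $V$ constant on each cell.

```python
import math


def thermal_density(V_seq, beta=1.0):
    """High-temperature gas: rho_i proportional to exp(-beta * V_i); purely local in V."""
    return [math.exp(-beta * V) for V in V_seq]


def quantum_density(V_seq, a=0.1, substeps=100):
    """rho_i = psi(x_i)^2 for psi'' = V psi, integrated with small explicit Euler sub-steps.
    V_i is held constant on the cell [x_i, x_{i+1})."""
    h = a / substeps
    psi, dpsi = 1.0, 1.0            # assumed initial condition psi(0) = psi'(0) = 1
    rho = [psi ** 2]
    for V in V_seq[:-1]:
        for _ in range(substeps):
            psi, dpsi = psi + h * dpsi, h * V * psi + dpsi
        rho.append(psi ** 2)
    return rho


def changed_sites(f, V, V_perturbed, tol=1e-12):
    """Indices where the density computed by f differs between two potentials."""
    return [i for i, (x, y) in enumerate(zip(f(V), f(V_perturbed))) if abs(x - y) > tol]


V = [-1.0] * 50
V_perturbed = list(V)
V_perturbed[10] = -2.0                      # deepen the potential at a single site
print("thermal:", changed_sites(thermal_density, V, V_perturbed))
print("quantum:", changed_sites(quantum_density, V, V_perturbed)[:5], "...")
```

The thermal case reports only the perturbed site. The quantum case reports sites from 11 onward, because the perturbation is carried forward by $(\psi,\partial_x\psi)$. This is exactly the information the RAE must pass along the sequence.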

In conclusion, we design an architecture that combines a task machine learning directly from the experimental data and an introspective machine working on the neural activations of the task machine. Separating the task machine from the introspective machine effectively isolates the knowledge distillation from the task performance. The whole system can therefore simultaneously improve the task performance and approach the parsimonious limit of knowledge representation without trading one off against the other. Here we show that this architecture can discover the Schrödinger equation from potential-to-density data; we therefore name it the “Schrödinger machine”. We envision that the same architecture can be applied generally to other machine learning problems in physics and enable machine learning to discover new physics in the future.

In addition, a few other points in this work are worth highlighting. First, although the use of a Taylor expansion for the non-linear functions in our RNN is not essential and could be replaced by neural network models, it has the advantage of being analytically tractable, which makes it easier to understand how the RNN works. Second, the potential-to-density mapping is also an essential component of density functional theory, known as the Kohn-Sham mapping [Kohn and Sham (1965)]. Existing machine learning solutions for this task include the kernel method and the convolutional neural network approach [Snyder et al. (2012); Li et al. (2014, 2016); Brockherde et al. (2017); Khoo et al. (2017)]. The RNN approach introduced here has the advantage of being spatially scalable without retraining, which could find applications in accelerating density functional calculations and materials search. Third, we introduce a model that incorporates the auto-encoder into a recurrent neural network and can find a compact representation of an entire RNN model. This algorithm may find applications in other settings of model compression and knowledge transfer.

Acknowledgement. This work is funded mostly by Grant No. 2016YFA0301600 and NSFC Grant No. 11734010. CW acknowledges the support of the China Scholarship Council. YZY acknowledges the stimulating discussion with Da Xiao, Lei Ma and Mingli Yuan in the 2017 and 2018 Swarma Club Kaifeng Research Camp.

I Supplemental Material

Appendix A Data Acquisition

The data for training the RNN are generated by solving the “simplified” Schrödinger equation in 1D

$$V(x)\psi(x) = \partial_{x}^{2}\psi(x). \tag{8}$$

Here $x$ labels the position in 1D. The potential begins at $x=0$, and $V(x_{i})=V_{i}$ for $x_{i}\equiv ia$, where $a=0.1$ is a short-range cutoff. We define $k_{i}=\sqrt{-V_{i}}$; the wave function then takes the form $\psi(x)=A_{i}\sin(k_{i}x)+B_{i}\cos(k_{i}x)$ for $x_{i}\leq x<x_{i+1}$. Matching the wave function and its derivative gives the relations

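$$\begin{aligned} k_{i+1}A_{i+1} = {} & A_{i}\big(k_{i+1}\sin(k_{i}x_{i})\sin(k_{i+1}x_{i}) + k_{i}\cos(k_{i}x_{i})\cos(k_{i+1}x_{i})\big)\\ & + B_{i}\big(k_{i+1}\cos(k_{i}x_{i})\sin(k_{i+1}x_{i}) - k_{i}\sin(k_{i}x_{i})\cos(k_{i+1}x_{i})\big), \end{aligned} \tag{9}$$

$$\begin{aligned} k_{i+1}B_{i+1} = {} & B_{i}\big(k_{i}\sin(k_{i}x_{i})\sin(k_{i+1}x_{i}) + k_{i+1}\cos(k_{i}x_{i})\cos(k_{i+1}x_{i})\big)\\ & + A_{i}\big(k_{i+1}\sin(k_{i}x_{i})\cos(k_{i+1}x_{i}) - k_{i}\cos(k_{i}x_{i})\sin(k_{i+1}x_{i})\big). \end{aligned} \tag{10}$$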

With these relations, we can solve for all $A_{i},B_{i}$ starting from the fixed initial condition $A_{0}=1,B_{0}=1$, and hence construct the wave function $\psi(x)$. Finally, the density at $x_{i}$ is given by

$$\rho_{i} = \psi(x_{i})^{2}. \tag{11}$$
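Eqs. (9)-(11) amount to a transfer-matrix recursion, sketched below in plain Python. The helper names are hypothetical; $x_i = ia$ with $a = 0.1$ and $A_0 = B_0 = 1$ follow the text. Each step matches the value and the slope of the wave function at $x_i$ between the solutions with wave numbers $k_i$ and $k_{i+1}$, exactly as Eqs. (9)-(10) are written. Requiring $V_i<0$ keeps every $k_i$ real and non-zero. For a constant potential $V=-1$, the coefficients never change and the density reduces to $(\sin x_i+\cos x_i)^2 = 1+\sin 2x_i$, which gives a quick check.

```python
import math


def transfer_coefficients(V_seq, a=0.1, A0=1.0, B0=1.0):
    """Propagate (A_i, B_i) with Eqs. (9)-(10), using k_i = sqrt(-V_i) and x_i = i * a."""
    A, B = [A0], [B0]
    for i in range(len(V_seq) - 1):
        k0, k1, x = math.sqrt(-V_seq[i]), math.sqrt(-V_seq[i + 1]), i * a
        s0, c0 = math.sin(k0 * x), math.cos(k0 * x)
        s1, c1 = math.sin(k1 * x), math.cos(k1 * x)
        # Eq. (9): k_{i+1} A_{i+1} = A_i (...) + B_i (...)
        A.append((A[i] * (k1 * s0 * s1 + k0 * c0 * c1)
                  + B[i] * (k1 * c0 * s1 - k0 * s0 * c1)) / k1)
        # Eq. (10): k_{i+1} B_{i+1} = B_i (...) + A_i (...)
        B.append((B[i] * (k0 * s0 * s1 + k1 * c0 * c1)
                  + A[i] * (k1 * s0 * c1 - k0 * c0 * s1)) / k1)
    return A, B


def density(V_seq, a=0.1):
    """Eq. (11): rho_i = psi(x_i)^2 with psi(x_i) = A_i sin(k_i x_i) + B_i cos(k_i x_i)."""
    A, B = transfer_coefficients(V_seq, a)
    rho = []
    for i, V in enumerate(V_seq):
        k, x = math.sqrt(-V), i * a
        rho.append((A[i] * math.sin(k * x) + B[i] * math.cos(k * x)) ** 2)
    return rho


# Constant potential V = -1: rho_i should equal 1 + sin(2 x_i).
rho = density([-1.0] * 30)
print(max(abs(r - (1.0 + math.sin(2 * 0.1 * i))) for i, r in enumerate(rho)))
```

Because $\psi$ is continuous at each matching point, it does not matter whether $\psi(x_i)$ is evaluated with $(A_i,B_i,k_i)$ or with $(A_{i+1},B_{i+1},k_{i+1})$. Checking that both agree is a convenient test of an implementation.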

In summary, each data sample is generated in the following steps:

1. Set $V_{1}=-1$ and the remaining $V_{i}=-2\,\mathsf{rand}-R$, where $\mathsf{rand}$ is a random number uniformly distributed in $[0,1]$ for each $V_{i}$, and $R$ is a random number uniformly distributed in $[0,1]$ that is the same for the whole sequence. We use $R$ to randomly shift the energy scale of each sample.
2. Smooth the potential $V_{i}$ by performing the averaging operation $V_{i+1}=0.5\,(V_{i}+V_{i+1})$ $q$ times, where $q$ is a random integer between $1$ and $20$.
3. Obtain the density sequence $\rho_{i}$ for this potential by solving Eq. (9) and Eq. (10) and using Eq. (11).
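Steps 1-2 can be sketched as follows. Two readings are our own assumptions. First, the "first" site is index 0 of the list, whereas the text says $V_1$. Second, each smoothing pass updates all sites simultaneously from the previous pass, keeping the first site fixed. The function name is hypothetical, and step 3 would feed the result into a solver of Eqs. (9)-(11).

```python
import random


def random_potential(n_sites, rng):
    """Steps 1-2 of the data recipe: random negative potential, then q smoothing passes."""
    R = rng.random()                                  # sequence-wide shift, uniform in [0, 1)
    V = [-1.0] + [-2.0 * rng.random() - R for _ in range(n_sites - 1)]
    q = rng.randint(1, 20)                            # number of smoothing passes, 1..20 inclusive
    for _ in range(q):
        # V_{i+1} <- 0.5 * (V_i + V_{i+1}), computed from the previous pass's values
        V = [V[0]] + [0.5 * (V[i] + V[i + 1]) for i in range(n_sites - 1)]
    return V


rng = random.Random(2019)
dataset = [random_potential(401, rng) for _ in range(15)]   # the paper uses 15000 sequences
train, validation = dataset[:10], dataset[10:]              # same 2:1 split as 10000 / 5000
```

Averaging only mixes values in $[-3, 0)$, so the smoothed potential stays negative. This preserves the assumption $V_i<0$ that keeps every $k_i=\sqrt{-V_i}$ real.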

In practice, we collect 15000 samples, 10000 of which are used for training and 5000 for validation.

The potential data for the RAE are generated in the same way as for the RNN, and the hidden states $h_{i}$ are collected by evolving the trained RNN. We again collect 15000 samples, 10000 of which are used for training and 5000 for validation.

Appendix B Network Parameters

We now elaborate on the details of our training process. For the Taylor-expansion-based RNN, we cut off the expansion at power $n_{W}=2$ and consider hidden space dimensions $d$ from 1 to 6. Taking $d=6$ as an example, the initial hidden state is $h_{0}=(1,1,1,1,1,1)$ and the projection vector is $p=(p_{1},0,0,0,0,0)$ without loss of generality, with the parameter $p_{1}$ initially set to $1$. We initialize the coefficient matrices $W^{(n)}$ to

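$$W^{(n)} = \begin{cases} \mathbf{1}_{d\times d} + \dfrac{0.01}{d}\,\mathsf{randn}_{d\times d}, & n=0,\\ \dfrac{0.01}{d}\,\mathsf{randn}_{d\times d}, & n>0, \end{cases} \tag{12}$$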

where $\mathbf{1}_{d\times d}$ is the $d\times d$ identity matrix and $\mathsf{randn}_{d\times d}$ is a $d\times d$ random matrix whose elements are independent Gaussian random variables with zero mean and unit variance. We use the ADAM optimizer with learning rate $0.0002$ and mini-batch size $5$. The training window is from $i=5$ to $i=55$.
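The initialization of Eq. (12), together with $h_0$ and $p$ from the text, might look like this in plain Python. This is a sketch: the function and the configuration dictionary are hypothetical. The comment that only $p_1$ is trained is our reading of "without loss of generality". The optimizer settings are simply recorded, since the original framework is not specified.

```python
import random


def init_translator(d, n_W=2, rng=None):
    """W^(0) = 1 + (0.01/d) randn, W^(n>0) = (0.01/d) randn, h0 = ones, p = (1, 0, ..., 0)."""
    rng = rng or random.Random()
    eps = 0.01 / d
    W_coeffs = []
    for n in range(n_W + 1):
        W_coeffs.append([[(1.0 if n == 0 and r == c else 0.0) + eps * rng.gauss(0.0, 1.0)
                          for c in range(d)] for r in range(d)])
    h0 = [1.0] * d
    p = [1.0] + [0.0] * (d - 1)   # only p[0] (p_1 in the text) is treated as trainable
    return W_coeffs, p, h0


# Hyperparameters quoted in Appendix B, collected in one place.
TRANSLATOR_TRAINING = {"optimizer": "Adam", "learning_rate": 2e-4, "batch_size": 5,
                       "window": (5, 55), "n_W": 2}

W_coeffs, p, h0 = init_translator(6, rng=random.Random(1))
```

Starting near $W(V)=\mathbf{1}$ means the untrained translator initially just carries $h_0$ forward almost unchanged. Training then only has to learn small per-step corrections, which suits an update that is applied hundreds of times in a row.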

For the RAE network, the encoder is a feedforward network with the structure $d=6\to 100\to\mathsf{ramp}\to\tilde{d}$ (where ramp denotes the rectified linear activation), and the decoder is a feedforward network with the structure $\tilde{d}\to 100\to\mathsf{ramp}\to d=6$. We use the ADAM optimizer with learning rate $0.001$ and mini-batch size $5$. The training window is from $i=5$ to $i=60$.

Appendix C Analysis of RNN Translator Loss

The RNN translator may not formulate physical laws in the most parsimonious language, and its hidden state may contain redundant information. In fact, there is an analytically tractable limit in which we can demonstrate this possibility explicitly. For example, the RNN may try to capture the differential equation for the density profile directly, instead of that for the quantum wave function. To simplify the analysis, let us take $\hbar^{2}/(2ma^{2})$ as our energy unit and define the potential energy with respect to the single-particle energy level. The Schrödinger equation for the BEC wave function $\psi(x)$ then takes the simple form $\partial_{x}^{2}\psi(x)=V(x)\psi(x)$. In terms of the density profile $\rho(x)=|\psi(x)|^{2}$, however, the Schrödinger equation implies

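$$\partial_{x}\begin{bmatrix}\rho(x)\\ \eta(x)\\ \xi(x)\end{bmatrix} = \begin{bmatrix}0 & 2 & 0\\ V(x) & 0 & 1\\ 0 & 2V(x) & 0\end{bmatrix}\begin{bmatrix}\rho(x)\\ \eta(x)\\ \xi(x)\end{bmatrix}, \tag{13}$$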

where $\eta(x)=\operatorname{Re}\psi^{*}(x)\partial_{x}\psi(x)$ and $\xi(x)=|\partial_{x}\psi(x)|^{2}$ are two other real profiles that combine with $\rho(x)$ to form a closed system of linear differential equations. Using $\partial_{x}^{2}\psi=V\psi$ with real $V$, one finds $\partial_{x}\rho=2\eta$, $\partial_{x}\eta=\xi+V\rho$, and $\partial_{x}\xi=2V\eta$. The recurrent rule for such a system lies within the descriptive power of our RNN architecture. If the RNN chooses to identify its hidden state as $h_{i}=[\rho(x_{i}),\eta(x_{i}),\xi(x_{i})]^{\intercal}$, the following parameters allow it to model Eq. (13) to first order in $a$:

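$$W^{(0)} = \begin{bmatrix}1 & 2a & 0\\ 0 & 1 & a\\ 0 & 0 & 1\end{bmatrix},\qquad W^{(1)} = \begin{bmatrix}0 & 0 & 0\\ a & 0 & 0\\ 0 & 2a & 0\end{bmatrix},\qquad p = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}. \tag{14}$$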

This theoretical construction at least provides a base RNN that demonstrates why the proposed architecture can work in principle. The performance can be further improved by relaxing the parameters away from this ideal limit or by enlarging the hidden state dimension $d$.
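The base model of Eq. (14) is easy to run directly. The plain-Python sketch below, with hypothetical names, iterates $h_i = (W^{(0)}+W^{(1)}V_i)\,h_{i-1}$, which is an explicit Euler step of Eq. (13). It starts from $h_0=(1,1,1)$, which corresponds to $\psi(0)=\partial_x\psi(0)=1$. For the constant potential $V=-1$ the exact density is $\rho(x)=1+\sin 2x$. The error at $x=2$ should roughly halve when $a$ is halved, which is what "first order in $a$" means.

```python
import math


def base_model(a):
    """Eq. (14): hand-built d = 3 translator with hidden state h = (rho, eta, xi)."""
    W0 = [[1.0, 2 * a, 0.0], [0.0, 1.0, a], [0.0, 0.0, 1.0]]
    W1 = [[0.0, 0.0, 0.0], [a, 0.0, 0.0], [0.0, 2 * a, 0.0]]
    p = [1.0, 0.0, 0.0]
    return W0, W1, p


def run_base_model(V_seq, a, h0=(1.0, 1.0, 1.0)):
    """h_i = (W^(0) + W^(1) V_i) h_{i-1} and rho'_i = p . h_i (Eqs. (1), (3) with n_W = 1)."""
    W0, W1, p = base_model(a)
    h, rho = list(h0), []
    for V in V_seq:
        W = [[W0[r][c] + W1[r][c] * V for c in range(3)] for r in range(3)]
        h = [sum(W[r][c] * h[c] for c in range(3)) for r in range(3)]
        rho.append(sum(pk * hk for pk, hk in zip(p, h)))
    return rho


def base_model_error(a, length=2.0):
    """|rho'(length) - (1 + sin 2 length)| for V = -1 and h0 = (1, 1, 1)."""
    n_steps = int(round(length / a))
    rho = run_base_model([-1.0] * n_steps, a)
    return abs(rho[-1] - (1.0 + math.sin(2 * length)))


for a in (0.1, 0.05, 0.025):
    print(f"a = {a:<6} error at x = 2: {base_model_error(a):.4f}")
```

At $a=0.1$ the pure Euler error is sizeable. This is consistent with the remark above that relaxing the parameters away from the ideal limit, as training does, can improve the accuracy.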

What, then, is the minimum hidden state dimension $d$ (in terms of real variables) for the RNN to perform the potential-to-density mapping well? Can the RNN discover that the quantum wave function $\psi(x)$ provides a more parsimonious description, requiring only two real variables, $\operatorname{Re}\psi(x)$ and $\operatorname{Im}\psi(x)$? To answer these questions, we train the RNN translator with different hidden state dimensions $d$. As shown in Fig. 5, the loss $\mathcal{L}_{\text{RNN}}$ drops significantly only if $d\geq 3$, implying that the RNN was unable to realize the more efficient ($d=2$) wave-function description. For $d=3$, reading out the hidden states $h_{i}$ at each step, we find that they indeed correspond to the vector $[\rho(x_{i}),\eta(x_{i}),\xi(x_{i})]^{\intercal}$ up to a linear transformation, which depends on the random initialization of the model parameters. This confirms that the RNN indeed works like the base model, Eq. (14). This example shows that the RNN can develop legitimate and predictive rules of physics, such as Eq. (13), from observation data. It tends to work directly with the variables present in the observation data to get the job done. Sometimes the rules it finds work well enough that the RNN has no incentive to develop higher-level concepts such as the quantum wave function.
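The hidden states match $[\rho,\eta,\xi]^{\intercal}$ only up to a linear transformation because of a redundancy in the translator itself. For any invertible $T$, replacing $h\to Th$, $W^{(n)}\to TW^{(n)}T^{-1}$ and $p^{\intercal}\to p^{\intercal}T^{-1}$ leaves every output $\rho'_i$ unchanged. The same freedom underlies the $\mathrm{GL}(2,\mathbb{R})$ transformation $M$ used to read off the RAE update rule. The plain-Python check below applies a random $T$ to the base model of Eq. (14); the helper names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(M):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def translate(W_coeffs, p, h0, V_seq):
    """Eqs. (1) and (3): h_i = (sum_n W^(n) V_i^n) h_{i-1}, rho'_i = p . h_i."""
    d, h, out = len(h0), list(h0), []
    for V in V_seq:
        W = [[sum(Wn[r][c] * V ** n for n, Wn in enumerate(W_coeffs)) for c in range(d)]
             for r in range(d)]
        h = [sum(W[r][c] * h[c] for c in range(d)) for r in range(d)]
        out.append(sum(pk * hk for pk, hk in zip(p, h)))
    return out


def change_basis(W_coeffs, p, h0, T):
    """h -> T h, W^(n) -> T W^(n) T^-1, p^T -> p^T T^-1: an equivalent translator."""
    T_inv, d = inverse(T), len(T)
    W_new = [matmul(matmul(T, Wn), T_inv) for Wn in W_coeffs]
    p_new = [sum(p[k] * T_inv[k][j] for k in range(d)) for j in range(d)]
    h0_new = [sum(T[i][k] * h0[k] for k in range(d)) for i in range(d)]
    return W_new, p_new, h0_new


a = 0.1
W_base = [[[1.0, 2 * a, 0.0], [0.0, 1.0, a], [0.0, 0.0, 1.0]],   # W^(0), Eq. (14)
          [[0.0, 0.0, 0.0], [a, 0.0, 0.0], [0.0, 2 * a, 0.0]]]   # W^(1), Eq. (14)
p_base, h0_base = [1.0, 0.0, 0.0], [1.0, 1.0, 1.0]

rng = random.Random(0)
T = [[rng.uniform(-1, 1) + (3.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
V_seq = [-0.5 - rng.random() for _ in range(60)]
original = translate(W_base, p_base, h0_base, V_seq)
transformed = translate(*change_basis(W_base, p_base, h0_base, T), V_seq)
print(max(abs(x - y) for x, y in zip(original, transformed)))
```

Because the outputs cannot distinguish $T$, a trained $d=3$ translator can only be expected to reproduce $[\rho,\eta,\xi]^{\intercal}$ up to such a transformation, and which one it lands on depends on the initialization.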

Bibliography

1. Carifio et al. (2017): J. Carifio, J. Halverson, D. Krioukov, and B. D. Nelson, Journal of High Energy Physics 9, 157 (2017), eprint 1707.00655.
2. Koch-Janusz and Ringel (2018): M. Koch-Janusz and Z. Ringel, Nature Physics 14, 578 (2018), eprint 1704.06279.
3. You et al. (2018): Y.-Z. You, Z. Yang, and X.-L. Qi, Phys. Rev. B 97, 045153 (2018), eprint 1709.01223.
4. Hashimoto et al. (2018): K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Phys. Rev. D 98, 046019 (2018).
5. Torlai and Melko (2016): G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016), eprint 1606.02718.
6. Wang (2016): L. Wang, Phys. Rev. B 94, 195105 (2016).
7. Carrasquilla and Melko (2017): J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017), eprint 1605.01735.
8. van Nieuwenburg et al. (2017): E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017), eprint 1610.02048.