A machine learning approach to dynamical properties of quantum many-body   systems

Douglas Hendry; Adrian E. Feiguin

arXiv:1907.01384·quant-ph·December 18, 2019

A machine learning approach to dynamical properties of quantum many-body systems

Douglas Hendry, Adrian E. Feiguin

PDF

TL;DR

This paper introduces a variational method using restricted Boltzmann machines to compute dynamical properties and spectral functions of quantum many-body systems, addressing challenges in excited state analysis.

Contribution

It presents a novel approach combining RBMs, natural gradient descent, and Monte Carlo techniques to calculate dynamical spectra, improving accuracy and generality.

Findings

01

Successfully computed the dynamical spin structure factor of the 1D J1-J2 Heisenberg model.

02

Demonstrated improved accuracy with regularization strategies.

03

Method is adaptable to various variational forms.

Abstract

Variational representations of quantum states abound and have successfully been used to guess ground-state properties of quantum many-body systems. Some are based on partial physical insight (Jastrow, Gutzwiller projected, and fractional quantum Hall states, for instance), and others operate as a black box that may contain information about the underlying structure of entanglement and correlations (tensor networks, neural networks) and offer the advantage of a large set of variational parameters that can be efficiently optimized. However, using variational approaches to study excited states and, in particular, calculating the excitation spectrum, remains a challenge. We present a variational method to calculate the dynamical properties and spectral functions of quantum many-body systems in the frequency domain, where the Green's function of the problem is encoded in the form of a…

Equations112

ψ (σ^{z}, a, b, W) = h_{1}, h_{2}, \dots, h_{M} \sum e^{- E (σ, h)},

ψ (σ^{z}, a, b, W) = h_{1}, h_{2}, \dots, h_{M} \sum e^{- E (σ, h)},

- E (σ, h) = i = 1 \sum N a_{i} σ_{i}^{z} + i = 1 \sum N b_{i} h_{i} + i = 1 \sum N j = 1 \sum M W_{ij} σ_{i}^{z} h_{j},

- E (σ, h) = i = 1 \sum N a_{i} σ_{i}^{z} + i = 1 \sum N b_{i} h_{i} + i = 1 \sum N j = 1 \sum M W_{ij} σ_{i}^{z} h_{j},

G_{ij} (z) = ⟨ ψ ∣ A_{i}^{†} \frac{1}{z - H ^} A_{j} ∣ ψ ⟩

G_{ij} (z) = ⟨ ψ ∣ A_{i}^{†} \frac{1}{z - H ^} A_{j} ∣ ψ ⟩

E_{v a r} (α^{*}, α) = \frac{⟨ ψ _{α^{*}} ∣ H ^ ∣ ψ _{α} ⟩}{⟨ ψ _{α^{*}} ∣ ψ _{α} ⟩}

E_{v a r} (α^{*}, α) = \frac{⟨ ψ _{α^{*}} ∣ H ^ ∣ ψ _{α} ⟩}{⟨ ψ _{α^{*}} ∣ ψ _{α} ⟩}

E_{v a r} (α^{*}, α) = σ \sum P_{α} (σ) E_{l oc} (σ),

E_{v a r} (α^{*}, α) = σ \sum P_{α} (σ) E_{l oc} (σ),

P_{α} (σ) = \frac{∣ ψ _{α} ( σ ) ∣ ^{2}}{\sum _{σ^{'}} ∣ ψ _{α} ( σ ^{'} ) ∣ ^{2}}; E_{l oc} (σ) = \frac{⟨ σ ∣ H ∣ ψ ⟩}{ψ _{α} ( σ )}

P_{α} (σ) = \frac{∣ ψ _{α} ( σ ) ∣ ^{2}}{\sum _{σ^{'}} ∣ ψ _{α} ( σ ^{'} ) ∣ ^{2}}; E_{l oc} (σ) = \frac{⟨ σ ∣ H ∣ ψ ⟩}{ψ _{α} ( σ )}

α_{i} \to α_{i} - τ \frac{\partial E}{\partial α _{i}^{*}},

α_{i} \to α_{i} - τ \frac{\partial E}{\partial α _{i}^{*}},

f_{i} = \frac{\partial E _{v a r}}{\partial α _{i}^{*}} = ⟨ \hat{O}_{i}^{†} \hat{H} ⟩ - ⟨ \hat{O}_{i}^{†} ⟩ ⟨ \hat{H} ⟩ .

f_{i} = \frac{\partial E _{v a r}}{\partial α _{i}^{*}} = ⟨ \hat{O}_{i}^{†} \hat{H} ⟩ - ⟨ \hat{O}_{i}^{†} ⟩ ⟨ \hat{H} ⟩ .

\hat{O}_{i} = \frac{1}{ψ _{α}} \frac{\partial ψ _{α}}{\partial α _{i}} .

\hat{O}_{i} = \frac{1}{ψ _{α}} \frac{\partial ψ _{α}}{\partial α _{i}} .

γ (ψ, ϕ) = arccos \frac{⟨ ψ ∣ ϕ ⟩ ⟨ ϕ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩ ⟨ ϕ ∣ ϕ ⟩}

γ (ψ, ϕ) = arccos \frac{⟨ ψ ∣ ϕ ⟩ ⟨ ϕ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩ ⟨ ϕ ∣ ϕ ⟩}

d s^{2} = γ (ψ, ψ + δ ψ)^{2} = \frac{⟨ δ ψ ∣ δ ψ ⟩}{⟨ ψ ∣ ψ ⟩} - \frac{⟨ δ ψ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩} \frac{⟨ ψ ∣ δ ψ ⟩}{⟨ ψ ∣ ψ ⟩}

d s^{2} = γ (ψ, ψ + δ ψ)^{2} = \frac{⟨ δ ψ ∣ δ ψ ⟩}{⟨ ψ ∣ ψ ⟩} - \frac{⟨ δ ψ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩} \frac{⟨ ψ ∣ δ ψ ⟩}{⟨ ψ ∣ ψ ⟩}

d s^{2} = ij \sum δ α_{i}^{*} g_{ij} δ α_{j} = γ (ψ (α), ψ (α + δ α))^{2}

d s^{2} = ij \sum δ α_{i}^{*} g_{ij} δ α_{j} = γ (ψ (α), ψ (α + δ α))^{2}

g_{ij} = \frac{⟨ \partial _{i} ψ ∣ \partial _{j} ψ ⟩}{⟨ ψ ∣ ψ ⟩} - \frac{⟨ \partial _{i} ψ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩} \frac{⟨ ψ ∣ \partial _{j} ψ ⟩}{⟨ ψ ∣ ψ ⟩}

g_{ij} = \frac{⟨ \partial _{i} ψ ∣ \partial _{j} ψ ⟩}{⟨ ψ ∣ ψ ⟩} - \frac{⟨ \partial _{i} ψ ∣ ψ ⟩}{⟨ ψ ∣ ψ ⟩} \frac{⟨ ψ ∣ \partial _{j} ψ ⟩}{⟨ ψ ∣ ψ ⟩}

g_{ij} = ⟨ \hat{O}_{i}^{†} \hat{O}_{j} ⟩ - ⟨ \hat{O}_{i}^{†} ⟩ ⟨ \hat{O}_{j} ⟩ .

g_{ij} = ⟨ \hat{O}_{i}^{†} \hat{O}_{j} ⟩ - ⟨ \hat{O}_{i}^{†} ⟩ ⟨ \hat{O}_{j} ⟩ .

j \sum g_{ij} Δ α_{i} = - τ \frac{\partial E}{\partial α _{i}^{*}} .

j \sum g_{ij} Δ α_{i} = - τ \frac{\partial E}{\partial α _{i}^{*}} .

G_{ij} (z) = ⟨ ψ ∣ A_{i}^{†} \frac{1}{z - H ^} A_{j} ∣ ψ ⟩

G_{ij} (z) = ⟨ ψ ∣ A_{i}^{†} \frac{1}{z - H ^} A_{j} ∣ ψ ⟩

∣ A_{i} ⟩

∣ A_{i} ⟩

∣ χ_{j} (z)⟩

(z - \hat{H}) ∣ χ_{j} (z)⟩ = \hat{A}_{j} ∣ ψ ⟩ = ∣ A_{j} ⟩ .

(z - \hat{H}) ∣ χ_{j} (z)⟩ = \hat{A}_{j} ∣ ψ ⟩ = ∣ A_{j} ⟩ .

A_{ij} (ω) = - \frac{1}{π} Im ⟨ A_{i} ∣ χ_{j} (z)⟩ .

A_{ij} (ω) = - \frac{1}{π} Im ⟨ A_{i} ∣ χ_{j} (z)⟩ .

((\hat{H} - E_{0} - ω)^{2} + η^{2}) ∣ \tilde{χ}_{j} ⟩ = - η ∣ A_{j} ⟩,

((\hat{H} - E_{0} - ω)^{2} + η^{2}) ∣ \tilde{χ}_{j} ⟩ = - η ∣ A_{j} ⟩,

A_{ij} (ω) = - \frac{1}{π} ⟨ A_{i} ∣ \tilde{χ}_{j} ⟩

A_{ij} (ω) = - \frac{1}{π} ⟨ A_{i} ∣ \tilde{χ}_{j} ⟩

(z - \hat{H}) ∣ χ_{j} (z)⟩ = ∣ A_{j} ⟩,

(z - \hat{H}) ∣ χ_{j} (z)⟩ = ∣ A_{j} ⟩,

γ (\hat{Q} χ, A) = arccos x

γ (\hat{Q} χ, A) = arccos x

x = \frac{⟨ χ ∣ Q ^ ^{†} ∣ A ⟩ ⟨ A ∣ Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩ ⟨ A ∣ A ⟩} .

x = \frac{⟨ χ ∣ Q ^ ^{†} ∣ A ⟩ ⟨ A ∣ Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩ ⟨ A ∣ A ⟩} .

\frac{\partial γ ^{2}}{\partial α _{i}^{*}} = γ \frac{x}{1 - x} [\frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} - \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} ∣ A ⟩}{⟨ χ ∣ Q ^ ^{†} ∣ A ⟩}],

\frac{\partial γ ^{2}}{\partial α _{i}^{*}} = γ \frac{x}{1 - x} [\frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} - \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} ∣ A ⟩}{⟨ χ ∣ Q ^ ^{†} ∣ A ⟩}],

∣ \partial_{i} χ ⟩ = σ \sum \frac{\partial χ ( σ )}{\partial α _{i}} ∣ σ ⟩ .

∣ \partial_{i} χ ⟩ = σ \sum \frac{\partial χ ( σ )}{\partial α _{i}} ∣ σ ⟩ .

j \sum g_{ij} Δ α_{j} = - λ \frac{\partial γ ^{2}}{\partial α _{i}^{*}}

j \sum g_{ij} Δ α_{j} = - λ \frac{\partial γ ^{2}}{\partial α _{i}^{*}}

g_{ij} = \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ \partial _{j} χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} - \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} \frac{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ \partial _{j} χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} .

g_{ij} = \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ \partial _{j} χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} - \frac{⟨ \partial _{i} χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} \frac{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ \partial _{j} χ ⟩}{⟨ χ ∣ Q ^ ^{†} Q ^ ∣ χ ⟩} .

R (σ) = \frac{⟨ σ ∣ Q ^ ∣ χ ⟩}{⟨ σ ∣ A ⟩}

R (σ) = \frac{⟨ σ ∣ Q ^ ∣ χ ⟩}{⟨ σ ∣ A ⟩}

O_{i} (σ) = \frac{⟨ σ ∣ Q ^ ∣ \partial _{i} χ ⟩}{⟨ σ ∣ Q ^ ∣ χ ⟩} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

A machine learning approach to dynamical properties of quantum many-body systems

Douglas Hendry

Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA

Adrian E. Feiguin

Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA

Abstract

Variational representations of quantum states abound and have successfully been used to guess ground-state properties of quantum many-body systems. Some are based on partial physical insight (Jastrow, Gutzwiller projected, and fractional quantum Hall states, for instance), and others operate as a black box that may contain information about the underlying structure of entanglement and correlations (tensor networks, neural networks) and offer the advantage of a large set of variational parameters that can be efficiently optimized. However, using variational approaches to study excited states and, in particular, calculating the excitation spectrum, remains a challenge. We present a variational method to calculate the dynamical properties and spectral functions of quantum many-body systems in the frequency domain, where the Green’s function of the problem is encoded in the form of a restricted Boltzmann machine (RBM). We introduce a natural gradient descent approach to solve linear systems of equations and use Monte Carlo to obtain the dynamical correlation function. In addition, we propose a strategy to regularize the results that improves the accuracy dramatically. As an illustration, we study the dynamical spin structure factor of the one dimensional $J_{1}-J_{2}$ Heisenberg model. The method is general and can be extended to other variational forms.

I Introduction

In the past couple of years, machine learning has permeated many areas of physics and found numerous applications in condensed matter physics and chemistry. These ideas acquire a very special meaning in the context of the quantum many-body problem where one deals with datasets that are exponentially large. Sophisticated techniques have been developed to tackle this difficult challenge, such as compressing the data by using information theory and machine learning tools Freericks et al. (2014) very similar in spirit to algorithms to compress images and videos. In our case, datasets are comprised of all possible electronic configurations and cannot be stored in the memory of the largest supercomputer. This is an “extreme data science” problem from an information processing perspective, and can be approached by means of importance sampling using stochastic methods such as Monte Carlo (MC) techniques. This process can be greatly simplified if one recognizes complex patterns in the data, which has led to a line of research now called quantum machine learningBiamonte et al. (2017) that uses machine-learning algorithms to extract insightful information about quantum systems.

Even though novel approaches based on tensor networks Orús (2014) hold promise for developing efficient and accurate algorithms to solve two-dimensional(2D) many-body problems, the density matrix renormalization group (DMRG) White (1992, 1993); Schollwöck (2005, 2011); Feiguin (2013a) method has remained as the method of choice. Although understood in the context of quantum information theory, these methods share the same underlying structure and are strongly rooted on machine learning ideas such as the low rank approximation behind principal component analysis (PCA). However, despite the success of DMRG for one-dimensional(1D) and quasi-one-dimensional geometries, extensions to actual two-dimensional systems remain challenging and applications are constrained to long cylinders and strips. The main hurdle is the fact that the number of states required to accurately represent a quantum many-body state is determined by the behavior of the entanglement entropy, the so-called “area law”.

Neural networks have successfully been used as variational wave function approximators to model the ground state of many-body quantum systems. The most promising results so far were achieved with restricted Boltzmann machines (RBM) Carleo and Troyer (2017); Saito (2017); Cai and Liu (2018); Glasser et al. (2018). RBMs are a type of artificial neural network which are widely used in machine learning to model the probability distribution of a given data set of binary vectors drawn from an unknown probability distribution. The components of these vectors comprise the visible layer of the neural network. In addition to the visible layer, one introduces a hidden layer which corresponds to the components of another set of binary vectors. These hidden vectors are auxiliary variables that expand the space of parameters and are ultimately factored out. The probability distribution of the visible vectors is formulated by first introducing a joint probability distribution for pairs of visible and hidden vectors from an energy function and Boltzmann weighting. Finally, the probability distribution for visible vectors is taken to be the sum of the joint probability distribution over all possible configurations of the hidden vectors: Carleo and TroyerCarleo and Troyer (2017) introduced a variational wavefunction for a spin- $\frac{1}{2}$ system of $N$ sites, which is inspired by the functional form of RBM. The visible layer corresponds to the spin configurations ${\vec{\sigma}}^{z}=(\sigma^{z}_{1},\sigma^{z}_{2},\cdots,\sigma^{z}_{N})$ . Then the coefficients of the wave function $|\psi\rangle=\sum_{{\vec{\sigma}}^{z}}\psi({\vec{\sigma}}^{z})|\sigma^{z}_{1},\sigma^{z}_{2},\cdots,\sigma^{z}_{N}\rangle$ are represented as:

[TABLE]

with

[TABLE]

where $h_{i}\in\{-1,1\}$ are hidden spin variables, and $a_{i},b_{i},W_{ij}$ are the weights. The terms in the exponents then correspond to the negative energies for pairs of hidden and visible vectors and determine the coefficients of the wave function. The range of values the variational wave function can take on increases as the number of the hidden spin variables $M$ increases. The summation over hidden layer vectors can be factored out which reduce the wave functions coefficients $\psi({\vec{\sigma}}^{z},\vec{a},\vec{b},W)=e^{\sum_{i=1}^{N}a_{i}\sigma^{z}_{i}}\prod_{j=1}^{M}2\cosh{(\theta_{j})}$ where $\theta_{j}=b_{j}+\sum_{i=1}^{N}W_{ij}\sigma^{z}_{i}$ .

We propose to generalize this approach to the calculation of excited states. A simple naive idea would be to utilize $(H-\omega)^{2}$ as the new Hamiltonian, where $\omega$ is the target energy. However, we will take an unconventional route that will shield more valuable information: the spectral function of the problem.

The knowledge of the excitation spectrum of a system allows for direct comparison with experiments, such as photoemission, or neutron scattering, for instance. The numerical evaluation of dynamical correlation functions remains a very difficult task, since most computational methods are usually capable of calculating the ground-state and maybe some low energy excitations. A number of techniques have been used in the past: exact diagonalizationDagotto (1994) is limited to small clusters, quantum Monte Carlo suffers from the sign problem, and requires uncontrolled analytic continuations and the use of the max entropy approximationSchüttler and Scalapino (1986); Sandvik (1998); Silver et al. (1990); Gubernatis et al. (1991); Syljuåsen (2008); Fuchs et al. (2010); Sandvik (2016); Shao et al. (2017), and dynamical DMRGHallberg (1995); Kühner and White (1999a); Jeckelmann (2002); Nocera and Alvarez (2016) is computationally very expensive. The time-dependent density matrix renormalization group and recent variations using Chebyshev expansions have been important developments, giving access to accurate spectra for very large one-dimensional systems Daley et al. (2004); White and Feiguin (2004); Feiguin and White (2005); Feiguin (2011, 2013b); Holzner et al. (2011); Wolf et al. (2015); Xie et al. (2018). Matrix product states can also be used to propose variational forms for excited statesVanderstraeten et al. (2015a, b). Similar ideas were explored with variational Monte Carlo, that can be easily extended to higher dimensions and are free from the sign problemLi and Yang (2010); Dalla Piazza et al. (2014); Ferrari et al. (2018).

The method we introduce is derived from the so-called dynamical DMRGJeckelmann (2002) (DDMRG) and correction vector DMRGKühner and White (1999b). We plan to extract the entire dynamics of the problem by calculating the Green’s function

[TABLE]

where $A$ is some operator of interest and $z=\omega+E_{0}+i\eta$ . We derive an optimization approach based on quantum geometry concepts that will allow us to solve a large system of equations stochastically with RBMs. The method is described in great detail in sec. II and we present results for the frustrated Heisenberg chain in sec. III. We finally close with a discussion.

II Method

II.1 Variational solution

The variational wave-function $|\psi(\alpha_{1},\alpha_{2},\cdots)\rangle$ is parametrized by a number $N_{\alpha}$ of coefficients ${\vec{\alpha}}$ . In the case of an RBM, ${\vec{\alpha}}$ represents set entire set of parameters ${\vec{a},\vec{b},W}$ . The variational calculation of the ground state is carried out by minimizing the energy functional:

[TABLE]

with respect to the variational parameters ${\vec{\alpha}}$ . A serious difficulty that plagues these calculations is the optimization procedure, due to the fact that: (i) the space of configurations grows exponentially with the number of spins; (ii) the number of model parameters to be optimized increases quadratically with the system size, making calculations prone to be trapped in local metastable solutions. Since the number of configurations is exponentially large, the estimators are carried out by means of variational Monte Carlo. For this purpose, the variational energy is recast as:

[TABLE]

where the sum runs over all possible spin configurations ${\vec{\sigma}}$ and

[TABLE]

with $\psi_{\vec{\alpha}}({\vec{\sigma}})=\langle{\vec{\sigma}}|\psi({\vec{\alpha}})\rangle$ (we omit for now the $z$ superscript, since these considerations are generic and the variables ${\vec{\sigma}}$ may represent arbitrary degrees of freedom). The quantity $P({\vec{\sigma}})$ has the properties of a probability distribution, i.e, it is positive and normalized. This enables us to carry out a stochastic sampling of spin configurations according to $P$ . In practice, since $S^{z}_{Tot}$ is conserved, we generate new states by randomly picking a pair of anti-parallel spins, and accepting or rejecting the new configuration with a transition probability $w=\min{(1,P_{new}/P_{old})}$ . The expectation value of an observable such as the energy is then obtained by averaging over all the sampled configurations $\langle\hat{O}\rangle=\frac{1}{N_{c}}\sum_{n}^{N_{c}}\langle{\vec{\sigma}}_{n}|\hat{O}|{\vec{\sigma}}_{n}\rangle$ . This process can be efficiently parallelized, with many Markov chains running simultaneously on different threads.

II.2 Wave function optimization

The number of variational parameters typically grows extensively with system size as $L$ , or as $L^{2}$ , translating into a very complex energy landscape $E_{var}({\vec{\alpha}})$ with many local maxima/minima, and one global minimum that we seek. Many minimization/optimization methods can be found in the literatureHarju et al. (1997); Sorella and Capriotti (2000); Umrigar and Filippi (2005); Sorella (2005) and here we settle for the so-called Stochastic Reconfiguration (SR) Sorella (1998); Sorella and Capriotti (2000); Sorella (2005) with the optimizations proposed in Ref.Neuscamman et al., 2012. We refer the reader to a pedagogical description in Ref.Glasser et al. (2018), that we summarize and extend here for completeness and future reference using the concept of “natural gradient descent” Amari (1998) (NGD) (both concepts, SR and NGD, are equivalent).

Solving for the variational parameters using Euclidean gradient descent results in each $\alpha_{i}$ being updated iteratively as

[TABLE]

where $\tau$ is a small number (the “learning rate”). $E_{var}$ and its derivatives are estimated by sampling over states:

[TABLE]

where the operators $\hat{O}_{i}$ are formally defined as the log derivatives:

[TABLE]

By approximating the ground state as a variational wave function, we are restricting our possible wave functions to a sub-manifold of the overall Hilbert space. This sub-manifold will be in general highly non-linear and thus have varying curvature in different directions of ${\vec{\alpha}}$ . This can cause Euclidean gradient descent to have poor convergence.

In order to account for this particular geometry, we utilize natural gradient descent. The gradient of a function is dependent on the metric of its domain. The vector of partial derivatives is only the gradient for the Euclidean metric (metric tensor equal to the identity). For a non-Euclidean metric, the gradient is obtained by multiplying the inverse of the metric tensor to the vector of partial derivatives (i.e., the Euclidean gradient). The basic idea behind NGD is to carry out gradient descent with the metric corrected gradientSorella and Capriotti (2010).

Since the variational parameters $\alpha$ map to points on a sub-manifold of the Hilbert space, we use the metric imposed by this Hilbert space, which is the Fubini-Study metric Provost and Vallee (1980); Brody and Hughston (2001), with distance between wave functions $|\psi\rangle$ and $|\phi\rangle$ given by

[TABLE]

. This distance accounts for the fact that the Hilbert space is a projective space: wave functions that differ only by magnitude or an overall phase are equivalent. Solving for a distance would be an unnecessary constraint on the problem which is already constrained by the variational representation of the wave function. In differentiable form, the Fubini-Study metric is given by.

[TABLE]

Using this we can calculate the induced metric tensor on our variational parameters by equating the differentiable distances given by the metric tensor on ${\vec{\alpha}}$ and the differentiable distances in the Hilbert space of the wave functions they map to:

[TABLE]

. The solution is given by:

[TABLE]

, where $|\partial_{i}\psi\rangle=\frac{\partial}{\partial\alpha_{i}}|\psi\rangle$ Reformulating this matrix in terms of sampling over states ${\vec{\sigma}}$ results in the covariance matrix of the log deratives $O_{i}$

[TABLE]

Now, at each iteration the change in variational parameters is given by solving the system of equations

[TABLE]

The optimization procedure consists of calculating the forces $f_{i}$ in (2) and the covariance matrix $g_{ij}$ in (3) and solving (4). This is carried out iteratively until converged. In practice, we follow the accelerated SR method proposed in Ref.Neuscamman et al. (2012), where it is shown that the construction and storage of the matrix $g$ can be bypassed, translating into a remarkable speedup.

II.3 Correction vector

In order to calculate the Green’s function:

[TABLE]

where $A$ is some operator of interest and $z=E_{0}+\omega+i\eta$ we follow a procedure pioneered in the context of matrix product states, known as dynamical DMRGKühner and White (1999b); Jeckelmann (2002). It requires the calculation of the following auxiliary states:

[TABLE]

where $|\chi_{j}(z)\rangle$ is called the “correction vector”. Explicitly, $|\chi_{j}(z)\rangle$ can be obtained by solving the equation:

[TABLE]

The spectral function is defined as the imaginary part of the Green’s function, $A_{ij}(\omega)=-\frac{1}{\pi}\mathrm{Im}{G_{ij}(z)}$ , or:

[TABLE]

By Fourier transforming the spatial dependence to momentum, one obtains the entire excitation spectrum of the problem resolved in both momentum and frequency.

An alternative way to solve for $A_{ij}$ is to directly target the imaginary part:

[TABLE]

such that

[TABLE]

However, this form is typically more unstable: if $|\tilde{\chi}_{j}\rangle$ is close to an eigenstate of the Hamiltonian, the quantity in the square can become very small when approaching a pole in the spectrum. The fact that we are dealing with variational wave-functions implies that the results obtained by these two approaches will not necessarily be the same. We discuss the consequences below.

II.4 Solving for the Green’s function

As previously discussed, we need to solve the following system of equations for $|\chi_{j}(z)\rangle$

[TABLE]

where $|A_{j}\rangle=\hat{A_{j}}|\psi\rangle$ , and $|\chi_{j}(z)\rangle$ is parametrized by another set of variational parameters ${\vec{\alpha}}$ . The Green’s function can then be obtained as $G_{ij}(z)=\langle A_{i}|\chi_{j}(z)\rangle$ . We hereby introduce a similar natural gradient descend procedure to the one outlined in section II.2 to solve the generic system of equations $\hat{Q}|\chi\rangle=|A\rangle$ (we later will apply this method to the particular case with $\hat{Q}=z-\hat{H}$ ).

We solve for $|\chi\rangle$ by first minimizing the Fubini-Study metric between $\hat{Q}|\chi\rangle$ and $|A\rangle$ :

[TABLE]

As discussed above, the NGD method will yield a state $\hat{Q}|\chi\rangle$ that is parallel (or as parallel as possible), to $|A\rangle$ , but with unconstrained phase and norm. Therefore, the resulting wavefunction $|\chi\rangle$ is not quite the one we seek, but it is off by a constant $|\chi\rangle=\beta|\tilde{\chi}\rangle$ , (where $|\tilde{\chi}\rangle$ is the actual solution) that can readily be obtained.

The derivatives of $\gamma^{2}$ are given by:

[TABLE]

with

[TABLE]

The parameters ${\vec{\alpha}}$ are updated at each iteration using stochastic reconfiguration / natural gradient descent which gives

[TABLE]

where $\lambda$ is the learning rate (a small number) and the metric tensor $g$ is derived from $\hat{Q}|\chi\rangle$ (rather than $|\chi\rangle$ ) and is given by

[TABLE]

Overlaps are estimated using Monte-Carlo sampling over probability distributions $P_{0}({\vec{\sigma}})=|\langle{\vec{\sigma}}|A\rangle|^{2}/\langle A|A\rangle$ and $P_{1}({\vec{\sigma}})=|\langle{\vec{\sigma}}|Q|\chi\rangle|^{2}/\langle\chi|\hat{Q}^{\dagger}\hat{Q}|\chi\rangle$ . The two probabilities will become equivalent as the wave-function converges. However it is important to sample over both distributions to account for states that have much greater weight in one distribution than the other. For each sampled configuration ${\vec{\sigma}}$ the following quantities are calculated:

[TABLE]

Then, equations (12),(13), and (15) can be expressed in terms of sampling as

[TABLE]

Notice that $R$ becomes constant when the states become parallel and thus the sampling variance goes to zero, just as the local energies become constant when solving for the ground state.

Finally, the wave function normalization (the constant $\beta$ ) is obtained as

[TABLE]

which in terms of sampling is given as

[TABLE]

II.5 Error correction

The Green’s function can be obtained by solving either one of the two equations:

[TABLE]

or

[TABLE]

where $|A\rangle=\hat{A}|\psi\rangle$ , $\hat{Q}=E_{0}+\omega+i\eta-\hat{H}$ , and $\hat{Q}^{\dagger}=E_{0}+\omega-i\eta-\hat{H}$ . We can then solve for $G$ in three different ways

[TABLE]

As we discussed earlier, the NGD method will allow us to find a wave function as close as possible to the one we seek. However, it is possible that this wave-function does not accept a faithful representation in terms of the proposed variational form. As a consequence, regardless of the sampling error, there always will be an inherent error due to the limitations of the wave function representation. Now let $|\tilde{\chi}^{+}\rangle$ and $|\tilde{\chi}^{-}\rangle$ be the variational wave function approximations with errors $\epsilon^{+}|\phi^{+}\rangle$ and $\epsilon^{-}|\phi^{-}\rangle$ respectively so that

[TABLE]

Then we calculate $G$ all three ways, each with different but related error terms:

[TABLE]

Combining all three estimates we can get the first order error terms to cancel:

[TABLE]

However we can improve upon this by isolating another second order error term. To do this we calculate $\langle A|A\rangle$ also in three different ways.

[TABLE]

Combining the above equations we obtain:

[TABLE]

Then, in order to get our best estimate for $G$ we multiply Eq.(II.5) by $1/(i\eta\pi)$ and add it to Eq.(II.5) which results in:

[TABLE]

To understand why this makes an improvement over the estimate (II.5) we expand the wave-functions and their errors in terms of eigenstates:

[TABLE]

where $\Delta E_{n}=E_{0}+\omega-E_{n}$ . Then the error term of II.5 is

[TABLE]

Multiplying the isolated error (II.5) by $1/(i\eta\pi)$ yields:

[TABLE]

Finally, the error of Eq.(30) is then

[TABLE]

The dominant eigenstate $|n\rangle$ in these expressions is the one such that $\Delta E_{n}\approx 0$ and thus the dominant error terms should be $\epsilon^{-}_{n}$ and $\epsilon^{+}_{n}$ . But those terms are multiplied by a $\Delta E_{n}$ in the numerator and, as a result, we not only eliminate the first order errors terms, but also the part of the second order error from the most dominating contribution.

III Results

For illustration purposes we will focus on the one-dimensional spin- $\frac{1}{2}$ Heisenberg model with nearest and next nearest neighbor interactions, the so-called $J_{1}-J_{2}$ model:

[TABLE]

where $\vec{S}=(\hat{S}^{x},\hat{S}^{y},\hat{S}^{z})$ are spin operators. We consider periodic boundary conditions and chose $J_{1}$ as our unit of energy. We calculate the spin structure factor, defined as:

[TABLE]

where we have used translational invariance, since the system has periodic boundary conditions. In order to calculate the correlation function we solve the following system of equations for $|\psi(z,j)\rangle$ using the prescription described in the previous section:

[TABLE]

where $|A_{j}\rangle=\hat{S}^{z}_{j}|\psi_{0}\rangle$ . Finally, $G_{ij}(z)=\langle A_{i}|\psi(z,j)\rangle$ .

We typically carry out computations taking 20,000 measurements for each optimization step, leaving 100 iterations in between measurements to make sure they are independent and uncorrelated. We took 106 samples for the overlaps. We then solve for the wave-functions $|\chi^{\pm}\rangle$ and calculate the Green’s functions using error correction as described in the previous section.

We studied chains of length $L=30$ , larger than the largest system achievable using exact diagonalization, but still smaller than what DMRG can solve. We used 120 hidden variables in the hidden layer that translates into $\sim 3000$ variational parameters. As a benchmark, we compare our results to dynamical DMRG calculations with $m=600$ DMRG states using a broadening $\eta=0.1$ .

We first show results obtained with first order and second order error correction in Fig.1 for several representative values of momentum $k$ and $J_{2}=0$ . While the range and position of low energy poles agrees quite well, we observe a remarkable improvement upon introducing the second order correction that is particularly marked around the cusp of the peaks/poles. The range in frequency in between poles is not so accurately matched. While we observe some oscillations that we attribute to numerical errors, the main source of discrepancy s likely due to the limitations of the variational wave-function utilized. This is understood using the arguments discussed in the previous section: the second order error gets practically suppressed when the frequency corresponds to an eigenstate $\omega\sim E_{n}-E_{0}$ . It is expected that as the system size $L$ increases and the spectrum becomes continuous, the errors will be practically cancelled and the accuracy will improve over the entire range of frequencies. We next show the spectrum for $J_{2}=0$ and $J_{2}=0.2J_{1}$ in Figs.1 and 2, both in a color scale and frequency cuts for a couple of momenta. The width and the edge of the spinon continuum are very well described, as well as the magnitude of the excitation peaks.

IV Conclusions

We have presented a variational approach to calculate Green’s functions and dynamical structure factors of many-body quantum systems directly in the frequency domain using restricted Boltzmann machines. The method, inspired in dynamical DMRG and machine learning concepts, allows one to obtain the entire spectrum of excitations, which in the Heisenberg model consists of deconfined domain walls (spinons). In order to solve for the Green’s functions, we introduce a natural gradient descent method to solve complex systems of equations where the solution is encoded in RBM form. The problem is solved stochastically and can be parallelized to run different frequencies on different computing threads or nodes. Unlike the VMC method of Ferrari et al. Ferrari et al. (2018) which can provide a few hundred discrete poles, our method yields the entire spectrum with full frequency resolution. These ideas are not limited to a particular form of variational wave function and is completely general (DDMRG does it with matrix product states). In particular, we show that RBMs are not able to faithfully represent excited states but, nonetheless, we are able to reconstruct the spectral functions very accurately by introducing a regularization scheme that eliminates first and second order errors. We demonstrate the application of the technique to the frustrated case away from integrability, where our results accurately describe the position of the poles (especially low frequency ones) and the continuum. The approach can be naturally extended to higher dimensions, where both quantum Monte Carlo and the DMRG have shortcomings.

Acknowledgements.

The authors acknowledge the National Science Foundation for support under grant No. DMR-180781.

Bibliography49

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Freericks et al. (2014) J. K. Freericks, B. K. Nikolić, and O. Frieder, International Journal of Modern Physics B 28 , 1430021 (2014) , https://doi.org/10.1142/S 0217979214300217 . · doi ↗
2Biamonte et al. (2017) J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549 , 195 EP (2017) . · doi ↗
3Orús (2014) R. Orús, Annals of Physics 349 , 117 (2014) . · doi ↗
4White (1992) S. R. White, Phys. Rev. Lett. 69 , 2863 (1992).
5White (1993) S. R. White, Phys. Rev. B 48 , 10345 (1993).
6Schollwöck (2005) U. Schollwöck, Rev. Mod. Phys. 77 , 259 (2005) . · doi ↗
7Schollwöck (2011) U. Schollwöck, Annals of Physics 326 , 96 (2011) , january 2011 Special Issue. · doi ↗
8Feiguin (2013 a) A. E. Feiguin, in Strongly correlated systems: Numerical methods , edited by A. Avella and F. Mancini (Springer, Heidelberg, Berlin, 2013).