Determining system Hamiltonian from eigenstate measurements without correlation functions
Shi-Yao Hou, Ningping Cao, Sirui Lu, Yi Shen, Yiu-Tung Poon, Bei, Zeng

TL;DR
This paper introduces a novel algorithm that reconstructs local Hamiltonians solely from local eigenstate measurements, bypassing the need for nonlocal correlation data, and demonstrates its effectiveness through numerical tests.
Contribution
It presents a new method to determine local Hamiltonians using only local measurements, reformulating the problem as an optimization task and employing machine learning techniques.
Findings
Successful numerical reconstruction of random local Hamiltonians.
Local measurements suffice to recover the full Hamiltonian in generic cases.
The method avoids reliance on nonlocal correlation functions.
Abstract
Local Hamiltonians arise naturally in physical systems. Despite its seemingly `simple' local structure, exotic features such as nonlocal correlations and topological orders exhibit in eigenstates of these systems. Previous studies for recovering local Hamiltonians from measurements on an eigenstate require information of nonlocal correlation functions. In this work, we develop an algorithm to determine local Hamiltonians from only local measurements on , by reformulating the task as an unconstrained optimization problem of certain target function of Hamiltonian parameters, with only polynomial number of parameters in terms of system size. We also develop a machine learning-based-method to solve the first-order gradient used in the algorithm. Our method is tested numerically for randomly generated local Hamiltonians and returns promising reconstruction in the…
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 1
Figure 2
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15| General operators | Local operators | |||||
|---|---|---|---|---|---|---|
| Number of qubits | 4 | 5 | 6 | 7 | 4 | 7 |
| Time for single trial (s) | 0.0487 | 0.0518 | 0.0539 | 0.0579 | 0.2246 | 7.938 |
| Number of trials | 32.36 | 42.34 | 45.89 | 161.6 | 54.84 | 539.4 |
| Average time cost (s) | 1.576 | 2.193 | 2.377 | 2.4735 | 12.32 | 4282 |
| Node | Function | Derivative | Type of variable(s) | Type of function value |
|---|---|---|---|---|
| number | matrix | |||
| matrix | matrix | |||
| matrix | matrix | |||
| matrix | number | |||
| matrix & number | matrix | |||
| matrix | number | |||
| number | number | |||
| matrix | number | |||
| number | number |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Determining system Hamiltonian from eigenstate measurements
without correlation functions
Shi-Yao Hou
These authors contributed equally to this work.
College of Physics and Electronic Engineering, Center for Computational Sciences, Sichuan Normal University, Chengdu 610068, China
Center for Quantum Computing, Peng Cheng Laboratory, Shenzhen, 518055, China
Ningping Cao
These authors contributed equally to this work.
Department of Mathematics & Statistics, University of Guelph, Guelph N1G 2W1, Ontario, Canada
Institute for Quantum Computing, University of Waterloo, Waterloo N2L 3G1, Ontario, Canada
Sirui Lu
Department of Physics, Tsinghua University, Beijing 100084, China
Yi Shen
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
Yiu-Tung Poon
Department of Mathematics, Iowa State University, Ames, Iowa, IA 50011, USA.
Center for Quantum Computing, Peng Cheng Laboratory, Shenzhen, 518055, China
Bei Zeng
Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Department of Mathematics & Statistics, University of Guelph, Guelph N1G 2W1, Ontario, Canada
Institute for Quantum Computing, University of Waterloo, Waterloo N2L 3G1, Ontario, Canada
(March 7, 2024)
Abstract
Local Hamiltonians arise naturally in physical systems. Despite its seemingly ‘simple’ local structure, exotic features such as nonlocal correlations and topological orders exhibit in eigenstates of these systems. Previous studies for recovering local Hamiltonians from measurements on an eigenstate require information of nonlocal correlation functions. In this work, we argue that local measurements on is enough to recover the Hamiltonian in most of the cases. Specially, we develop an algorithm to demonstrate the observation. Our algorithm is tested numerically for randomly generated local Hamiltonians of different system sizes and returns promising reconstructions with desired accuracy. Additionally, for random generated Hamiltonians (not necessarily local), our algorithm also provides precise estimations.
I Introduction
The principle of locality, arising naturally in physical systems, states that objects are only affected by their nearby surroundings. Locality is naturally embedded in numerous physical systems characterized by local Hamiltonians, which play a critical role in various quantum physics topics, such as quantum lattice models Tarasov et al. (2016); Tarasov (1985); Zanardi (2002), quantum simulation Feynman (1982); Cirac and Zoller (2012); Lloyd (1996); Buluta and Nori (2009), topological quantum computation Freedman et al. (2003), adiabatic quantum computation Nagaj (2008); Aharonov et al. (2008); Jordan et al. (2006), and quantum Hamiltonian complexity Kempe et al. (2006); Kempe and Regev (2003); Bravyi et al. (2006). In the past few years, rapidly developing machine learning techniques allow us to study these topics in a new manner Xin et al. (2018); Bairey et al. (2019); Deng et al. (2017); Lu et al. (2019); Pudenz and Lidar (2013); Liu et al. (2017). Empowered by traditional optimization methods and contemporary machine learning techniques, we map the task of revealing the information encoded in a single eigenstate of a local Hamiltonian to an optimization problem.
For a local Hamiltonian with being some local operators, it is known that a single (non-degenerate) eigenstate can encode the full Hamiltonian in certain cases Garrison and Grover (2018); Qi and Ranard (2019); Chen et al. (2012a), such as when the expectation value of s on , a_{i}=\mbox{\langle\psi|}A_{i}|\psi\rangle, are given and further assumptions are satisfied. A simple case is that when is the unique ground state of ; thus the corresponding density matrix of can be represented in the thermal form as
[TABLE]
for sufficiently large . This implies that the Hamiltonian can be directly related to , and hence can be determined by the measurement results , using algorithms developed in the literature Zhou (2008); Niekamp et al. (2013) for practical cases. Because s are local operators, the number of parameters of (i.e., the number of s) is only polynomial in terms of system size. We remark that the problem of finding is also closely related to the problem of determining quantum states from local measurements Xin et al. (2018); Bairey et al. (2019); Kalev et al. (2015); Linden et al. (2002); Linden and Wootters (2002); Diósi (2004); Chen et al. (2012b, c, 2013); Zeng et al. (2015), and also has a natural connection to the study of quantum marginal problem Klyachko (2006); Liu (2006); Liu et al. (2007); Wei et al. (2010), as well as its bosonic/fermionic version that are called the -representability problem Coleman (1963); Erdahl (1972); Klyachko (2004, 2006); Altunbulak and Klyachko (2008); Schilling et al. (2013); Walter et al. (2013); Sawicki et al. (2014).
For a wavefunction that is an eigenstate (i.e. not necessarily a ground state), one interesting situation is related to the eigenstate thermalization hypothesis (ETH) Deutsch (1991); Srednicki (1994, 1996, 1999); D’Alessio et al. (2016). When the ETH is satisfied, the reduced density matrix of a pure and finite energy density eigenstate for a small subsystem becomes equal to a thermal reduced density matrix asymptotically Garrison and Grover (2018). In other words, Eq. (1) will hold for some eigenstate of the system in this case, and one can use a similar algorithm Zhou (2008); Niekamp et al. (2013) to find from s, as in the case of ground states. Another situation previously discussed is that if the two-point correlation functions \mbox{\langle\psi|}A_{i}A_{j}|\psi\rangle are known, one can reproduce without satisfying ETH Chen et al. (2012a); Qi and Ranard (2019). Once the two-point correlation functions are known, one can again use an algorithm to recover from the correlation functions, for the case of ground states. However, in practice, the nonlocal correlation functions \mbox{\langle\psi|}A_{i}A_{j}|\psi\rangle are not easy to obtain Li et al. (2017).
In this paper, we answer a simple but significant question: can we determine a local Hamiltonian (s) from only the local information (s) of any one of the eigenstates () without further assumptions. Obviously, there are cases that an eigenstate can not determine the system Hamiltonian, such as the product state and the eigenstates of a frustration-free Hamiltonians. However, for a randomly chosen physical system of which the Hamiltonian has no special structures, we show that only the knowledge of s is sufficient to determine .
Based on the knowledge in hand - a set of possible Hermitian operators for the system Hamiltonian and the measurement outcomes , we formulate a positive-semidefinite function (Eq. 6), with the for the desired . The problem of finding the exact Hamiltonian converts to a unconstrained optimization problem of finding that minimizing . A small real number is chosen to control the precision of the result. We update the sampled until is less than . Fig. 1 depicts the procedure of our algorithm.
We test the algorithm in two scenarios: one with four randomly generated operator s acting on the whole system, and one with two random local operators s. Our algorithm almost perfectly reproduces s - the average fidelities for both cases are close to . Since the algorithm recovers Hamiltonians with almost perfect fidelities Based on the reconstructed Hamiltonian (i.e. s), one can also recover the eigenvalue of from s and s, and the wave function itself from the eigenvectors of Hamiltonian . In the case when s are local operators, our method can find s from only local measurement results, hence shed light on the correlation structures of eigenstates of local Hamiltonians.
II Algorithm
We start to discuss our method in a general situation, where the Hamiltonian can be expressed in terms of a set of known Hermitian operators : with . For an eigenstate of with unknown eigenvalue , which satisfies , we can denote the measurement results as . With only knowing the measurement results s, our goal is to find the coefficients to determine .
We observe that, even if is not the ground state of , it can be the ground state of another Hamiltonian with given by
[TABLE]
since
[TABLE]
Then the density matrix of (which is in fact of rank 1) can be written in the form of a thermal state:
[TABLE]
for sufficiently large , satisfying
[TABLE]
With these conditions in mind, we are ready to reformulate our task as an optimization problem with the following objective function:
[TABLE]
where is the estimation of , and when is minimized.
Notice that the first term of is minimized by , which guarantees that the state obtained is the state that produces the desired measurement outcomes on s. However, there may be many (thermal) states which also yield such outcomes. By simply minimizing this first term, optimization algorithms tend to return thermal states with nonzero entropy, which is not the eigenstate of (with entropy zero) that we are willing to find. In order to fix this issue and ensure that the optimization returning a (near) rank state , we add the second term, which is only zero unless is the ground state of , hence an eigenstate of . Combining these two terms together, we make sure that when the minimum value of Hamiltonian is reached, we will obtain a corresponding to measurement outcomes , and at the same time an eigenstate of . In practice, we set up a parameter , such that with , we find a result with high fidelity to the desired value of .
For the convenience of numerical implementation, we let , then the thermal state becomes
[TABLE]
Consequently, the objective function Eq. 6 can be rewritten in an equivalent form
[TABLE]
We aim to solve for by minimizing . In practice, we terminate our iterations when is smaller than a fixed small value . The corresponding optimization result is denoted as . As we reformed the objective function by using , the result is actually .
Theoretically, we need that turns to infinity to pick up the ground state of . In practice, however, we only require that is some “large number". What is more, to pick up the ground state of for , what really matters is in fact the gap between the first excited state and the ground state of . Therefore, we just simply set and let the optimizer automatically amplify the energy gap during iteration, when is approaching the desired state. A more detailed discussion regarding the choice of can be found in Appendix A.
Since all the constraints are written in Eq. 8, minimizing is an unconstrained minimization problem. There are plenty of standard algorithms for this task. In our setting, computing the second-order derivative information of is quite complicated and expensive. Therefore, instead of using Newton method which requires the Hessian matrix of the second derivatives of the objective function, we choose the quasi-Newton method with BFGS-formula to approximate the Hessians BROYDEN (1970); Fletcher (1970); Goldfarb (1970); Shanno (1970). The MATLAB function fminunc, which uses the quasi-Newton algorithm, is capable of realizing this algorithm starting from an initial random guess of s. When the quasi-Newton algorithm fails (converges into a local minimum), we start with a different set of random initial value s and optimize again until we obtain a good enough solution.
The BFGS algorithm is a typical optimization algorithm which requires gradients of the objective function on s. The form of the objective function is so complicated such that computing the gradient is a difficult task. To solve this issue, we borrow the methods of computational graph and auto differentiation from machine learning. The computational graph is shown in Fig. 10, in which we show the intermediate functions and variables. Mathematically, the final gradients could be calculated via chain rules. However, since some of the intermediate variables are matrices and complex-valued, the automatic differentiation toolboxes, which deal with real variables, can not be applied directly. To obtain the gradients, we have to careful handle the derivation of the intermediate functions, especially those with matrices as their variables. More details about the our gradient method can be found in Appendix B.
We use Matlab to implement our algorithm, and the detailed implementation can be found in Appendix C.
III Results
In this section, we test our algorithm in three steps as follows. First, we randomly generate several s and s, hence the Hamiltonian and its eigenstates. Second, for each Hamiltonian, we randomly choose one eigenstate , therefore we have and . Hereby we can run our algorithm to find the Hamiltonian . Comparing and , we then know that how well the approach works. The algorithm has been tested for two cases: being the generic operator and local operator.
To compare and , we need a measure to characterize the similarity, or distance between these two Hamiltonians. The metric we used here is the following fidelity as discussed in Fortunato et al. (2002):
[TABLE]
To see the meaning of this metric, notice that is the inner product of the two Hamiltonians, while and are the two normalization constant. Therefore, . If and describe the same system up to a constant , then . Smaller then indicates that the two Hamiltonians are more far apart. Moreover, notice that in our settings the Hamiltonians are represented by vectors and in -dimensional real space. If the chosen s are normalized and orthogonal, which means
[TABLE]
where is the normalization constant given by the system dimension (e.g. for Pauli matrices, ), this fidelity definition is exactly the cosine loss function of and , where is generated from our algorithm, that is, for normalized s,
[TABLE]
where means the 2-norm of the vector .
III.1 Results with general operators
When implementing our method on generic operators, we randomly generate three Hermitian operators s and fix them. We also create several real constants s randomly, then assemble s and s into Hamiltonians
[TABLE]
After diagonalizing , we choose one eigenvector and calculate the expectation values .
For -qubit systems, the dimension of the system is . We tested the cases for and . For each , we generate data points. Each test set is , where for and s are randomly generated Hermitian matrices of dimension 111To generate a random Hermitian matrix, we first generate complex number as the entries of a matrix , then we construct . It can be easily seen is an Hermitian matrix.. With these data, is obtained. Diagonalizing and randomly choosing one eigenvector , we obtain . We show the results for applying our algorithm to all cases in Fig. 2.
We find that the final fidelities are larger than 99.8% for all tested cases. Although the fidelity slightly decreases with the increasing number of qubits, the lowest fidelity (7-qubit case) is still higher than 99.8%. Because the eigenvector is randomly chosen, our method is not dependent on the energy level of eigenstates.
III.2 Results with local operators
In this section, we report our results on the systems with a -local interaction structure. We tested two different structures, shown in Fig. 3(a) and Fig. 3(d). Each circle represents a qubit on a lattice, and each line represents an interaction between the connected two qubits.
The fully-connected -qubit system is shown in Fig. 3(a). The Hamiltonian can be written as
[TABLE]
where s are real parameters and s are random generated -local operators. One eigenstate out of 16 is randomly chosen as the state . Our algorithm has been tested on 800 such Hamiltonians.
We then analyze the -qubit chain model shown in Fig. 3(d). Similarly, we can write the Hamiltonian as
[TABLE]
where s and s are the parameters and 2-local interactions. We randomly generate such Hamiltonians and applied our algorithm.
The results of these two -local Hamiltonians are shown in Fig. 3. Our algorithm recovered Hamiltonians with high fidelities for both cases. The average fidelity for our 4-qubit (7-qubit) system is 99.99% (99.73%). As the dimension of the system increases, the fidelity between and is slightly decreased. The histogram of the fidelities shows that, for most data points, the fidelities are very close to . Our algorithm almost perfectly recovered these -local Hamiltonian from measurement outcomes of a randomly picked eigenvector.
We examine the effectiveness of our algorithm according to different eigenvectors of the same Hamiltonians. Fig. 4 demonstrates average fidelities between and of -qubit case by chosen different eigenvectors. These average fidelities are higher than 99.9% for all eigenvectors. Hence, the effectiveness of our algorithm is independent of energy levels.
IV Further analysis of the algorithm
In this section, we analyze the error tolerance and the performance of the algorithm, based on the results of our numerical experiments.
IV.1 Error tolerance analysis
The numerical tests in previous sections deal with noiseless theoretical data. In practical scenarios, however, data is always noisy. Here we provide analysis of error tolerance for our algorithm.
As an example, we consider a 4-qubit system with Hamiltonian where ’s are random generated by Hermitian operators. Choose one eigenstate of , the noiseless measurements of the eigenstate are denoted as \{a_{i}|a_{i}=\mbox{\langle\psi|}A_{i}|\psi\rangle\}. Noises used here are randomly drawn from normal distributions . Adding the generated noise to measurements , the noisy data follows the normal distribution .
We can control how noisy is by changing the standard deviation . Clearly the noisiness of data is relative to the magnitude of true values, thus can be used as an indicator. We test ten different ’s of which ’s ranging from (1%) to (10%). For each , data points have been generated.
The Hamiltonians attained by our algorithm with these noisy data are denoted as . The fidelity between each pair of the true Hamiltonian and is shown in Fig. 5. Though increasing the noise cause the fidelities of a few points significantly decreased, all median fidelities of different are above 99.8%. Even for (added 10% of noise), only 2.1% of data points have fidelities lower than 99.0%. In other words, singular points which have significant low fidelities are rare. This is demonstrates that the performance of our algorithm is stable.
IV.2 Performance analysis
The convergence and time cost of our method are analyzed in this subsection. We study the relation between the halting condition and the convergence of our algorithm. We also observed that the time cost tends to differ for each Hamiltonian configuration.
The parameter , which is the halting condition of the algorithm, effects the convergence of our algorithm. It is the consequence of the object function has many local minima. In the schematic diagram Fig. 7, if the initial point is chosen as or , the gradient method will return the value , which is a local minimum (so as or to , and or to ).
When is greater then or , the algorithm may recognize or as the optimal solution instead of finding the true global minimum . An appropriate choice of is necessary to eliminate certain local minima.
The object function Eq. 6 used in our method has high dimension and complicated landscape. Its properties also subject to the particular class of that we work with. More analysis could be done in terms of finding an appropriate . Empirically, we choose as for all our numerical experiments. It is numerically proved to be applicable as shown in Fig. 7. The figure depicts the relation between and the output fidelity of the 3-qubit general operators case. Chosen grantees the average fidelity almost equals to 1.
By setting the halting parameter to , we can now discuss the time cost for each numerical experiment. The total time cost for optimization depends on the number of trials for each numerical example and the time of each trial. The initial values are chosen randomly, is different from case to case. We consider the average number of trials for each Hamiltonian class. Let be the duration of a single trial which is mainly depending on the Hamiltonian configuration and number of qubits. The total average time could be estimated as
[TABLE]
We test our algorithm on a workstation with Intel i7-8700K and 32 GB RAM, the results are listed in Table 1. The table demonstrates that the time cost does not grow rapidly for the general operator cases, while it changes dramatically with the system size for the local operator cases. The of 7-qubit general operator instance is almost 2 times more then of the 4-qubit general operator case. On the contrast, the average time cost of 7-qubit lattice is almost 350 times more then the of 4-qubit lattice.
V Comparison with the correlation matrix method
In Ref. Qi and Ranard (2019), a method is proposed to recover Hamiltonians from correlation functions. Their method works as follows: with an eigenstate , the Hamiltonian of the system is defined as
[TABLE]
where s are normalized and orthogonal Hermitian matrices (e.g. the Pauli matrices). They defined a correlation matrix, of which the matrix elements are
[TABLE]
where is the anti-commutator. Then diagonalize matrix and find the eigenvector corresponding to the eigenvalue [math]. The Hamiltonian
[TABLE]
is the desired Hamiltonian of the system . We refer to this algorithm as the correlation matrix method (CMM).
We conduct numerical experiments to compare the performance of CMM and our method. We calculate such 4-qubit Hamiltonians
[TABLE]
where is the th Pauli matrix acting on the th qubit. It turns out that CMM and our algorithm both render good estimations. The results of CMM possess greater accuracy–the error rate, defined as , is less than . Error rates of our results range from to .
CMM is deterministic, more accurate as well as faster than our method. It only involves diagonalization of matrix , of which the dimension is polynomial of the number of qubits for local systems. CMM also provides a criteria to determine whether the eigenstate is uniquely determine the Hamiltonian. That is, if only has one eigenvalue equals to [math], then the Hamiltonian is uniquely determined. The advantage of CMM arises from more restrictions and more information required for applying it: (1) All s in CMM are orthogonal, while in our tests the corresponding s are randomly generated; (2) CMM requires non-local correlation functions. The matrix element includes the term
[TABLE]
where . Although s may be local, for all s and s could be global. Therefore, as indicated in Qi and Ranard (2019), if only partial knowledge of the system is available, the question that whether the Hamiltonian can be reconstructed remains unclear.
Another problem of applying CMM is similar to the problem of the halting parameter in our method. The recovery depends on the existence of one eigenvalue equals to [math]. However, the equivalence of tow numbers in a computer is different from in theory. Even when working with noiseless theoretical data, the finite length data storage (e.g. the precision of double float-point format is ) and data processing can introduce certain errors, not to mention the noisy data from real quantum devices. Namely, determine whether a number is equal to [math] is up to a certain precision. Therefore, one needs to set a threshold to determine equivalence before applying CMM. How to appropriately chose a is empirical and case dependent. From this perspective, both methods are not similarly complicated.
We are also willing to check the consistency of CMM and our methods. Lacking a systemic way of choosing the threshold , we need to seek another way to bridge these two method. From the numerical experiments, we find in CMM that the lowest absolute eigenvalue (denotes as ) of is about and the second lowest absolute eigenvalue (denotes as ) ranges from to . If is too small, it would be hard to tell whether there exists one or more 0 eigenvalues, that is, the farther and are, the more accurate the CMM’s outcome is. It will be an evidence of consistency if we can witness the same tendency from our method. We, therefore, define the condition number as the ratio of the second lowest absolute eigenvalue to the lowest absolute eigenvalue (). We use the estimations provided by our method to construct the correlation matrix , obtain the and from , then calculate the condition number . The results is shown in Fig. 8. We find, generally, the larger the condition number, the better our algorithm performs (higher fidelity). It is consistent with the behaviour of CMM.
In summary, for all the tests performed, CMM and our method has similar behaviour in terms of the condition number, which shows that both method return the desired results (for reconstruction the desired Hamiltonian with high fidelity). Regarding to algorithm performance, when all sorts of correlations are available, the CMM is numerically more efficient and accurate. In contrast, however, our algorithm use less information, therefore can be used in a much wider situations, especially when certain global correlation information is hard to obtain in practice.
VI Discussion
In this work, we discuss the problem of reconstructing system Hamiltonian using only measurement data a_{i}=\mbox{\langle\psi|}A_{i}|\psi\rangle of one eigenstate of by reformulating the task as an unconstrained optimization problem of some target function of s. We numerically tested our method for both randomly generated s and also the case that s are random local operators. In both cases, we obtain good fidelities of the reconstructed . Our results are somewhat surprising: only local measurements on one eigenstate are enough to determine the system Hamiltonian for generic cases, no further assumption is needed. Though theoretically it is beautiful, our algorithm is not scalable since the calculations of exponentiation and gradient of matrices become expensive when the system size gets large. Improvements can be done by more efficiently approximate them.
We also remark that, in the sense that our method almost perfectly recovered the Hamiltonian , the information encoded in the Hamiltonian , such as the eigenstate itself (though described exponentially many parameters in system size) can also in principle be revealed. This builds a bridge from our study to quantum tomography and other related topics of local Hamiltonians. Empowered by traditional optimization methods and machine learning techniques, our algorithm could be applied in various quantum physics problems, such as quantum simulation, quantum lattice models, adiabatic quantum computation, etc.
Our discussion also raises many new questions. For instance, a straightforward one is whether other methods can be used for the reconstruction problem, and their efficiency and stability compared to the method we have used in this work. One may also wonder how the information of “being an eigenstate" helps to determine a quantum state locally, which is generically only the case for the ground state, and how this information could be related to help quantum state tomography in a more general setting.
Aknowlegement
We thank Feihao Zhang, Lei Wang, Shilin Huang, and Dawei Lu for helpful discussions. S.-Y.H is supported by National Natural Science Foundation of China under Grant No. 11847154. N.C. and B.Z. are supported by NSERC and CIFAR.
Appendix A The value of
It is easy to observe that and have the same eigenstates. The constant only contributes a constant factor to the magnitude of eigenvalues, i.e., the eigenenergies of the given system. Theoretically, a thermal state tends to the ground state of the given Hamiltonian if and only if the temperature is zero (or, for a numerical algorithm, close to zero), which means goes to infinite. Numerically, the only needs to be a sufficiently large positive number.
Let us denote the -th eigenvalue of as and the energy gaps as . From the definition of , the ground energy is always 0. As we observed, during the optimization process, the eigenenergies, as well as the energy gaps of the Hamiltonian, gets larger. From Fig. 9(a), we can see the energy gaps grow as the optimization goes. The corresponding is a finite sufficient large number.
On the other hand, we can also examine the probability of the ground state of as the iteration goes. This probability can be expressed as
[TABLE]
Since the ground state energy of is always have zero, the probability is
[TABLE]
The change of the probability with iterations is shown in Fig. 9(b), from which we can see that finally, the probability goes to , which means that the thermal state form is (almost) ground state when is a relatively large positive constant.
Appendix B Calculation of the gradient of
B.1 General method
The computational graph of this technique is shown in Fig. 10. Each node of the graph represents an intermediate function. In principle, the chain rule can be applied. It seems that we can compute the gradient by using the existing deep learning frameworks. However, those auto differentiation tools embedded in the frameworks such as PyTorch Paszke et al. (2017) and TensorFlow Abadi et al. (2015) cannot harness our problem due to the following two reasons: first, most of the tools only deal with real numbers while in quantum physics we often deal with complex numbers; second and the most important, some of the intermediate functions in the computational graph use matrices (or in physics, operators) as variables, while in neural networks, the variables are real numbers. One should be careful while deriving matrices since a matrix usually does not commute with its derivative. For instance, even for some simple conditions, the node , the derivative can not be simply considered as because . Instead, .
The derivatives in the computational graph shown in Fig. 10 are listed in Table 2. Here, due to the reason mentionedabove, we use forward propagation to calculate the gradients. In Table 2, the derivative could be separated into the following categories: 1. The variable(s) of the function is a number. This corresponds to the simplest case, and the derivatives could be obtained simply using chain rules; 2. The variables are matrices, but the function is the trace function tr. The derivatives could be obtained by applying ; 3. Some elementary functions take matrices as variables, i.e., . In this case, one should be careful about the commutators when dealing with it; 4. , of which the gradients are difficult to calculate, which will be discussed in detail.
B.2 Derivative of matrix exponentials
The derivative of function with respect to can be written as
[TABLE]
where, in our cases, . Generally, the commutator . Therefore, the integration Eq. 21 can not be simply calculated. In Ref. Niekamp et al. (2013), the authors approximate the integration using the value of the upper and lower limits. This approach does not work for our case. Thus we introduce a new way to calculate it. Let
[TABLE]
[TABLE]
and
[TABLE]
Note that . It can be derived that
[TABLE]
Let
[TABLE]
The derivative equation (25) can be solved as
[TABLE]
where is a matrix with all entries 0 and is the identity matrix. With , we can obtain as well as the integration Eq. 21. This completes the gradient calculation for our algorithm.
Appendix C Software implementations
According to the computational graph Fig. 10 and the Table 2, we can define a function which accepts , the operators , and returns the value of and the gradient :
function [f,g]=fun(c, A)
% f is the value of the objective function
% g is the gradient
The function fminunc in MATLAB/Octave can accept the function and the gradient and doing the optimization:
f_handle=@(c)fun(c,A)
The default algorithm is BFGS. To make the algorithm using gradient we provide:
options=optimoptions(’fminunc’,’SpecifyObjectiveGradient’,’true’);
Then, we can start the optimization
⬇ [c1, fval]=fminunc(f_handle,c0)
Here, c1 is the final points, fval is the final value of the objective function and c0 is the initial value.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Tarasov et al. (2016) V. O. Tarasov, L. A. Takhtadzhyan, and L. D. Faddeev, in Fifty Years of Mathematical Physics: Selected Works of Ludwig Faddeev (World Scientific, 2016) pp. 339–353.
- 2Tarasov (1985) V. O. Tarasov, Theoretical and Mathematical Physics 63 , 440 (1985).
- 3Zanardi (2002) P. Zanardi, Physical Review A 65 , 042101 (2002).
- 4Feynman (1982) R. P. Feynman, International journal of theoretical physics 21 , 467 (1982).
- 5Cirac and Zoller (2012) J. I. Cirac and P. Zoller, Nature Physics 8 , 264 (2012).
- 6Lloyd (1996) S. Lloyd, Science , 1073 (1996).
- 7Buluta and Nori (2009) I. Buluta and F. Nori, Science 326 , 108 (2009).
- 8Freedman et al. (2003) M. Freedman, A. Kitaev, M. Larsen, and Z. Wang, Bulletin of the American Mathematical Society 40 , 31 (2003).
