Ensemble Quasi-Newton HMC
Xiao-Yong Jin, James C. Osborn

TL;DR
This paper introduces an enhanced Hybrid Monte Carlo algorithm that incorporates an approximate inverse Hessian to improve sampling efficiency in lattice gauge theories, demonstrated on 2D U(1) models.
Contribution
It proposes a novel method to exchange information within Markov chain ensembles and integrates a quasi-Newton inspired Hessian into HMC to mitigate critical slowing down.
Findings
Improved sampling efficiency in 2D U(1) gauge theory
Effective exchange of information within Markov chain ensembles
Potential for application to more complex gauge theories
Abstract
We present a modification of the Hybrid Monte Carlo algorithm for tackling the critical slowing down of generating Markov chains of lattice gauge configurations towards the continuum limit. We propose a new method to exchange information within an ensemble of Markov chains, and use it to construct an approximate inverse Hessian matrix of the action inspired from quasi-Newton algorithms for optimization. The kinetic term of the molecular dynamics evolution includes the approximate Hessian for long distance couplings among the momenta. We show the result of applying the new algorithm to the gauge theory in two dimensions, and discuss our future plans.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMarkov Chains and Monte Carlo Methods · Protein Structure and Dynamics · Stochastic processes and statistical mechanics
Ensemble Quasi-Newton HMC
and James C. Osborn
Computational Science Division
Argonne National Laboratory
9700 S. Cass Ave.
Lemont, IL 60439, USA
Abstract:
We present a modification of the Hybrid Monte Carlo algorithm for tackling the critical slowing down of generating Markov chains of lattice gauge configurations towards the continuum limit. We propose a new method to exchange information within an ensemble of Markov chains, and use it to construct an approximate inverse Hessian matrix of the action inspired from quasi-Newton algorithms for optimization. The kinetic term of the molecular dynamics evolution includes the approximate Hessian for long distance couplings among the momenta. We show the result of applying the new algorithm to the gauge theory in two dimensions, and discuss our future plans.
1 Introduction
In generating a Markov chain, we aim at speeding up Monte Carlo simulations, making proposal configurations far from the current configuration in phase space, with relative low cost. Molecular Dynamics (MD) evolution in fictitious time using random momenta naturally extends and mitigates the Langevin-like random walk behavior. This Hybrid Monte Carlo (HMC) algorithm [1] works well in high dimensional systems, such as lattice QCD. Approaching the continuum limit of the lattice theory, some physical modes in MD slows down exponentially, leading to research in Fourier acceleration [2, 3, 4] as a possible remedy. The analogous Riemannian manifold HMC [5] claims success for some probability density functions in guiding the MD evolution through the phase space. Recent efforts [6, 7, 8] surge in analyzing and applying similar techniques to lattice QCD. We focus on employing, as the acceleration kernel, a numerically cheaper approximation of the Hessian matrix from the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm [9, 10] (L-BFGS), a common quasi-Newton optimization method. This approximation applies to not only the gauge fields but also the pseudo-fermion fields. The HMC, nevertheless, requires changes to adopt such approximation that uses information from multiple configurations.
In this paper we present the ensemble quasi-Newton HMC (QNHMC) method, discuss the characteristics of the method on two-dimensional lattice gauge theory, and show preliminary results of its effect on the autocorrelation of the average plaquette value and topological charge.
2 Markov chain for assisted MD evolution
The MD evolution in the heart of the HMC algorithm follows from Hamiltonian dynamics,
[TABLE]
where is the action, the fictitious momenta, and a fixed MD mass matrix. A symplectic and reversible discrete integrator advances the state of the Markov chain from to over a fictitious time period, , the trajectory length. Using the Hamiltonian as the negative log probability of the enlarged phase space including and , the correctness of the HMC demands a positive definite .
The choice of affects the performance of HMC. For a general action, a MD mass matrix containing the local information of the Riemann curvature can bring considerable speedups [5] in the efficiency of Markov Chain Monte Carlo methods. The article recommends the Fisher information matrix as . Fourier acceleration [3, 4] suggests the field Laplacian operator as . We are interested in using a fixed during one MD trajectory, for its simplicity and efficiency. Any explicit symplectic reversible integrator for equation (1) would still be applicable. However, this also means that cannot depend on any configurations from this one whole MD trajectory.
In general we want a proposal of for the next state of the Markov chain from following a symplectic and reversible discretization of the MD evolution (1) where comes from our choice of a function, ,
[TABLE]
whose argument is a list of configurations , which are all different from , , or any configuration along the discretized path of this particular MD evolution. Assuming a particular choice of could help through out the Markov chain generation, we can fix and , optimally picked from available configurations, and the HMC procedure remains the same except for the additional mass matrix .
In this paper, we focus on building that is fixed during one MD evolution but changes after each trajectory. We use from a set of parallel streams of Markov chains. We update one of the streams using information from the others, suggested in references [11, 12]. We can have multiple ways to generate such Markov chains, and obtain of from neighboring streams. The following is the base case with provable reversibility. We use an arbitrary information exchange kernel to generalize the ensemble assisted Markov chain.
Let be the number of coupled parallel streams, each labeled , for to . Let be a function on a unordered set of configurations, , with . Let be a symplectic and reversible mapping that generates the next state of one Markov chain, from to \mathbb{X}^{\prime}_{j}=\mathcal{U}\left(\mathcal{F}\big{(}\{\mathcal{X}_{i}\}_{i=0}^{\mathcal{N}-1}\big{)}\right)\mathbb{X}_{j}. Given a fixed set of , depends on the value of , and satisfies the detailed balance, , where is probability density we want to simulate and the transition probability. We give the definition of within the steps of the ensemble assisted Markov chain described in the following.
When updating for each from [math] to :
- (a)
Setting the list, from the list , with . 2. (b)
Evolve according to \mathcal{U}\left(\mathcal{F}\big{(}\{\mathcal{X}_{i}\}_{i=0}^{\mathcal{N}-1}\big{)}\right). 2. 2.
After generating one trajectory for each parallel streams, Set the new to the reversed sequence of it, . This is for the purpose of reversibility.
Figure 1 illustrates an example of one update with streams. The current configurations from these four streams are labeled with [math], , , and . We update those successively, each time with information obtained from the function applied to configurations from other streams. After updating each stream to , , , and , we reverse the ordering of those, and complete this one update. We can see that when we reverse the Markov chain, we would update first with the information from , because does not depend on the ordering of , we can reproduce the reversed Markov chain.
For the purpose of assisting MD evolution, where represents the procedure of refreshing , integrating the equation (1), and finishing with a Metropolis-Hastings accept/reject step, we use
[TABLE]
where a special routine sort the set of before applying the function , because our choice of in equation (2) depends on the ordering of , and sorting guarantees the same with a reversed procedure. In addition to the simple case presented here, we can build more involved Markov chains by using more states per parallel stream for the exchange kernel , or by decoupling some of the parallel streams for parallel evolution.
3 L-BFGS approximated Hessian
Among coupled parallel HMC streams, for each stream, we use the latest configurations from the other streams to construct the approximate Hessian with the L-BFGS algorithm. For these configurations, we compute the site-wise finite differences of lattice gauge fields, ,
[TABLE]
where and are fields of the elements of the Lie algebra, is the length of the L-BFGS memory, and the inequality comes from selecting only those field pairs with (the inner product implicitly traces over the color indices and sums over lattice) for the positive definiteness of the approximated Hessian. As required in equation (3) we sort the field pairs according to .
The L-BFGS algorithm gives the inverse Hessian as a recursively defined operator,
[TABLE]
where as an initial value comes from the diagonal term of the Hessian matrix of the 2-D action in the weak coupling limit. The associated L-BFGS Hessian matrix can be expressed as,
[TABLE]
This rank-2 update has a symmetric product form [13], showing the explicit positive definiteness,
[TABLE]
In our test with the unmodified L-BFGS algorithm, the low eigenvalues of the approximated Hessian matrix decreases rapidly as the L-BFGS memory length increases, even after removing all the exact zero modes from the theory described in the next section. While the largest eigenvalues are stable, the condition number increases and the Hessian matrix becomes singular with a modest L-BFGS memory length, because the approximate action surface spanned by the samples we draw for the L-BFGS algorithm has zero modes and may even be concave. Since the MD evolution involves the inverse of the Hessian matrix, the evolution becomes unstable with near zero modes. A straightforward method to regulate the approximated Hessian would be to add a small term to the diagonal of . It nevertheless breaks the rank-2 update iteration, invalidates the simple inversion formula and the symmetric decomposition for the square root of . This would require a conjugate gradient inversion and a rational approximation of the square root of .
Investigating the determinant behavior from the symmetric product form (7) leads us to one solution: adding a small term to one of the rank-1 updates in equation (6),
[TABLE]
This still invalidates the iteration formula (5). The symmetric product form (7), however, remains applicable with minimal changes. The complexity of iterating the symmetric product form is always linear in lattice volume and, in terms of , in space and in time.
4 gauge theory on a 2-D lattice
We use the Wilson plaquette action for the gauge theory on a two-dimensional lattice with periodic boundary conditions. As the QNHMC algorithm uses approximated Hessian, we first need to remove all the exact zero modes of the Hessian from the theory, in order to improve the stability of the MD evolution.
The gauge degrees of freedom form the exact zero modes of the Hessian. With periodic boundary conditions, that is zero modes for a lattice with spatial and temporal extent and . We fix the gauge using a maximal tree of links which we set the gauge variables to unity. The maximal tree includes lattice sites and directions satisfying
[TABLE]
There are two global gauge degrees of freedom, due to the abelian nature of the theory,
[TABLE]
where and are elements of the group. Thus we fix two more gauge links, , during the MD evolution of QNHMC to remove these two zero modes.
We are interested in observables that are slow to evolve in the Markov chain, particularly the topological charge. We use the definition of the topological charge [14, 15], , where the complex argument takes the principle value of . This definition does not apply to exceptional configurations (with no contribution to the partition function in the continuum limit) where for some . On a two dimensional lattice with periodic boundary conditions, this definition of topological charge gives exact integer values.
5 Current status, and future plans
We implement the gauge theory in the QEX framework [16]. We use PRIMME [17] to study the eigenmodes of the exact Hessian matrix and the L-BFGS approximated one.
The results below come from the 2-D theory at on a lattice of size , with the number of coupled parallel HMC streams, and , using the number of configurations from 8192 to 65536, depending on the trajectory length. The MD evolution uses the Omelyan’s second order minimum norm integrator [18]. We keep the number of steps per MD trajectory fixed at 8, 16, 32, or 64, while tuning for optimal trajectory length separately for conventional HMC and QNHMC.
Figure 2 shows the integrated autocorrelation length of topologic charge squared (left) and the average plaquette (right). To include the cost of generating the Markov chain, we multiply the integrated autocorrelation length by the number of MD steps in an HMC trajectory, converting the correlation length from the unit of configuration to the unit of MD steps. We refer to this quantity as the cost of generating configurations for uncorrelated measurable quantities, in terms of the force evaluations, and we tune simulation parameters to lower the cost. The apparent increase of the cost for the average plaquette after the trajectory length grows longer than three is due to the fact that the autocorrelation becomes minimal between successive configurations and the cost here becomes linearly proportional to the MD steps.
Comparing the conventional HMC with and without gauge fixing, we see that the topological quantity shows about a factor of two increased cost, going from no gauge fixing to gauge fixing. The autocorrelation of the average plaquette value however depends less on the gauge fixing. Using QNHMC with shows no improvement for the topological quantity, and the cost worsens with . On the other hand, QNHMC reduces the cost for the average plaquette with , and more so with increased number of coupled Markov chains, from to .
Moving forward, we will do more tuning and testing with the QNHMC algorithm, studying the scaling behavior toward the continuum limit. On the other hand, we will also look for other approaches to approximate the Hessian. L-BFGS is designed for its efficiency in iterative optimizations. Since we have an ensemble of Markov chains, we will look for other ways to approximate the Hessian matrix [19, 20].
Acknowledgments.
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] S. Duane, A. Kennedy, B. Pendleton and D. Roweth, Hybrid Monte Carlo , Phys.Lett. B 195 (1987) 216 . · doi ↗
- 2[2] G. G. Batrouni, G. R. Katz, A. S. Kronfeld, G. P. Lepage, B. Svetitsky and K. G. Wilson, Langevin Simulations of Lattice Field Theories , Phys. Rev. D 32 (1985) 2736 . · doi ↗
- 3[3] S. Duane, R. Kenway, B. J. Pendleton and D. Roweth, Acceleration of Gauge Field Dynamics , Phys. Lett. B 176 (1986) 143 . · doi ↗
- 4[4] S. Duane and B. J. Pendleton, Gauge Invariant Fourier Acceleration , Phys. Lett. B 206 (1988) 101 . · doi ↗
- 5[5] M. Girolami and B. Calderhead, Riemann manifold langevin and hamiltonian monte carlo methods , Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2011) 123 . · doi ↗
- 6[6] G. Cossu, P. Boyle, N. Christ, C. Jung, A. Jüttner and F. Sanfilippo, Testing algorithms for critical slowing down , EPJ Web Conf. 175 (2018) 02008 [ 1710.07036 ]. · doi ↗
- 7[7] N. H. Christ and E. W. Wickenden, Fourier acceleration, the HMC algorithm and renormalizability , Po S LATTICE 2018 (2018) 025 [ 1812.05281 ].
- 8[8] Y. Zhao, Numerical Implementation of Gauge-Fixed Fourier Acceleration , Po S LATTICE 2018 (2018) 026 [ 1812.05790 ].
