Unfolding Hidden Barriers by Active Enhanced Sampling

Jing Zhang; Ming Chen

arXiv:1705.07414·physics.chem-ph·July 11, 2018

Unfolding Hidden Barriers by Active Enhanced Sampling

Jing Zhang, Ming Chen

PDF

TL;DR

This paper introduces an active learning enhanced sampling method that uses deep neural networks to iteratively identify and lift hidden barriers in complex systems, improving sampling efficiency.

Contribution

It presents a novel active learning scheme combining neural network-based CVs with enhanced sampling to lift degeneracies and explore energy landscapes more effectively.

Findings

01

Successfully lifts hidden barriers in energy landscapes.

02

Improves sampling efficiency and completeness.

03

Preserves kinetic characteristics during exploration.

Abstract

Collective variable (CV) or order parameter based enhanced sampling algorithms have achieved great success due to their ability to efficiently explore the rough potential energy landscapes of complex systems. However, the degeneracy of microscopic configurations, originating from the orthogonal space perpendicular to the CVs, is likely to shadow "hidden barriers" and greatly reduce the efficiency of CV-based sampling. Here we demonstrate that systematic machine learning CV, through enhanced sampling, can iteratively lift such degeneracies on the fly. We introduce an active learning scheme that consists of a parametric CV learner based on deep neural network and a CV-based enhanced sampler. Our active enhanced sampling (AES) algorithm is capable of identifying the least informative regions based on a historical sample, forming a positive feedback loop between the CV learner and sampler.…

Equations2

C = i \sum (j \sum M_{i, j}^{hi g h} lo g \frac{M _{i, j}^{hi g h}}{M _{i, j}^{l o w}}) .

C = i \sum (j \sum M_{i, j}^{hi g h} lo g \frac{M _{i, j}^{hi g h}}{M _{i, j}^{l o w}}) .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Unfolding Hidden Barriers by Active Enhanced Sampling

Jing Zhang

[email protected]

KLA-Tencor, AI Division

Ming Chen

[email protected]

Department of Chemistry, University of California, Berkeley

Abstract

Collective variable (CV) or order parameter based enhanced sampling algorithms have achieved great success due to their ability to efficiently explore the rough potential energy landscapes of complex systems. However, the degeneracy of microscopic configurations, originating from the orthogonal space perpendicular to the CVs, is likely to shadow “hidden barriers” and greatly reduce the efficiency of CV-based sampling. Here we demonstrate that systematic machine learning CV, through enhanced sampling, can iteratively lift such degeneracies on the fly. We introduce an active learning scheme that consists of a parametric CV learner based on deep neural network and a CV-based enhanced sampler. Our active enhanced sampling (AES) algorithm is capable of identifying the least informative regions based on a historical sample, forming a positive feedback loop between the CV learner and sampler. This approach is able to globally preserve kinetic characteristics by incrementally enhancing both sample completeness and CV quality.

enhanced sampling $|$ collective variable $|$ active learning $|$ nonlinear dimensionality reduction $|$ neural network

pacs:

Valid PACS appear here

††preprint: jcp_v01

Molecular dynamics (MD) simulations are an essential tool to understand the equilibria and kinetics of complex systems and processes, such as protein foldingRizzuti and Daggett (2013), drug binding De Vivo et al. (2016), phase transitions Sosso et al. (2016), glass states Massobrio et al. (2015); Berthier and Biroli (2011), etc. Sampling equilibrium states and conformational changes requires the exploration of a “rough” high-dimensional potential energy surface (PES), on which stable configurations are separated by relatively high barriers. This leads to an exponential growth of equilibration time in a MD simulation. To avoid trapping in local minima, various enhanced sampling methods have been proposed to improve the sampling efficiency Sugita and Okamoto (1999); Laio and Parrinello (2002); Barducci et al. (2008); Maragliano and Vanden-Eijnden (2006); Abrams and Tuckerman (2008); Darve et al. (2008). One family of these methods including umbrella sampling Torrie and Valleau (1977), metadynamics Laio and Parrinello (2002), temperature accelerated MD Maragliano and Vanden-Eijnden (2006), etc., forces the exploration of low-probability states via a biasing of the probability distribution of select degrees of freedom (DOF). Such DOFs are referred to as collective variables (CVs), which coarse-grain the high dimensional PES to a low dimensional free energy surface (FES).

An ideal set of CVs should retain the kinetic characteristics of the system Du et al. (1998); Geissler et al. (1999); E et al. (2005) on the FES, which requires that the CVs precisely describe the low free energy regions, especially critical transition paths between minima E et al. (2005). Determining a small number of CVs to globally preserve kinetic information is quite challenging, due to the non-uniform intrinsic dimensionalities Rohrdanz et al. (2011) and non-linear local structures of these regions Hegger et al. (2007). One natural approach for CV selection, which has achieved some successes Nymeyer et al. (2000); Pietrucci and Laio (2009); Yu and Tuckerman (2011), seeks to empirically construct CVs based on physical intuition and structure characteristics. Other efforts have been focused on determining or training CVs through dimension reduction on simulation data Van Aalten et al. (1997); Das et al. (2006); Ceriotti et al. (2011); Coifman et al. (2008); Tiwary and Berne (2016); Pérez-Hernández et al. (2013). The resulting CVs from both approaches are often kept static throughout the entire enhanced sampling process.

For complex chemical systems, the static form of the CVs usually leads to problematic degeneracies. In the space orthogonal to the CVs Zheng et al. (2008), potential energy barriers, a.k.a. “hidden barriers”, can separate important stable configurations. The transitions over hidden barriers that are shadowed by the chosen CVs are not observable on the FES. This phenomenon is called “orthogonal space degeneracy”. When exploring the CV space, enhanced sampling algorithms only enhance the sampling of barrier crossing on the FES, while leaving transitions over hidden barriers unaffected. Therefore, enhanced sampling algorithms rely on CV selection methods to provide a set of less-degenerate CVs. Theoretically, the set of less-degenerate CVs can be constructed given either a prior understanding of the system Pietrucci and Laio (2009); Steinhardt et al. (1983); Giberti et al. (2013) or a complete sampling of the system Das et al. (2006); Ceriotti et al. (2011); Coifman et al. (2008); Tiwary and Berne (2016); Pérez-Hernández et al. (2013). Yet in practice, it is very difficult to obtain this information in a finite amount of simulation time. Hence, to break degeneracy in orthogonal space, it is vital to establish a systematic and on-the-fly approach to CV construction for enhanced sampling algorithms.

Before explaining the methodology of AES, it is worthwhile to illustrate orthogonal space degeneracy by example. For this purpose the alanine dipeptide molecule was selected. As shown in Fig. 1, two Ramachandran dihedral angles $(\Phi,\Psi)$ are usually considered as proper CVs to map all stable configurations Rohrdanz et al. (2011) of the alanine dipeptide. Part (a) shows three major basins ( $C7_{eq}$ , $C5$ and $C7_{ax}$ ) on the FES of $\Phi$ and $\Psi$ . As a comparison, only two minima were located when the radius of gyration (Rg) and number of hydrogen bond (NH) (commonly employed when mapping biomolecule conformations Abrams and Tuckerman (2008); Barducci et al. (2013); Yu et al. (2016), see SI) were selected as CVs, see Fig. 1(b). It is clear that $C7_{eq}$ and $C7_{ax}$ are degenerate with similar Rg and NH values, which greatly reduces the sampling efficiency of $C7_{ax}$ , as illustrated in Fig. 1(e). Denoting the $C7_{ax}$ basin as $A$ , we estimate the sampling efficiency via a normalized auto-correlation function $C(t)=\langle\mathbbm{1}_{A}(\mathbf{x}(0))\mathbbm{1}_{A}(\mathbf{x}(t))\rangle/\langle\mathbbm{1}_{A}^{2}(\mathbf{x}(0))\rangle$ , where $A=\{\mathbf{x}:17.2^{\circ}\leq\Phi(\mathbf{x})\leq 126.1^{\circ}\}$ and $\mathbf{x}$ are system Cartesian coordinates. $C(t)$ measures the possibility of finding the system, with initial C7ax configuration, staying in $A$ at time $t$ . Faster decay of $C(t)$ suggests a shorter average time for the system to escape from $C7_{ax}$ , and vice versa. Fig. 1(e) clearly shows that well-tempered metadynamics (WTM) Barducci et al. (2008) with Rg and NH produces a very slow decay of $C(t)$ , indicating that transitions between $C7_{ax}$ and $C7_{eq}$ / $C5$ are not enhanced due to degeneracy while applying $\Phi$ and $\Psi$ removes the degeneracy. Thus $C(t)$ decays much faster. Unlike alanine dipeptide, in a more general scenario, constructing a small working set of such CVs is almost impossible without a thorough knowledge of the system.

Instead of well chosen CVs from prior knowledge, this work proposes Active Enhanced Sampling (AES) as a solution, which can start with arbitrary CVs and iteratively improve CV quality via active learning. Active learning is a semi-supervised learning algorithm to conductively query samples or desired outputs from the current least informative regions (CLIRs) as new learning samples. Similarly, AES introduces Stochastic Kinetic Embedding (StKE) to generate the low dimensional CV representation that preserves kinetic information and determines the CLIRs including degenerate states. Such CV representation (created by StKE) along with a FES sampler guides the MD simulation to explore these CLIRs more efficiently; on the other hand, the configurations generated by the sampler improves the learning of StKE.

The formalism of StKE is described as below. Assuming Cartesian coordinates of the system are $\mathbf{x}$ , $N_{h}$ generalized coordinates are denoted as $\mathbf{q}^{h}(\mathbf{x})$ and selected to characterize all slow modes. CVs are defined as $N_{l}<N_{h}$ functions of $\mathbf{q}^{h}$ , i.e. $\{\mathbf{q}^{l}(\mathbf{q}^{h}(\mathbf{x}))\}$ . Given a set of samples of generalized coordinates $\mathbf{s}^{h}_{n}=\mathbf{q}^{h}(\mathbf{x}_{n})\in\mathbb{R}^{N_{h}}$ , $n=1,2,...,N_{s}$ with $N_{s}$ as the total number of samples, and the associated Boltzmann probability $p(\mathbf{s}^{h}_{n})$ , StKE determines a projection of $f:\mathbf{s}_{n}^{h}\in\mathbb{R}^{N_{h}}\rightarrow\mathbf{s}_{n}^{l}\in\mathbb{R}^{N_{l}}$ , such that the diffusion distance Coifman et al. (2008) between each pair of datapoints is optimally retained. StKE assumes the samples are generated from an implicit diffusion process. A Markov chain is defined on $\{\mathbf{s}^{h}_{n}\}$ with an unnormalized transition matrix as $L(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})=K(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})/\sqrt{p(\mathbf{s}^{h}_{i})p(\mathbf{s}^{h}_{j})}$ where $K(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})$ is a Gaussian kernel describing the Brownian motion transition probability from datapoint $\mathbf{s}^{h}_{i}$ to $\mathbf{s}^{h}_{j}$ within a finite time step and $p(\mathbf{s}^{h}_{i})$ is estimated via kernel density estimator. For unbiased samples $\{\mathbf{s}^{h}\}$ with weights $\{\omega\}$ from enhanced sampler, normalizing $L$ generates proper transition matrix $M$ , i.e. $M_{i,j}\equiv M(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})=\omega_{j}L(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})/D_{i}$ where $D_{i}=\sum_{j}\omega_{j}L(\mathbf{s}^{h}_{i},\mathbf{s}^{h}_{j})$ (for derivation, see SI). It has been proven that in the limit of an infinite number of samples, $M$ will weakly converge to the generator of the diffusion process Coifman and Lafon (2006) . Different from several well-established dimensionality reduction methods for CVs, including diffusion map Coifman et al. (2008); Rohrdanz et al. (2011), SGOOP Tiwary and Berne (2016), tICA Pérez-Hernández et al. (2013), etc, which utilize low-rank approximation that truncates the number of CVs at a chosen spectral gap, StKE adopts the spirit of tSNE van der Maaten and Hinton (2008) by applying Kullback-Leibler divergence to estimate the similarity of $M$ between the higher-dimension ( $M^{high}$ ) and the lower-dimension ( $M^{low}$ ) for all pairs of datapoints,

[TABLE]

To ensure that StKE learns an explicit function form for the projection function $f$ , we assume that $f$ can be approximated by a parametric model $F(\mathbf{s}^{h};W)\approx f(\mathbf{s}^{h})$ , where $W$ is trainable parameters for model $F$ . A parametrized normalized transition matrix in lower dimension defined as $\tilde{M}^{low}_{i,j}(W)\equiv M^{low}(F(\mathbf{s}^{h}_{i};W),F(\mathbf{s}^{h}_{j};W))$ replaces $M_{i,j}^{low}$ in Eq.( 1), generating the objective function $C(W)$ . Thus, $W$ can be learned by minimizing $C(W)$ and model $F$ can be trained via efficient stochastic gradient descent method.

As the parametric model $F(\mathbf{s}^{h};W)$ has to be differentiable so that the biasing force can be evaluated in an MD simulation with enhanced sampling methods, a neural network is a practical choice for this model, e.g. multilayer perceptron (MLP). MLP consists of multiple fully-connected layers and non-linear activations to simulate the complex function form for the projection $f$ . Since the neural network is differentible with respect to both input and output space, this enables an estimate of $\partial f(\mathbf{s}^{h})/\partial\mathbf{s}^{h}$ by $\partial F(\mathbf{s}^{h};W)/\partial\mathbf{s}^{h}$ , which further allows samplers to estimate biasing forces on the fly.

AES uses WTM as FES sampler, WTM fills the FES with a time-dependent biasing potential by depositing Gaussians on the fly along the simulation trajectory Laio and Parrinello (2002); Barducci et al. (2008), where gaussian heights in WTM decrease as the FES fills up. In the long time limit, it has been proven that the biasing potential will eventually converge to the scaled inverse FES while the CV samples display a Boltzmann distribution at higher temperature $T+\Delta T$ Barducci et al. (2008); Dama et al. (2014), where $T$ is the system temperature and $\Delta T$ is a parameter in WTM. Since decreasing Gaussian heights can generate a more equilibrium-like simulation trajectory, it is easier to unbias samples to the correct ensemble distribution in WTM Bonomi et al. (2009); Tiwary and Parrinello (2015).

The protocol of AES is summarized as follows:

(i) AES starts from a short WTM simulation with arbitrary CVs. The initial set of $\{\mathbf{s}^{h}\}$ are collected and unbiased following the method in Tiwary and Parrinello (2015) to generate sample weights $\{\omega\}$ .

(ii) These samples are then resampled by enforcing a minimal pairwise distance $r_{c}$ to create a sparse description in low free energy regions. Probabilities of the resampled points $\{\tilde{\mathbf{s}}^{h}\}$ , i.e. $\{p(\tilde{\mathbf{s}}^{h})\}$ are calculated as a high temperature ( $T_{h}\geq T$ ) Boltzmann distribution in order to emphasize low probability regions. $\{\tilde{\mathbf{s}}^{h}\}$ and $\{p(\tilde{\mathbf{s}}^{h})\}$ are then used to train StKE CVs $\mathbf{q}^{l}$ .

(iii) $\{\mathbf{s}^{h}\}$ are then mapped onto the updated StKE CVs to generate $\{\mathbf{s}^{l}\}$ with which an initial biasing potential is generated, i.e. $V_{init}(\mathbf{q}^{l}(\mathbf{x}))=\frac{kT\Delta T}{T+\Delta T}\log(\sum_{i}w_{i}e^{-\|\mathbf{q}^{l}(\mathbf{x})-\mathbf{s}^{l}_{i}\|^{2}/\sigma_{init}}+P_{0})-E_{0}$ where $k$ is the Boltzmann constant. $P_{0}$ and $E_{0}$ are constant such that $\min V_{init}=0$ and $\max V_{init}$ equals the maximum biasing potential from the last WTM simulation. This initial biasing potential is then used in next WTM with $\mathbf{q}^{l}$ as CVs.

Steps (i) to (iii) forms one AES iteration. $\{\tilde{\mathbf{s}}^{h}\}$ and $\{p(\tilde{\mathbf{s}}^{h})\}$ are accumulated through all previous iterations to generate next CVs and the whole history of $\{\mathbf{s}^{h}\}$ and $\{w\}$ is kept for updating $V_{init}$ . By explicitly forming the positive feedback loop between StKE and WTM, AES incrementally improves both sample completeness and CV quality through iterations.

Two systems were used to demonstrate the effectiveness of AES: alanine dipeptide and met-enkephalin. The simulations were performed by GROMACS 5 Abraham et al. (2015) with OPLS-AA Kaminski et al. (2001) force field and PLUMED 2 Tribello et al. (2014) was used for WTM simulation. In alanine dipeptide example, samples from a 100ns WTM simulation with $\Phi$ and $\Psi$ were used to construct a benchmark FES (Fig. 1(a)) with four minima (C7eq, C5, C7ax and $\alpha^{\prime}$ ). Considering that two additional dihedral angles $\omega_{1}$ and $\omega_{2}$ are also important to capture configuration-change kinetics Bolhuis et al. (2000), these two dihedral angles, together with $\Phi$ and $\Psi$ , are used as $\mathbf{q}^{h}$ inputs to StKE (inset of Fig. 1(b)) for generating two CVs (i.e., StKE a and b). AES started from a 500ps WTM with Rg and NH, and followed by AES iterations each with 1ns WTM. After 7 AES iterations, configurations were accumulated and unbiased, then mapped back to $\Phi$ and $\Psi$ to generate the FES (Fig. 1(d)). This FES is highly consistent with the benchmark. All four minima are quantitatively sampled in AES with correct free energy values. Such consistency demonstrates that AES can generate Boltzmann distributed samples after unbiasing. The configurations from AES were also mapped onto the StKE CVs from the last iteration, generating a FES in Fig. 1(c).

As mentioned earlier, Rg and NH lead to degeneracy between C7eq and C7ax. As shown in Fig. 1(c), StKE CVs are able to completely separate samples from different clusters. StKE CVs from AES iterations were then used in WTM simulations to calculate $C(t)$ . Faster decay of $C(t)$ with respect to the number of AES iterations indicates quality improvement of StKE CVs due to the increasing completeness of data samples. At AES iteration 7, $C(t)$ decays as fast as the benchmark, indicating that StKE CVs are as good as $\Phi$ and $\Psi$ for preserving kinetic information in alanine dipeptide. We estimated FES convergence by calculating $L_{1}$ -distance per area between a test FES and the benchmark FES in $\Phi$ and $\Psi$ space. The test FES for AES is estimated by accumulating StKE samples and mapping them back to $\Phi$ and $\Psi$ space. As shown in Fig. 1(f), AES achieves similar FES convergence as benchmark, indicating the ability of AES to boost sampling efficiency by iteratively improving CVs. On the other side, WTM with Rg and NH fails to achieve FES convergence comparable to benchmark.

Mapping both folded and unfolded conformations of peptide is another challenging problem Chen et al. (2015). AES was tested on penta-peptide met-enkephalin (Fig. 2(c)) in gas phase. StKE was used to embed 10 Ramachandran dihedral angles to a 2D representation. AES was initiated by a 20ns WTM with Rg, NH and backbone-heavy-atom root-mean-square deviation (RMSD) as CVs, followed by 8 AES iterations each with 100ns WTM using 2D StKE CVs and Rg. The converged 3D FES and selected stable and metastable configurations are presented in SI. In Fig. 2(a), we highlight the FES minimum where degeneracies occurred with the StKE CVs in iteration 1 due to incomplete sampling. When applying WTM with StKE CVs (from 1st iteration) and Rg, metastable configurations from degenerate states are discovered in iteration 2, as shown as the minimum 2 in Fig. 2(c). After updating StKE CVs with these metastable configurations from iteration 2, the degenerate stable states were separated as illustrated in Fig. 2(b), indicating that AES is capable of unfolding discovered hidden barriers from PES into FES that is defined by updated StKE CVs.

To evaluate the efficiency of AES, a 1 $\mu$ s metadynamics simulation with Rg, NH and RMSD (“regular CVs”) was performed as comparison. As demonstrated in Fig. 3(a), a faster exploration of the conformational space (represented as an increase in the resampled points) is observed for AES. Beyond 200ns, AES is able to keep a high efficiency for discovering new configurations while the efficiency from WTM decreases. The linearly increasing of number of resampled points shows that unfolding discovered hidden barriers into FES by StKE encourages WTM to construct biasing potential more efficiently, while with static CVs WTM spends majority of the simulation time revisiting the stable configurations. The ability of AES to guide the enhanced sampling simulations also significantly decreases the average time needed for conformational changes since majority of “hidden barriers” are removed in AES, as shown in Fig. 3(b). In the long time limit, FES filling in WTM with regular CVs is unable to accelerate structural changes due to orthogonal space degeneracies, while StKE CVs remain faster conformational changes.

Active enhanced sampling is a framework joining CV production and sampling to unfold discovered hidden barriers into FES. In analogy to active learning, on-the-fly training StKE CVs, together with the biasing potential in WTM, guides MD simulations to sample the CLIRs. Iteratively training StKE CVs promotes the removal of orthogonal space degeneracies, boosting the sampling efficiency comparing to WTM with static and/or human intuited CVs. In alanine dipeptide example, StKE retains both intra and inter cluster structures, more importantly, the kinetic information is also preserved in StKE. In addition, minima on the FES are consistent with benchmark results, while human intuited CVs (i.e., Rg and NH) are unable to identify all minima. In met-enkephalin system, AES demonstrates its ability to remove degeneracy on the fly, leading to fast exploration of stable and metastable configurations and enhanced transitions.

Besides dihedral angles, other order parameters or those from dimension reduction algorithms, can be adopted as $\mathbf{q}^{h}$ for StKE learning. Other than WTM, enhanced sampling methods, like temperature accelerated molecular dynamics/driven adiabatic free energy dynamics Maragliano and Vanden-Eijnden (2006); Abrams and Tuckerman (2008), adaptive biasing force Darve et al. (2008), and unified free energy dynamics Chen et al. (2012) etc., can also be married with AES. Although the present examples are calculated in gas phase, applying AES to condensed phase simulations is straightforward.

We thank Mark E. Tuckerman, Yu Zhao and Tyler Y. Takeshita for reading the manuscript and giving suggestions. We also thank Phillip Geissler for useful discussion.

Bibliography42

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Rizzuti and Daggett (2013) B. Rizzuti and V. Daggett, Arch. Biochem. and Biophys. 531 , 128 (2013) . · doi ↗
2De Vivo et al. (2016) M. De Vivo, M. Masetti, G. Bottegoni, and A. Cavalli, J. Med. Chem. 59 , 4035 (2016) . · doi ↗
3Sosso et al. (2016) G. C. Sosso, J. Chen, S. J. Cox, M. Fitzner, P. Pedevilla, A. Zen, and A. Michaelides, Chem. Rev. 116 , 7078 (2016).
4Massobrio et al. (2015) C. Massobrio, J. Du, M. Bernasconi, and P. Salmon, Molecular Dynamics Simulations of Disordered Materials: From Network Glasses to Phase-Change Memory Alloys , Springer Series in Materials Science (Springer International Publishing, 2015).
5Berthier and Biroli (2011) L. Berthier and G. Biroli, Rev. Mod. Phys. 83 , 587 (2011) . · doi ↗
6Sugita and Okamoto (1999) Y. Sugita and Y. Okamoto, Chem. Phys. Lett. 314 , 141 (1999) . · doi ↗
7Laio and Parrinello (2002) A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. U.S.A. 99 , 12562 (2002).
8Barducci et al. (2008) A. Barducci, G. Bussi, and M. Parrinello, Phys. Rev. Lett. 100 , 020603 (2008) . · doi ↗