Improving Conformational Ensembles of Folded Proteins in Go̅Martini

Maksim Kalutskii; Carter J. Wilson; Helmut Grubmüller; Maxim Igaev

PMC · DOI:10.1021/acs.jctc.5c01816·February 25, 2026

Improving Conformational Ensembles of Folded Proteins in Go̅Martini

Maksim Kalutskii, Carter J. Wilson, Helmut Grubmüller, Maxim Igaev

PDF

Open Access

TL;DR

Researchers improved the accuracy of protein simulations using a new optimization method for coarse-grained models.

Contribution

A fully automated perturbation-based optimization method, PoGo̅, is introduced to refine nonuniform Go̅ networks for better conformational sampling.

Findings

01

Martini 3 with ENMs or Go̅ models fails to sample conformational space as well as atomistic simulations.

02

PoGo̅ rapidly converges to produce coarse-grained ensembles matching atomistic simulations closely.

03

Optimization with PoGo̅ also improves root-mean-square fluctuation profiles.

Abstract

The Martini coarse-grained (CG) force field enables efficient simulations of biomolecular systems but cannot reliably maintain folded protein structures. To stabilize proteins during simulation, Martini is typically combined with structure-based force fields such as elastic network models (ENMs) or Go̅ models. While these approaches preserve global folds and capture protein flexibility, their ability to reproduce conformational dynamics remains unclear. Here, we evaluate Martini 3 combined with ENMs or Go̅ models on three folded proteins and show that both approaches struggle to sample the conformational space observed in atomistic simulations, even when uniform interaction strengths or equilibrium bond distances are adjusted. This limitation arises from the assumption of a uniform interaction network, in which all Go̅-bonds are assigned the same ϵ value, and therefore have the same…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Chemicals1

Martini 3

Figures4

Click any figure to enlarge with its caption.

Performance of unoptimized Go̅Martini. (a) The first three PC modes of T4L and their contributions to the total variance. (b) The first 20 eigenvalues sorted in descending order. (c) All-atom MD and CG trajectories projected onto the essential subspace of the MD ensembles (PC1 and PC2). Each column corresponds to a different protein, and each row corresponds to a different CG setup. (d) Root mean squared inner product (RMSIP), covariance overlap (CO), and sliced Wasserstein distance (SWD) calculated between the all-atom MD and CG ensembles in the essential subspace of the MD ensembles.

Perturbation-based optimization algorithm. (a) Sampling overlap between a CG ensemble (blue) and the reference all-atom MD ensemble (black) in the essential subspace. Given an initial CG ensemble and probability distribution (contour lines) (Step 1), the goal is to find a perturbation λ that increases the probability of overlapping configurations, ↑ρ(q), and decreases the probability of nonoverlapping configurations, ↓ρ(q) (Step 2). With the optimal λ, a new trajectory is generated based on the modified potential (Step 3), followed by finding the next optimal perturbation (Step 4). (b) Convergence behavior of the objective function and SWD for T4L. The iteration with the best agreement is indicated with a star. (c) Distribution of the optimized Go̅ network strengths for multiple replicas and for different initial (uniform) strengths. The initial and average values are indicated with vertical lines. (d) A representative optimized network mapped onto the 3D structure of T4L in the CG representation. Edge colors and sizes reflect the strength of the interaction. (e) Comparison of the optimized Go̅ networks for different initial (uniform) strengths. Values along the diagonal correspond to the networks whose optimizations started with the same initial strength. Standard errors are provided in brackets.

Performance of optimized Go̅Martini. (a) Representative ensembles for each protein generated with the unoptimized (left, purple) and optimized (right, blue) Go̅ network. Reference atomistic ensembles are depicted in white. (b) CG trajectories projected onto the all-atom essential subspace. Same color coding as in panel (a). (c) SWD values between the atomistic reference distribution and the unoptimized (purple) or optimized (blue) CG distributions projected onto the all-atom essential subspace. (d) Root-mean-squared fluctuation profiles for the unoptimized and optimized Go̅ networks compared to the reference atomistic profiles. Bar plots quantify the per-residue average unsigned error to the atomistic profiles.

(a) Root-mean-squared inner product (RMSIP), covariance overlap (CO), and sliced Wasserstein distance (SWD) calculated between the all-atom MD and CG ensembles in the essential subspace of the all-atom MD ensembles. Per-residue average unsigned deviation to the atomistic backbone fluctuation profile (RMSF AUE). (b) CG trajectories projected onto the all-atom essential subspace. Same color coding as in panel (a).

Funding2

—Natural Sciences and Engineering Research Council of Canada10.13039/501100000038
—Deutsche Forschungsgemeinschaft10.13039/501100001659

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsProtein Structure and Dynamics · Biochemical and Structural Characterization · 14-3-3 protein interactions

Full text

Introduction

Proteins perform a vast array of fundamental functions in living systems, including catalyzing reactions, ?,? facilitating protein folding, ?,? signal transduction, ?,? and transporting molecules. ?,? These processes often require proteins to undergo conformational changes, allowing them to alternate between biologically functional states. ?,? As more protein structures are deposited into the Protein Data Bank and more molecular simulations are performed, a key observation has emerged: the majority of protein dynamics can be described by a surprisingly small number of collective variables which can be effectively identified using dimensionality reduction techniques such as principal component analysis (PCA), ?,? full correlation analysis (FCA),? time-independent component analysis (TICA),? functional mode analysis (FMA), ?,? or coevolution-driven analysis.? A recent review? provides a more comprehensive summary of conventional and state-of-the-art machine learning methods for inferring collective variables from molecular simulations.

When applied to all-atom molecular dynamics (MD) trajectories, PCA usually yields a small fraction of principal components (PCs) that capture most of the ensemble variance. ?,? Projecting the high-dimensional dynamics along these “essential” PCs and quantifying the low-dimensional free-energy landscape is key to our understanding of protein conformational dynamics and folding. ?,?−? ? ? ? ? ? Enhanced sampling techniques can further exploit this low-dimensional picture of protein dynamics to accelerate exploration of the free-energy landscape. ?−? ? ? ? This low-dimensional framework also suggests that the details of fast degrees of freedom may be less essential for characterizing critical protein functions over extended time scales. This insight has inspired the development of coarse-grained (CG) models of proteins and complexes thereof, ?−? ? ? ? which abandons the explicit description of fast degrees of freedom in favor of a simplified yet sufficiently accurate representation of the system of interest. In some cases, the essential dynamics of a protein have even been used to directly guide CG topology design.?

Coarse-graining presents a compelling approximation to atomistic MD, because it significantly reduces computational costs, thus enabling improved sampling and access to spatiotemporal scales that are often computationally infeasible with all-atom approaches. The Martini force field ?,? is one of the most widely used physics-based CG force fields for simulating biomolecular systems. However, due to the averaging of directional interactions, such as hydrogen bonds, Martini is unable to maintain folded protein structure; in light of this limitation, structure-based CG models, such as elastic network models (ENMs)? or Go̅ models, ?,? are often used in combination with Martini. ?−? ? While ENMs can accurately capture local protein flexibility, they are unable to describe large conformational changes due to the harmonic nature of their bonds.? To overcome this limitation and thereby to also increase protein flexibility, including partial unfolding, Go̅ models with breakable bonds encoded as Lennard-Jones potentials are usually employed instead. ?−? ? ? ?

Both the Go̅ and ENM networks connect backbone beads in order to recover the directional hydrogen bonds that stabilize protein structure. For each pairwise interaction, an equilibrium distance and interaction strength are defined. For Go̅ bonds, the potential is

[eqn]

where ϵ sets the depth (and strength) of the potential and σ determines the position of the energy minimum. One of the main shortcomings of using structure-based force fields for generating protein conformational ensembles is that these are typically unable to correctly describe the distribution of variance across PCs.? Whereas atomistic MD simulations combined with PCA usually identify a few soft, highly collective motions that capture most of the variance, ?,? both ENMs and Go̅ models tend to spread the same variance across a much larger number of PCs. ?,? When such structure-based models are combined with Martini, the resulting coarse-grained representation further inherits the approximations of the underlying force field. As a result, the free-energy landscape in the essential subspace can deviate from that produced by atomistic force fields. This reflects both the approximate, state-dependent nature of effective coarse-grained interactions and the fact that Martini is primarily parametrized to reproduce selected thermodynamic targets, such as partitioning free energies, rather than conformational free-energy surfaces associated with specific biomolecular processes. In addition, known challenges in entropy–enthalpy decomposition within the Martini force field ?,? further motivate a pragmatic, system-specific refinement of structure-based biasing terms when accurate conformational ensembles are required. Although combining Martini with the Go̅ model seems to be a more promising approach for studying large conformational changes in CG simulations, ?,? it still samples only a small fraction of atomistic conformations, even when optimized to reproduce atomistic fluctuations.? Recently, it has been proposed to decouple the development of Martini from that of the Go̅ model and the ENM.? In this view, the Martini force field itself should follow a standard building-block optimization strategy, while structure-based models such as Go̅ or ENM potentials are introduced as auxiliary biasing terms, informed by experimental data or all-atom MD simulations, to stabilize folded protein structures and guide conformational sampling.? Importantly, these structure-based terms do not constitute a reparameterization of the Martini force field, but rather compensate for stabilizing interactions, e.g., directional hydrogen bonding, that are absent in the coarse-grained representation. In practice, such corrections are routinely introduced and tuned based on all atom MD simulation in a system-specific manner ?,?−? ? to enable simulations of folded proteins, protein complexes, and large conformational transitions. In this sense, refining a Go̅ or ENM network does not contradict Martini’s original philosophy, but rather represents an established refining procedure when protein conformational ensembles are of interest. Along these lines, we suggest that fine-tuning structure-based potentials should seek to improve the agreement between atomistic and CG ensembles, with the goal of reproducing the free-energy landscape along the essential PCs, rather than only matching fluctuation profiles, that neglect higher correlations, or identifying heuristic hyperparameters that are fit using training datasets.?

Optimizing the free-energy landscape in the essential subspace is a particularly challenging task. Standard methods for optimizing coarse-grained force fields, such as iterative Boltzmann inversion? or relative entropy minimization,? require rerunning sufficiently long CG simulations at every optimization step to resample the free-energy landscape, which makes these approaches computationally intractable for high-dimensional parameter spaces.

In this paper, we develop and assess PoGo̅, a perturbation-based optimization method for Go̅ networks, here specifically applied to Go̅Martini. By analytically determining how network improvements will shift a conformational ensemble, we can propose a single, maximally informative update, avoiding the need for MD simulations at every trial parameter change. Instead, simulations are performed only after the optimal update is selected, enabling fast optimization. When combined with the particle swarm optimization (PSO) algorithm, ?,? which has previously been used in Martini optimization, ?,? our approach yields force field perturbations that drastically improve the agreement between a CG and an atomistic reference ensemble.

We tested our method on three different protein systems, and obtained converged Go̅ networks within tens of optimization steps in each case. The resulting CG free-energy landscapes in the essential subspace agree well with those of the corresponding atomistic reference simulations. We further extend our framework to optimize the spring constants in ENMs with similar success. Moreover, we find that, while not explicitly optimizing atomistic fluctuations, improving agreement along the essential PCs also improves the agreement between the atomistic and CG fluctuation profiles. In short, we provide a fully automated approach for optimizing the essential dynamics and fluctuations of a Go̅Martini-based protein model.

Methods

Perturbation Theory

Following Koyama et al.,? we formulate the expression for the change in the conformational distribution of a simulated protein induced by an arbitrary set of linearly independent perturbation functions.

Consider a molecular system described by the additive potential function $[eqn]$ , where q are the Cartesian coordinates. Measuring all energies in units of k B T, where k B is the Boltzmann constant and T is the temperature, its canonical configurational distribution is written as

[eqn]

where Z = ∫e^–V ^ d^3N ^ q is the configurational integral.

From the set of N potentials constituting V(q), we choose a subset of M ≤ N potentials $[eqn]$ and define perturbation coefficients $[eqn]$ to construct a new, perturbed potential

[eqn]

such that the canonical configurational distribution corresponding to this perturbed potential is given by

[eqn]

where we define $[eqn]$ , and ⟨···⟩ denotes the statistical average, with respect to the unperturbed configurational distribution ρ(q).

With this perturbation, the interaction corresponding to one of the selected potentials L _ k (q), where k ∈ {1, ..., M}, is strengthened when λ k _ > 0 and it is weakened when λ_ k _ < 0. When λ = 0, the original and perturbed potentials are identical.

Since we are only interested in small perturbations, i.e., ∥λ∥ ≡λ ^T^ λ ≪ 1, we use a second-order Taylor expansion of ψ(λ) at λ = 0 yielding

[eqn]

By using this approximate expression, eq reads

[eqn]

We note that the covariance matrix C is positive semidefinite and, therefore, can be represented as C = UΩU ^T^, where U = [u 1, ..., u _ M _] are the orthonormal eigenvectors of C and $[eqn]$ is the diagonal matrix that contains the corresponding non-negative eigenvalues. By performing the change of basis λ = Uξ and defining $[eqn]$ , the following approximate expression for the change in the conformational distribution induced by the perturbation in eq is obtained:

[eqn]

Equation provides a computationally efficient way to quantify how small perturbations to linear interaction terms affect an equilibrium conformational distribution.

We note that eq relies on a small-perturbation assumption and is therefore not expected to be quantitatively exact for finite parameter updates. For this reason, the optimization is implemented as an iterative procedure in which the perturbative estimate is used only to propose a locally optimal update, after which a new CG simulation is performed and the approximation is re-evaluated around the updated ensemble.

As shown in the Results and Discussion section, we observe smooth and stable convergence of both the objective function and the sampled ensembles across independent replicates and for a wide range of initial Go̅ network strengths. This behavior indicates that individual updates remain sufficiently small for the linear-response approximation underlying eq to be effective, while repetition compensates for any inaccuracies of the approximation at finite step sizes.

Ensemble Similarity Metrics

In what follows, we introduce three metrics to quantify the agreement between the essential subspaces of the tested atomistic and CG models: the root-mean-square inner product, the covariance overlap, and the sliced Wasserstein distance.

The root-mean-square inner product (RMSIP),

[eqn]

measures the geometric similarity between two subspaces spanned by the first n eigenvectors, {v 1, ...,v _ n _} and {w 1...w _ n _}, corresponding to the two ensembles:

The covariance overlap (CO),

[eqn]

is similar to RMSIP but emphasizes subspace overlaps along directions of large variance, while downweighting overlaps that carry little variance. Here, $[eqn]$ , $[eqn]$ , and {μ_1_, ..., μ_ n } and {ν_1, ..., ν_ n _} are the respective eigenvalues. Note that both RMSIP and CO range from 0 (no overlap) to 1 (perfect agreement).

The sliced Wasserstein distance (SWD) of order γ ∈ [1, ∞),

[eqn]

is a computationally efficient approximation of the full Wasserstein distance

[eqn]

of the same order between two probability distributions, f on $[eqn]$ and g on $[eqn]$ . Here, d(·,·) is the distance metric on X × Y (i.e., the transportation cost), and Γ denotes the set of all possible joint distributions z (transportation plans) whose marginals are f and g.? Because computing the WD directly becomes prohibitively expensive for high-dimensional spaces (d > 1), the SWD offers a tractable alternative. It approximates the WD by projecting both distributions onto one-dimensional subspaces (or “slices”) defined by directions θ sampled uniformly from the unit sphere $[eqn]$ . WD_γ_(f θ, g θ) is the one-dimensional Wasserstein distance between the projected distributions f θ and g θ, and dS θ is the uniform measure on $[eqn]$ .

Unlike other structural similarity metrics such as RMSIP or CO, the SWD provides a true distance measure that ranges from 0 (perfect agreement) to +∞ (no overlap). Unless otherwise stated, we use SWD of order γ = 3 in all analyses.

All-Atom and Coarse-Grained Simulations

All molecular dynamics simulations were performed using GROMACS 2023.? For the atomistic simulations, the Amber99SB*-ILDN? force field and TIP3P water model? was used; for the CG simulations, Martini 3 was used.? Martini topology generation was performed using martinize2.? Initial elastic (κ = 500 kJ/mol/nm^2^) and Go̅ networks (ε = 9.4 kJ/mol) were constructed using reference crystal structures, i.e., T4 lysozyme: PDB ID 182L, E. coli ribose binding protein: PDB ID 2DRI, and E. coli maltose binding protein: PDB ID 1MPB, with default cutoffs, i.e., ENM: [0, 0.9 nm] and Go̅: [0.3, 1.1 nm]. These PDB structures were used for both coarse-grained and atomistic simulations. For all systems, an initial minimization was performed using the steepest descent algorithm. For all production simulations the leapfrog integrator was used with a time step of 2 fs for the atomistic and 20 fs for the CG systems. A temperature of 300 K was maintained using the velocity-rescaling thermostat? with a 1 ps coupling time and the pressure was maintained at 1 bar using the C-rescale barostat? with a coupling time of 5 ps. For the atomistic simulations, long-range electrostatic interactions were calculated using the Particle-mesh Ewald method? with a real-space cutoff of 1.0 nm and grid spacing of 0.12 nm, while the Lennard-Jones interactions were truncated at 1.0 nm and a dispersion correction was applied. Bonds to hydrogen atoms were constrained using the Parallel LINear Constraint Solver.? For the CG simulations, electrostatics were treated using a reaction-field with a 1.1 nm cutoff and relative dielectric of ϵ = 15. Lennard-Jones interactions were modified with a potential shift and a cutoff of 1.1 nm.

Results and Discussion

Limitations of Uniform Go̅ Networks in Reproducing Essential

Dynamics

To assess the accuracy of the essential dynamics relative to all-atom simulations, we chose three test systems: T4 lysozyme (T4L), E. coli ribose binding protein (RBP), and E. coli maltose binding protein (MBP). We produced 5 × 600 ns of all-atom MD trajectories, as well as Martini sampling using an ENM or a Go̅ network. We discarded the first 100 ns from each trajectory as an equilibration phase (see the Methods section). The atomistic trajectories were forward-mapped into a Martini representation using martinize2.?

We then performed Cartesian-space PCA on the backbone beads of the forward-mapped atomistic trajectories for each of the three test systems. Projecting the same atomistic trajectories along the first three modes revealed diverse characteristic motions, consistent with previous reports. Specifically, the analysis of the T4L trajectories identified a hinge-bending motion (PC1), a twisting motion (PC2), and a torsional domain motion (PC3), which accounted for 21%, 15%, and 8% of the total ensemble variance, respectively (Figurea and ?b). Similarly, RBP exhibited a domain-closing motion (PC1), a twisting motion (PC2), and a propeller motion (PC3), capturing 31%, 30%, and 11% of the total ensemble variance, respectively. For MBP, a domain-closing motion (PC1), a twisting motion (PC2), and a helix shift (PC3) explained 20%, 15%, and 9% of the variance, respectively (Figure S1).

Performance of unoptimized Go̅Martini. (a) The first three PC modes of T4L and their contributions to the total variance. (b) The first 20 eigenvalues sorted in descending order. (c) All-atom MD and CG trajectories projected onto the essential subspace of the MD ensembles (PC1 and PC2). Each column corresponds to a different protein, and each row corresponds to a different CG setup. (d) Root mean squared inner product (RMSIP), covariance overlap (CO), and sliced Wasserstein distance (SWD) calculated between the all-atom MD and CG ensembles in the essential subspace of the MD ensembles.

Projections of both the all-atom and CG trajectories onto the essential subspace revealed that the uniform, 9.4 kJ/mol Go̅ model ensembles were markedly more expanded than the atomistic ones and, in the case of T4L and MBP, sampled multiple minima (Figuresc and S2). The ENM ensembles of T4L and RBP exhibited distorted but partially overlapped with the atomistic distributions, while MBP showed a much tighter distribution (Figuresc and S2). Overall, neither the ENM nor the uniform 9.4 kJ/mol Go̅ model fully reproduced the sampling of the all-atom MD simulations within the essential subspace.

To compare the essential dynamics captured by the all-atom and CG models more systematically, we employed three complementary metrics: the root-mean-square inner product (RMSIP) (eq), the covariance overlap (CO) (eq), and the sliced Wasserstein distance (SWD) (eq, see the Methods section). For clarity, we only report the means and standard errors for each metric calculated over all simulation replicas and all protein systems.

The average RMSIP was 0.83 ± 0.05 for the ENM and 0.70 ± 0.11 for the Go̅ model, while we calculated the average CO of 0.71 ± 0.03 for the ENM and 0.53 ± 0.16 for the Go̅ model (Figured, left and middle). This result indicates that both models are able to capture the overall essential modes of the test proteins and overall shape of the essential subspace reasonably well.

Although both models reproduce the overall shape and dominant directions of the essential subspace, our primary interest lies in how the explored conformational space is populated. RMSIP and CO primarily measure the geometric similarity between distinct subspaces, focusing on whether the dominant directions of motion align, but they are largely insensitive to how the system samples along those directions. In particular, they do not account for differences in the mean position of the sampled distributions and consider only the relative magnitudes of the eigenvalues. As a result, they may yield high similarity scores even when the underlying populations of conformational states show little or no overlap. In this respect, the SWD provides a better metric to compare the sampled distributions themselves. The calculated SWD values confirmed these observations, yielding 0.83 ± 0.18,nm for the ENM and a much larger distance of 2.07 ± 0.16 nm for the Go̅ model, mainly due to the poor overlap for T4L (Figured, right side).

For all three test proteins, but particularly for T4L, we observed that the ENM outperformed the standard Go̅ model with respect to reproducing the atomistic sampling within the essential subspace. We attribute the better performance of the ENM to the harmonic nature of its bonds. In particular, harmonic and unbreakable bonds of the ENM likely restrict significant deviations from the initial conformation, while the breakable Go̅ bonds initialized with a relatively weak ε likely cause distortions of the reference structure.

To assess whether strengthening the Go̅ network could improve its performance, we increased the interaction strength by 50% (ε = 14.1 kJ/mol). This adjustment led to a contraction of the sampled ensemble and an improved SWD which shifted from 2.07 ± 0.16 nm to 1.38 ± 0.13 nm. We observed a small improvement in RMSIP and no significant change in CO.

We next investigated the effect of modifying the equilibrium Lennard-Jones distances of the Go̅ potentials, i.e., r min,i _ = 2^1/6^σ i _. Specifically, we calculated the average distance for each pair of backbone beads i and j (r _ ij _) from the atomistic trajectory and set it as an equilibrium distance for the corresponding Go̅ potential, denoted here as Go̅′. This adjustment further improved the CO from 0.56 ± 0.18 to 0.73 ± 0.06 and produced smaller improvements in RMSIP and SWD.

In short, (i) Martini 3 simulations with default ENM or Go̅ networks do not accurately reproduce the free-energy landscape in the essential space observed by all-atom MD, and (ii) uniformly increasing the strength of the Go̅ potentials or adjusting their equilibrium distances provides only a modest yet suboptimal improvement.

Perturbation-Based Optimization of Nonuniform Go̅ Networks

These results suggest (Figurea) that optimizing a nonuniform network of Go̅ potentials is required to match a reference ensemble in the essential subspace. Here, we outline a method for optimizing a nonuniform network using the perturbation theory framework described in the Methods section. The atomistic ensemble is first mapped into the Martini CG representation, after which a principal component analysis (PCA) is performed and the trajectory is projected onto the resulting essential subspace to obtain the target distribution, ρ_AA_(q). A CG ensemble is then generated, initialized with a uniform Go̅ network, and then projected onto the same subspace to obtain ρ_CG_(q), which serves as the basis for optimization (Step 1 in Figurea).

Perturbation-based optimization algorithm. (a) Sampling overlap between a CG ensemble (blue) and the reference all-atom MD ensemble (black) in the essential subspace. Given an initial CG ensemble and probability distribution (contour lines) (Step 1), the goal is to find a perturbation λ that increases the probability of overlapping configurations, ↑ρ(q), and decreases the probability of nonoverlapping configurations, ↓ρ(q) (Step 2). With the optimal λ, a new trajectory is generated based on the modified potential (Step 3), followed by finding the next optimal perturbation (Step 4). (b) Convergence behavior of the objective function and SWD for T4L. The iteration with the best agreement is indicated with a star. (c) Distribution of the optimized Go̅ network strengths for multiple replicas and for different initial (uniform) strengths. The initial and average values are indicated with vertical lines. (d) A representative optimized network mapped onto the 3D structure of T4L in the CG representation. Edge colors and sizes reflect the strength of the interaction. (e) Comparison of the optimized Go̅ networks for different initial (uniform) strengths. Values along the diagonal correspond to the networks whose optimizations started with the same initial strength. Standard errors are provided in brackets.

We then construct the following objective function to be minimized:

[eqn]

where $[eqn]$ is the projection of the canonical configurational distribution for the perturbed CG ensemble (see the Methods section, eq).

The advantage of this objective function is that the expression $[eqn]$ can be computed using the approximate analytical expression in eq, which only depends on λ and ρ_CG_(q) and does not require running simulations of the perturbed ensemble. This allows us to use the fast PSO method ?,? to find an optimal perturbation λ that minimizes the objective function (Step 2 in Figurea). We then update the CG force field using eq and simulate a new CG ensemble with the updated Go̅ network (Step 3 in Figurea). Finally, as eq provides only an approximate correction and the Go̅ model is coupled to the Martini 3 force field, we iteratively repeat steps 2 and 3 until convergence (Step 4 in Figurea).

To test the robustness of our method, we repeated the optimization three times for each test protein. In all cases, the optimizations converged within ∼30 steps, with the largest gains in agreement occurring within the first 15 steps (Figuresb and S3). This suggests that a typical optimization of the Go̅ network for a small to medium-sized protein only requires a few tens of microseconds of CG sampling to converge efficiently, and because our algorithm is highly parallelizable, the wall clock time can be reduced even further if more compute nodes are available.

We then tested how the choice of the initial (uniform) network strength affected the optimization results. We performed additional Go̅ network optimizations, starting from different ε values for the corresponding Lennard-Jones interactions: 10, 12.5, 15, and 17.5 kJ/mol. Irrespective of the choice of the initial network strength or the test protein, we observed similarly fast and efficient convergence within ∼30 iterations. The resulting Go̅ networks yielded CG ensembles with SWD values near 0.1 nm relative to the reference all-atom MD ensembles (Figure S3)more than an order of magnitude improvement compared to the values reported for uniform Go̅ networks (Figured).

Optimization yielded unique Go̅ networks for each protein, with some bonds strengthening and others weakening (Figured). To assess these changes more systematically, we analyzed the distributions of the optimized interaction strengths (Figurec). Optimization replicas started from the same initial network strengths converged to identical distributions (within the sampling error), whereas the distribution means differed for different starting strengths (Figurec). These many minima and the flat optimization landscape are expected because the target observable, the essential dynamics, depends on collective motion patterns rather than on specific pairwise interactions. As a result, many different redistributions of interaction strengths can produce the same overall dynamical behavior. The stochastic optimization therefore explores a broad manifold of equivalent minima with comparable mean energies.

To compare the optimized ensembles more systematically, we computed (i) the Pearson correlation of the optimized bond energies and (ii) the L 2-norm of the Laplacian eigenvalue spectrum across the different networks (Figuree). The Pearson correlation and the L 2-norm were 0.88–0.97 and 0.02–0.03, respectively, for the replicas started from the same initial network strength. When comparing replicas started from different initial strengths, the Pearson correlation ranged between 0.77 and 0.88 and the L 2-norm between 0.04 and 0.08 (Figurese and S4). These points support our hypothesis that the algorithm primarily reweights the relative energies of the Go̅ bonds to reproduce the essential dynamics, rather than converging toward a single optimal set of absolute bond strengths. This observation is additionally supported by analyzing the total potential energies of an optimized network. The total energy increases only for low initial network strengths (e.g., ε = 10 kJ/mol), whereas it remained largely unchanged for the optimizations started from high interaction strength (e.g., ε = 17.5 kJ/mol) (see Figuresc and S5).

Although the optimization converges to multiple distinct parameter sets, these solutions are effectively equivalent with respect to the target observables. All reproduce the same essential dynamics and yield indistinguishable collective motions along the principal components. This degeneracy might formally resemble overfitting, but in this context it merely reflects the redundancy of the parameter space. The optimization is thus underdetermined rather than overfitted: different parameter sets encode the same large-scale motions equally well.

In short, the proposed optimization algorithm demonstrates fast convergence within a few tens of iterations, is robust with respect to the target of optimization and choice of the initial Go̅ network strength and the test protein, and produces CG ensembles that are in excellent agreement with the reference all-atom MD ensembles, in terms of their essential dynamics.

Optimized Go̅ Networks Reproduce Essential Atomistic Dynamics

and Improve Local Protein Flexibility

As stated in the first section, our aim is to improve the agreement between the reference atomistic and CG free-energy landscapes in the essential subspace. To this end, we optimized the Go̅ networks for T4L, RBP, and MBP using the procedure described in the previous section (Figure) and compared them to the corresponding unoptimized ensembles using a uniform Go̅ network (Figurea). We set the interaction strength of the unoptimized networks to ε = 12.5 kJ/mol; however, a similar behavior was observed for the other tested interaction strengths (ε = 10.0–17.5 kJ/mol).

Performance of optimized Go̅Martini. (a) Representative ensembles for each protein generated with the unoptimized (left, purple) and optimized (right, blue) Go̅ network. Reference atomistic ensembles are depicted in white. (b) CG trajectories projected onto the all-atom essential subspace. Same color coding as in panel (a). (c) SWD values between the atomistic reference distribution and the unoptimized (purple) or optimized (blue) CG distributions projected onto the all-atom essential subspace. (d) Root-mean-squared fluctuation profiles for the unoptimized and optimized Go̅ networks compared to the reference atomistic profiles. Bar plots quantify the per-residue average unsigned error to the atomistic profiles.

Simulations with the unoptimized, uniform Go̅ network produced ensembles with excess fluctuations and partial unfolding (Figurea). In contrast, a qualitative comparison of the optimized and unoptimized CG ensembles showed that the former were visually more similar to the reference all-atom ensembles – especially in terms of the alignment of small loops and the relative positions of motifs (Figurea). A more detailed analysis of the essential subspaces revealed a significant improvement when comparing them to that of the reference all-atom ensembles. Specifically, we found that SWD decreased significantly from, on average, 1.67 ± 0.70 nm to 0.06 ± 0.02 nm for all tested protein systems (Figuresb,c and S6). RSMIP increased from 0.76 ± 0.16 to 0.83 ± 0.10, while CO improved from 0.58 ± 0.27 to 0.71 ± 0.17 (Figure S7).

Having targeted the first three PCs in the optimization, we also calculated the RMSIP and CO over the first five and ten PCs. We found improvements relative to the unoptimized Go̅ network even when these additional PCs were included (Figure S8).

In addition, we analyzed local root-mean-square fluctuations (RMSF) of the backbone beads of the optimized and unoptimized CG ensembles, as these are often a primary target in optimizing protein dynamics in Martini. Because it was not an explicit optimization target, the RMSF served as an excellent independent cross-validation metric, showing markedly improved agreement between the optimized CG and reference all-atom ensembles. Specifically, the per-residue average unsigned deviation decreased by roughly a half, from 0.04 ± 0.02 nm to 0.02 ± 0.00 nm (Figured), suggesting that our global optimization approach can simultaneously improve the local fluctuation profile.

To test whether improving local RMSF profiles alone enhances agreement in the essential subspace, we analyzed unoptimized ensembles with uniformly increased interaction strengths. Although higher uniform interaction strength reduced bead fluctuations and improved RMSF, this stiffening led to only minor gains in overlap with the atomistic essential subspace (Figure S7). In contrast, our optimization achieved markedly better agreement in both the fluctuation profile and essential subspace with much smaller perturbations to the total interaction energy, indicating that reproducing local RMSF does not necessarily ensure accurate global dynamics.

Optimized ENMs Can Also Reproduce Essential Dynamics

Given that perturbations to the spring constants in an ENM are linear, we applied our optimization framework here as well. For all three test proteins, the optimization resulted in only small changes to the already reasonable RMSIP and CO values, while significantly improving the ensemble overlap and fluctuation profiles (Figure). This demonstrates that our approach can be applied to both the Go̅ and ENM components of the Martini force field.

(a) Root-mean-squared inner product (RMSIP), covariance overlap (CO), and sliced Wasserstein distance (SWD) calculated between the all-atom MD and CG ensembles in the essential subspace of the all-atom MD ensembles. Per-residue average unsigned deviation to the atomistic backbone fluctuation profile (RMSF AUE). (b) CG trajectories projected onto the all-atom essential subspace. Same color coding as in panel (a).

Conclusions

Here, we developed an automated optimization framework, PoGo̅, that combines a low-dimensional representation of protein dynamics, an efficient global optimization algorithm, and analytical perturbation theory to fine-tune the interaction strengths of Go̅-based networks. Conceptually, PoGo̅ occupies an intermediate position between bottom-up and top-down coarse-graining: atomistic simulations provide bottom-up information to refine a structure-based biasing potential, while the optimization target is the low-dimensional free-energy landscape of collective motions. All remaining higher-frequency degrees of freedom are treated implicitly by the Martini force field. Validation on three proteins with varying complexity and essential dynamics demonstrated that the method achieves excellent agreement with atomistic reference ensembles within only a few iterations.

In contrast, the standard Martini force fieldsupported by either the default elastic network or a uniform-strength Go̅ networkwas unable to accurately reproduce atomistic essential dynamics. Increasing the network interaction strengths or deriving equilibrium bead–bead distances from all-atom simulations yielded only marginal improvements, with no universal parameter set performing consistently across systems.

Despite extensive work highlighting the importance of protein essential dynamics, ?,?,?−? ? ? ?,?,? these features are often overlooked when optimizing CG models, where more emphasis is typically placed on reproducing local fluctuations.? While improving essential dynamics can naturally enhance local fluctuations, the reverse is not necessarily true. Our results demonstrate that targeting the lowest-energy collective motions might provide a more efficient route to achieving both accurate global and local dynamics.

Recently, atomistic simulations have been used to optimize both the number of Go̅ bonds and their equilibrium distances by analyzing high-frequency contacts.? One promising avenue for future development would be to combine this approach with PoGo̅.

Our method is not restricted to a particular reaction coordinate. Although, in this study, we focused on free-energy landscapes within the essential subspace, the same framework could be used to optimize probability distributions along any generalized reaction coordinate, such as the radius of gyration or the distance between the centers of mass of two protein domains. The dimensionality of the essential subspace is also tunable; however, using more than five dimensions bears the risk of undersampling in all-atom molecular dynamics (MD) simulations and makes optimization challenging due to rapidly diminishing probabilities in high-dimensional space. Furthermore, the approach is not limited to the Lennard–Jones potentials of Go̅Martini and is compatible with any additive CG force field, rendering it a general and transferable strategy for global ensemble refinement in high-dimensional parameter spaces. Although our method can be adopted for any type of CG models direct comparison to fully bottom-up coarse-grained models was not pursued here, as such approaches rely on fundamentally different parameterization philosophies and functional forms, whereas the present work specifically focuses on refining structure-based biasing terms within the Martini framework while preserving its standard interactions, efficiency, and compatibility with existing Martini-based workflows.

Despite its broad applicability and flexibility, several limitations should be considered. First, the current implementation optimizes only the depths of the Lennard–Jones potentials in the Go̅ network, requiring any additional parameters (e.g., equilibrium bead–bead distances) to be known and fixed prior to optimization. Second, the network topology remains fixed throughout the procedure. Selecting an optimal topology is a complex problem in its own right and lies beyond the present scope. Nevertheless, a too dense network or high initial network strength may hinder studies involving mutations or ligand binding where local flexibility is essential. Alternative network generation methods such as OLIVES? may provide more suitable starting points in such cases. Third, the optimization quality depends on the quality of the target ensemble. Although we employ all-atom MD references here, the target ensemble might also be derived from deep learning approaches (e.g., BioEmu?) or from experimental data.?

In conclusion, we describe a generalizable, automated framework for optimizing CG models to reproduce atomistic protein dynamics. Integrating physics-inspired? or machine-learned priors ?,? for network topology design could further enhance performance. At the same time, incorporating experimentally derived ensemble information from techniques such as electron microscopy, ?,? SAXS, or NMR, offers a promising avenue for integrative structural biology. Altogether, this framework enables more reliable and high-throughput studies of protein dynamics, functional mechanisms, mutational effects, and protein assembly within CG representations.

Supplementary Material

Bibliography75

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Eisenmesser E. Z.Millet O.Labeikovsky W.Korzhnev D. M.Wolf-Watz M.Bosco D. A.Skalicky J. J.Kay L. E.Kern D.Intrinsic dynamics of an enzyme underlies catalysis Nature 200543811712110.1038/nature 0410516267559 · doi ↗ · pubmed ↗
2Hanson J. A.Duderstadt K.Watkins L. P.Bhattacharyya S.Brokaw J.Chu J.-W.Yang H.Illuminating the mechanistic roles of enzyme conformational dynamics Proc. Natl. Acad. Sci. U.S.A.2007104180551806010.1073/pnas.070860010417989222 PMC 2084295 · doi ↗ · pubmed ↗
3Krukenberg K. A.Street T. O.Lavery L. A.Agard D. A.Conformational dynamics of the molecular chaperone Hsp 90Q. Rev. Biophys.20114422925510.1017/S 003358351000031421414251 PMC 5070531 · doi ↗ · pubmed ↗
4Clare D. K.Vasishtan D.Stagg S.Quispe J.Farr G. W.Topf M.Horwich A. L.Saibil H. R.ATP-Triggered Conformational Changes Delineate Substrate-Binding and -Folding Mechanics of the Gro EL Chaperonin Cell 201214911312310.1016/j.cell.2012.02.04722445172 PMC 3326522 · doi ↗ · pubmed ↗
5Jin P.Jan L. Y.Jan Y.-N.Mechanosensitive Ion Channels: Structural Features Relevant to Mechanotransduction Mechanisms Annu. Rev. Neurosci.20204320722910.1146/annurev-neuro-070918-05050932084327 · doi ↗ · pubmed ↗
6Latorraca N. R.Venkatakrishnan A. J.Dror R. O.GPCR Dynamics: Structures in Motion Chem. Rev.201711713915510.1021/acs.chemrev.6b 0017727622975 · doi ↗ · pubmed ↗
7Björkman A. J.Binnie R. A.Zhang H.Cole L. B.Hermodson M. A.Mowbray S. L.Probing protein-protein interactions. The ribose-binding protein in bacterial transport and chemotaxis J. Biol. Chem.1994269302063021110.1016/S 0021-9258(18)43798-27982928 · doi ↗ · pubmed ↗
8Boos W.Gordon A. S.Hall R. E.Price H. D.Transport Properties of the Galactose-binding Protein of Escherichia coli J. Biol. Chem.197224791792410.1016/S 0021-9258(19)45694-94550764 · doi ↗ · pubmed ↗