A Transferable and Robust Computational Framework for Class A GPCR Activation Free Energies

Simone Aureli; Nicola Piasentin; Thorben Fröhlking; Valerio Rizzi; Francesco Luigi Gervasio

PMC · DOI:10.1021/acs.jpclett.5c03834·March 3, 2026

A Transferable and Robust Computational Framework for Class A GPCR Activation Free Energies

Simone Aureli, Nicola Piasentin, Thorben Fröhlking, Valerio Rizzi, Francesco Luigi Gervasio

PDF

Open Access

TL;DR

This paper introduces a new computational method to study GPCR activation, making it easier and more reliable for various receptors.

Contribution

The novel contribution is a streamlined strategy for defining collective variables in molecular simulations of GPCRs.

Findings

01

The new method achieves robust free energy convergence with reduced user intervention.

02

It was successfully applied to ADRB1 and the μ-opioid receptor, yielding consistent and novel insights.

03

The approach is transferable to other class A GPCRs, enabling systematic activation mechanism studies.

Abstract

The activation of G-protein coupled receptors is involved in many biomedically important cellular pathways. However, capturing it with molecular simulations is far from trivial, as it requires capturing both local and global motions. We recently achieved this goal in a specific receptor (the β1-adrenergic receptor, or ADRB1) by combining a multiple replica enhanced sampling approach with tailored collective variables. While that approach can be applied to other receptors, it would require a tedious and error-prone choice and refinement of the collective variables and, in particular, of the main path-like variable. Herein, we introduce an effective and streamlined evolved strategy for defining CVs that reduces user intervention while still achieving a robust free energy convergence. We apply it to two apo-GPCRs of pharmacological relevance, ADRB1 and the μ-opioid receptor. In the first…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

ADRB1

Proteins1

Chemicals1

apo

Figures4

Click any figure to enlarge with its caption.

Schematic comparison between the original PATH CV and the Euclidean variant used in this work. A) In the traditional formulation, constructing a meaningful PATH CV for GPCR activation requires multiple intermediate PDB structures X i along a putative transition pathway. Because these intermediates must be guessed and manually curated, the resulting path is laborious to set up and often uncertain in its direction, leading to repeated trial-and-error attempts. B) In the Euclidean variant, only the inactive and active structures are required, while all intermediate milestones are simply defined in terms of the two-dimensional space of RMSD values relative to these end points. This yields a path that is straightforward to define as it does not depend on any intermediate structure. C) Depiction of a prototypical class A GPCR (ADRB1) embedded into a model membrane. POPC and cholesterol molecules are displayed in dark and light green, respectively. Insets show the most relevant microswitches governing GPCR’s activation.

ADRB1’s activation using the EPATH CV. A) Milestones M i of the EPATH CV employed in apo-ADRB1-Euclidean and in ref . B) 1D FES as a function of the EPATH CV in (A) for the apo-ADRB1-Euclidean system. C) Free-energy difference between inactive and active states in the apo-ADRB1-Euclidean system over time. In (B) and (C), the average data over three replicas are represented through an orange solid line, while the transparent orange area illustrates the standard deviation. D) Average 2D FES concerning the RMSD of both the inactive and active structures for the apo-ADRB1-Euclidean system.

MOR’s activation using either the RPATH or the EPATH CVs. A) Milestones M i of the EPATH CV upon which the apo-MOR-OLD and apo-MOR-Euclidean free energies have been reprojected. B) Comparison of the 1D FES as a function of the EPATH CV in (A) for the apo-apo-MOR-OLD and apo-MOR-Euclidean systems. C,D) Free-energy difference between inactive and active states in the apo-MOR-OLD (C) and apo-MOR-Euclidean (D) systems over time. In (B) and (C), the average free energy difference of apo-MOR-OLD is represented by a purple solid line, while the transparent purple area illustrates the standard deviation. Similarly, in (b) and (d) the average free energy difference of apo-MOR-OLD is represented by a green solid line, while the transparent green area illustrates the standard deviation. E,F) Average 2D FES concerning the RMSD of both the inactive and active structures for the apo-MOR-OLD (E) and apo-MOR-Euclidean (F) systems.

Analysis of the conformational heterogeneity observed in apo-MOR-OLD and apo-MOR-Euclidean OneOPES simulations. A,B) Sampling distributions of the apo-MOR-OLD (A) and apo-MOR-Euclidean. (B) OneOPES simulations projected onto the (RMSD inactive; RMSD active) space. Frames are grouped according to their Euclidean PATH interval (M1–M2, M2–M3, M3–M4, M4–M5, M5–M6, M6–M7), and points within each interval are colored according to the cluster they belong to (gray, blue, red, green, purple, and yellow palettes, respectively). In both panels, a small inset displays the same (RMSD inactive; RMSD active) sampling colored by the local Shannon entropy, providing a complementary view of the conformational heterogeneity within each region of the activation landscape.

Funding3

—Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung10.13039/501100001711
—Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung10.13039/501100001711
—Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung10.13039/501100001711

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsReceptor Mechanisms and Signaling · Error Correcting Code Techniques · Diabetes Treatment and Management

Full text

G-protein–coupled receptors (GPCRs) are one of the most important classes of membrane proteins, responsible for transducing extracellular signals into intracellular responses and serving as targets for a large fraction of approved drugs. ?−? ? Their function is tightly linked to conformational changes that span multiple scales, from local rearrangements of side chains and hydration sites to large-scale reorganizations of the transmembrane helices that define the transition between inactive and active states. ?,?

Several simulation strategies have been applied to investigate GPCR activation, ranging from unbiased molecular dynamics (MD) simulations to various enhanced sampling methods. ?−? ? ? ? ? ? ? ? ? ? ? While unbiased MD simulations have revealed interesting details, ?−? ?,?,? receptors’ transitions occur on time scales that remain difficult to access with unbiased MD simulations, a limitation that is commonly addressed by enhanced sampling strategies. ?,?−? ?,?−? ? ? In particular, collective variable (CV)-based approaches aim to accelerate the reaction coordinates associated with the key processes under investigation. ?−? ? ?

Among them, our carefully designed OneOPES framework enables simultaneous acceleration of multiple CVs, offering an efficient way to combine descriptors of both microscopic and mesoscopic motions within a single simulation.? Recently, this strategy has been successfully applied to the activation of β1-adrenergic receptor (ADRB1), where specifically tailored CVs captured both local changes such as side-chain rearrangements and hydration dynamics, as well as the large-scale conformational reorganization of the receptor backbone.?

Within the OneOPES approach, the path collective variable (PATH CV, hereafter) has proven to be a natural and effective choice to represent GPCRs’ substantial backbone rearrangements. First introduced nearly 20 years ago,? the original PATH CV requires the definition of a sequence of representative intermediate structures that bridge two target states, ideally arranged at regular intervals of RMSD. In practice, this setup can be labor-intensive; i.e., multiple manually curated structural hypotheses must be generated, tested, and refined in a trial-and-error process until a stable and physically meaningful path is obtained. This limitation reduces the accessibility and transferability of the method, making it challenging to apply it across multiple receptors or conformational transitions systematically.

Here, we present a robust and transferable strategy that streamlines the construction of PATH CVs for class A GPCRs, substantially reducing the level of manual intervention required while preserving the original accuracy. It is critical to emphasize that this approach requires only knowledge of the two target structures, eliminating the necessity to manually define and optimize a sequence of intermediate structures. We validate the approach on two pharmacologically relevant class A GPCRs in their apo forms, i.e., the β1-adrenergic receptor (ADRB1) and the μ-opioid receptor (MOR). For both systems, the refined PATH CV reproduces the free-energy landscapes obtained with the original tailored PATH CV approach but with shorter simulation times needed to achieve convergence and substantially reduced setup effort. This evolution thus improves both the efficiency and the transferability of the OneOPES sampling protocols for GPCR activation and, more broadly, for other biomolecular systems undergoing large-scale conformational transitions.

Definition of the Euclidean

PATH CV

The original RMSD-based PATH CV formulation,? from now on referred to as RPATH, requires the definition of a sequence of representative intermediate conformations linking inactive and active states. This provides a progress variable s and a deviation variable z defined as follows:

[eqn]

In this formulation, the path is described by a set of N reference structures or milestones X _ i _, and the variables s and z are expressed as functions of the RMSD distances R[X – X _ i _] between the current configuration X and each of these high-dimensional reference frames X _ i _. The parameter λ determines the smoothness of the projection on the reference frames, with high λ values rigidly tying s to the closest single reference frame and producing a typical stepwise behavior in the dynamics, while low λ values spreading out s over a number of reference frames. In enhanced sampling, the sweet spot for λ tends to fall between these two regimes.

Building an RPATH CV requires a careful process of selecting the portion of the system undergoing the conformational change, and choosing and optimizing a number of reference structures. While structures of the end states are usually available, the intermediate structures have to be generated. One typically extrapolates them from biased simulations, such as steered MD or adiabatic bias MD ?,? (see FigureA). These explorative simulations accelerate a reaction coordinate to quickly sample transitions connecting the end states so that a trial sequence of conformations can be extracted from them. However, for large and complex biomolecular systems such as GPCRs, this process is particularly challenging and time-consuming, as it is difficult to both figure out a good reaction coordinate and to evaluate the quality of the extracted structures a priori. Eventually, preliminary biased simulations of the trial RPATH CV must be run to check whether the selected intermediates are sufficient to drive the desired conformational change. Numerous trial-and-error attempts are required before an effective CV is obtained. This recursive procedure is typically not transferable even between analogous systems, and the CV refinement process must be repeated for each new system under investigation.

*Schematic comparison between the original PATH CV and the Euclidean variant used in this work. A) In the traditional formulation, constructing a meaningful PATH CV for GPCR activation requires multiple intermediate PDB structures X

i along a putative transition pathway. Because these intermediates must be guessed and manually curated, the resulting path is laborious to set up and often uncertain in its direction, leading to repeated trial-and-error attempts. B) In the Euclidean variant, only the inactive and active structures are required, while all intermediate milestones are simply defined in terms of the two-dimensional space of RMSD values relative to these end points. This yields a path that is straightforward to define as it does not depend on any intermediate structure. C) Depiction of a prototypical class A GPCR (ADRB1) embedded into a model membrane. POPC and cholesterol molecules are displayed in dark and light green, respectively. Insets show the most relevant microswitches governing GPCR’s activation.*

A further limitation of RPATH CV is the requirement to focus the RMSD calculation on specific portions of the GPCR. Typically one has to align the RMSD over the rigid portion of the GPCR and calculate the RMSD over the structural sections that change the most during the activation. Different alignment strategies on different sets of atoms yield vastly different RMSD measurements, critically impacting the quality of the resulting RPATH CV. This step as well requires a recursive optimization process that further slows down the overall procedure. Although reformulating the path in terms of contact maps has been shown to mitigate this problem by avoiding explicit structural superposition, ?−? ? a contact-map-based path CV remains difficult to build and cannot be transferred between systems.

A different and more straightforward approach was proposed to simulate the activation of the glucagon receptor.? The RMSDs over the two inactive and active end states were calculated and transformed into a CV by simply calculating their difference. The resulting CV captures by construction the progress along the GPCR activation and is equivalent to a linear path between two milestones. However, its lack of curvature represents a limit, as it may force the system to pass through improbable high-energy states where both RMSDs are equally low. In the meantime, an alternative Euclidean formulation of the PATH CV, hereafter referred to as EPATH, was proposed and mostly used in the context of chemistry and catalysis. ?−? ? ? In this variant, the path is not defined by the RMSD value with respect to a sequence of structures but as progress in CV space. In other words, in EPATH the milestones M _ i _ are represented by tuples of CV values and do not need to be tied to real structures.

Here, we introduce a strategy that puts together these two approaches: we propose an EPATH CV on a two-dimensional space that uses the two RMSD values with respect to a GPCR’s inactive and active structures. A crucial component of our strategy is the availability of high resolution structures of the end states over which we measure the RMSD. Along the resulting EPATH CV, any visited structure of the system is encoded in a tuple M defined by a couple of RMSD values (RMSD inactive(M), RMSD active(M)). In this way, only the end state milestones, i.e., M 1 and M 8 in Figure, have to be tied to real structures, while the milestones M _ i _ in between can be generated arbitrarily to chart a route in RMSD space. Here, we choose a slightly arched path above the diagonal that retraces the shape of the path that we observed in our previous work on GPCR activation.? This path connects the well-defined end states by passing through an entropic basin where both RMSD values are high at the same time, that is, a basin populated by states that differ significantly from both the inactive and active references.

Within the EPATH CV approach, the Euclidean distance between the milestone value M of a state and any reference milestone M _ i _ is then simply defined as

[eqn]

Substituting this squared distance into the expressions in eq for s(X) and z(X) simply yields the EPATH CV value for structure M in a path made of N milestones:

[eqn]

This approach retains the definitions of progress (s(M)) and deviation (z(M)) from those of the original RPATH CV approach. Its key advantage is that it avoids the need to define intermediate milestones tied to a curated set of conformations. Consequently, setting up the resulting EPATH CV is much simpler, as the time-consuming recursive refinement procedure of the previous approach is not required. Another advantage of this strategy is that it relies less on RMSD alignment and the selection of atoms used to calculate it. To highlight this, we choose to align and calculate the RMSD over the same set of Cα atoms that make up the secondary structure of the system (more than 90% of the system’s atoms), and this generic choice had no need for refinement for building an efficient CV.

Finally, in terms of performance, the EPATH CV is faster to compute since the numerically expensive RMSD calculation needs to be carried out only twice per time step. Conversely, in the RPATH CV such calculation is computed N times per time step, once for each intermediate structural milestone. These advantages make applying the EPATH strategy to multiple systems more straightforward, reducing setup time, and improving transferability.

OneOPES MD Simulations

The equilibrated structure of ADRB1 was taken from our previous work,? where we generated it starting from the antagonist-bound inactive structure within PDB ID: 7BVQ.? The structure of MOR was obtained from the inactive MOR structure within PDB ID: 9MQJ by removing the ligand resolved in the orthosteric binding site.? The reference active structures of ADRB1 and MOR were retrieved from PDB ID: 7BTS ? and PDB ID: 8F7R,? respectively. The details of the preparation and equilibration of MOR and ADRB1 are detailed in the Supporting Information and in Figure S1.

We used OneOPES,? a replica-exchange extension of OPES Explore,? to sample the activation pathway of two GPCRs, ADRB1 and MOR. In each simulation, eight replicas were run, i.e., one convergence-focused (replica 0) and seven exploratory ones (replicas 1–7) where additional CVs were accelerated. All replicas shared OPES Explore as the main sampling bias, while replicas 1–7 were progressively heated up to 335 K by biasing the system’s potential energy with OPES Expanded? to facilitate barrier crossing.

To directly compare the two approaches, we ran simulations using the GPCR activation PATH CV in both formulations, i.e., the original RPATH? and the newer EPATH implementation. In analogy with ref ?, additional CVs were introduced for four key microswitch motifs,? i.e., PIF, DRY, NPxxY, and YY, to capture local structural determinants of activation. The distances within these motifs were biased as auxiliary CVs in the exploratory replicas. Both approaches use a similar set of parabolic restraints on the microswitches PIF and NPxxY to guide the system through the intermediate basins, which contain many metastable states and kinetic traps, to the fully active one. It is important to highlight that these parabolic restraints only apply to intermediates and not to end-point states, and as such they do not affect their relative stability.

In addition, we biased the hydration of the cytoplasmic cavity, as our recent studies have shown that water coordination at specific sites can be critical for lowering the activation free-energy barrier. ?−? ? ? ? ? ? In the case of ADRB1, this choice is further supported by NMR evidence indicating that the YY-motif is stabilized through a water-mediated interaction.?

For analyzing and comparing results, directly using the biased RPATH and EPATH CVs would be rather ineffective. We chose instead to reproject the free energy results on an alternative simplified EPATH CV with fewer milestones, six for ADRB1 and seven for MOR. Further analyses and figures on the biased PATH CVs are instead reported in the Supporting Information.

Simulations were performed using GROMACS 2023? patched with PLUMED 2.9.1. ?,? Full details about simulation parameters and CVs are reported in the Supporting Information, respectively in Tables S1 and S2 and Figures S2 and S3, and the simulation input files are provided in GitHub at https://github.com/valeriorizzi/GPCR_Euclidean_PATH.git.

Cluster Analysis and Shannon’s Entropy

We performed a cluster analysis on the MOR’s OneOPES simulations using GROMACS’s gmx cluster routine, using the gromos algorithm. An RMSD threshold value of 1.4 Å on the secondary structure Cαs was selected (the same selection of Cαs we used for RMSD_ inactive _ and RMSD_ active _), considering the number of generated cluster families and the similarity of protein conformations within a cluster family. From the cluster population, the Shannon’s entropy was calculated, using the following formula:

[eqn]

where k B is the Boltzmann constant, p _ i _ is the fractional population of the i-th cluster with respect to all the cluster families C and $[eqn]$ . In the analysis, once we partition the phase space into different regions, we use this metric as a natural measure of configurational diversity in each region. Regions with one dominant cluster family would display low entropy values, whereas regions with numerous equally probable cluster families would have high entropy content.

Validation of the Euclidean PATH CV on ADRB1

We first assessed the performance of the EPATH CV on ADRB1, a prototypical class A GPCR that has been widely investigated both experimentally and computationally. ?,?,?,? Using the same simulation setup and CV selection of our previous study,? we repeated OneOPES simulations of the apo-ADRB1 system, replacing the RPATH CV with the newly constructed one (see FigureA and Figure S4A). The resulting free-energy landscape along the activation pathway closely matches our earlier results, recovering both the relative stability of the inactive and active basins and the barrier separating them (FigureB). Notably, the simulations converge in significantly shorter time scales than with the RPATH CV, highlighting the efficiency gain of the EPATH approach without compromising accuracy (see FigureC and Figure S4B,C).

ADRB1’s activation using the EPATH CV. A) Milestones M i of the EPATH CV employed in apo-ADRB1-Euclidean and in ref . B) 1D FES as a function of the EPATH CV in (A) for the apo-ADRB1-Euclidean system. C) Free-energy difference between inactive and active states in the apo-ADRB1-Euclidean system over time. In (B) and (C), the average data over three replicas are represented through an orange solid line, while the transparent orange area illustrates the standard deviation. D) Average 2D FES concerning the RMSD of both the inactive and active structures for the apo-ADRB1-Euclidean system.

As a further form of comparison, we reprojected the accumulated bias potential onto the (RMSD inactive; RMSD active) plane, obtaining a two-dimensional free-energy surface (FES) describing the conformational change leading to ADRB1’s activation (FigureD). This representation confirms that the 2D FES obtained with the EPATH is in excellent agreement with the one previously computed using the RPATH, with both inactive- and active-like basins recovered at the expected locations. Additional analyses of cavity hydration and the conformational rearrangements of key microswitches during ADRB1 activation are provided in the Supporting Information (see Figure S5) and show a good quality agreement between the two approaches.

Activation of the Mu Opioid Receptor

In this section, we report the simulation results of the activation of another class A GPCR: the μ-opioid receptor (MOR). MOR is one of the most critical analgesic targets and is at the center of the ongoing opioid crisis. ?,? We ran OneOPES with both the RPATH and EPATH CV definitions. The former was extrapolated from a series of steered MD between representative inactive and active structures of MOR (for further details, refer to the Supporting Information and Figure S6), while the latter was defined through (RMSD inactive; RMSD active) tuples, following the same arched milestones distribution employed for ADRB1 (see FigureA and Figure S7). To deliver reliable statistics, each one-OPES simulation was carried out in triplicate. For clarity’s sake, the apo-MOR system sampled with the RPATH CV is referred to as apo-MOR-OLD hereafter, while the apo-MOR system sampled with the new EPATH CV as apo-MOR-Euclidean.

MOR’s activation using either the RPATH or the EPATH CVs. A) Milestones M i of the EPATH CV upon which the apo-MOR-OLD and apo-MOR-Euclidean free energies have been reprojected. B) Comparison of the 1D FES as a function of the EPATH CV in (A) for the apo-apo-MOR-OLD and apo-MOR-Euclidean systems. C,D) Free-energy difference between inactive and active states in the apo-MOR-OLD (C) and apo-MOR-Euclidean (D) systems over time. In (B) and (C), the average free energy difference of apo-MOR-OLD is represented by a purple solid line, while the transparent purple area illustrates the standard deviation. Similarly, in (b) and (d) the average free energy difference of apo-MOR-OLD is represented by a green solid line, while the transparent green area illustrates the standard deviation. E,F) Average 2D FES concerning the RMSD of both the inactive and active structures for the apo-MOR-OLD (E) and apo-MOR-Euclidean (F) systems.

The free-energy results obtained from the two approaches are in excellent agreement, with both well capturing the thermodynamic equilibrium between inactive and active-like states. The agreement is evident from the one-dimensional free energy obtained by reweighting the accumulated bias potential of apo-MOR-OLD onto apo-MOR-Euclidean’s EPATH CV (see FigureB). However, the RPATH CV takes a rather long time to converge, with the free energy difference reaching an agreement between the independent replicas only at the end of the simulation (see FigureC). This indicates that, even though the CV itself went through an iterative optimization process, it is still far from ideal, and further refinements are necessary.

On the contrary, the EPATH CV once again converges faster than the RPATH, requiring a much shorter simulation time to achieve comparable estimates of the free-energy difference among the three independent replicas (see FigureD). Remarkably, no recursive CV optimization was needed to refine this PATH CV. In detail, for apo-MOR-OLD, the ΔG of activation is estimated to be 18.0 ± 1.0 kcal/mol after ∼800 ns, whereas apo-MOR-Euclidean’s OneOPES simulations converge to the ΔG value of ∼18.0 kcal/mol after only 200 ns. Similar considerations apply to the 2D FES measured on the (RMSD inactive; RMSD active) space: both sets of simulations are able to explore an analogous conformational space (see FigureE,F), with the apo-MOR-Euclidean trajectory being able to sample more thoroughly the fully active basin (RMSD active < 0.1 nm; see Figure S7 for more details).

To further assess and compare the sampling quality of the two approaches, we additionally analyzed the rearrangement of key MOR microswitches along the activation coordinate. Specifically, we computed 2D FESs as a function of the activation pathway and structural descriptors associated with the NPxxY, DRY, and YY motifs. In all cases, the apo-MOR-OLD and apo-MOR-Euclidean simulations delivered similar landscapes, capturing both the qualitative ordering of microswitch transitions and the quantitative positioning of the free-energy minima (see Figure S8). We also calculated a 2D FES to characterize the hydration dynamics of the intracellular cavity in the proximity of the YY motif, a well-known hallmark of GPCR activation. This analysis reveals again a strong agreement between the two PATH CV definitions, with both approaches identifying analogous hydration patterns and their coupling to the receptor’s progression along the activation pathway (Figure S9). Finally, we examined the rearrangement of the MOR pseudoion lock,? focusing on the residue pairs R^3.50^–T^6.34^ and V^3.54^–^L6.30^. Both approaches consistently capture an asynchronous disruption of these interactions along the activation pathway, with the loss of the V^3.54^–L^6.30^ hydrophobic contact occurring at early activation stages, followed by the breakage of the R^3.50^–T^6.34^ H-bond at later stages. This behavior, which is consistent with the progressive outward motion of TM6, is reproduced by both apo-MOR-OLD and apo-MOR-Euclidean simulations, further supporting the equivalence of the two PATH CV constructions (Figure S10).

Advantages of the Euclidean PATH

To better understand how EPATH effectively drives GPCR activation, we analyzed the distribution of sampled configurations across the two collective variables, s(M) and z(M). In EPATH, the intermediate values of the s(M) coordinate correspond to conformations that are simultaneously distant from both the inactive and active references in the RMSD space. It is worth stressing that RMSD is a spherical yet anisotropic descriptor: a high RMSD value corresponds to a multitude of structurally distinct states equally distant from the reference state.? For this reason, using RMSD alone as a CV in enhanced sampling is not particularly effective and may lead to poorly converged results.

In contrast, the EPATH CV couples two RMSD measures, thereby embedding directionality throughout its range in a push–pull fashion. This bivariate description enables detours away from the end states while avoiding the vast and uninformative region of high-RMSD configurations where simulations typically suffer from hysteresis. An important role in this context is played by the presence of loose restraints on the path deviation z and of coupled restraints between the progress CV s and microswitches PIF and NPxxY. This helps the convergence by aligning the conformation of the microswitches with the activation of the macroswitches. Another important element is the presence of the membrane that confines the GPCR and prevents it from exploring high-in-energy partially unfolded states.

Importantly, EPATH does not require a tedious definition of all of the intermediate milestones. Instead, in RPATH the definition of these milestones is crucial. The milestones are not known a priori and are typically extracted from steered MD trajectories. A suboptimal choice of the intermediate milestones decreases the efficiency of the method by forcing the sampling of conformations that are far from the RPATH. EPATH has fine resolution near the end points (where the corresponding structures and energy basins are well-defined), while remaining less well-defined in the middle, where the receptor naturally explores a broader ensemble. When combined with OneOPES, and auxiliary biases on the microswitches and water, this transition state region is crossed rapidly and reversibly, ensuring faster convergence than that with RPATH.

This behavior is clearly illustrated in Figure. In both apo-MOR-OLD (see FigureA) and apo-MOR–Euclidean (see FigureB), the intermediate portions of the EPATH are characterized by a plethora of slightly out-of-path frames (mostly between s(M) ∼ 4 and s(M) ∼ 5). Performing a cluster analysis on the trajectory frames confirms that the intermediate portion of the EPATH encompasses distinct structural families (see Figures S10 and S11 for further information). Notably, the simulation frames belonging to the first and last milestones (i.e., M 1 and M 7) can be grouped into very few cluster families, with the most probable ones containing more than 95% of the cluster populations. Conversely, the more we approach the central values of the EPATH, the more we can appreciate both an increase in cluster families and a decrease in their relative abundance.

Analysis of the conformational heterogeneity observed in apo-MOR-OLD and apo-MOR-Euclidean OneOPES simulations. A,B) Sampling distributions of the apo-MOR-OLD (A) and apo-MOR-Euclidean. (B) OneOPES simulations projected onto the (RMSD inactive; RMSD active) space. Frames are grouped according to their Euclidean PATH interval (M1–M2, M2–M3, M3–M4, M4–M5, M5–M6, M6–M7), and points within each interval are colored according to the cluster they belong to (gray, blue, red, green, purple, and yellow palettes, respectively). In both panels, a small inset displays the same (RMSD inactive; RMSD active) sampling colored by the local Shannon entropy, providing a complementary view of the conformational heterogeneity within each region of the activation landscape.

To quantitatively characterize how the size of the conformational pool varies along the progress coordinate, we turned to information theory and computed the Shannon entropy associated with the cluster populations at each milestone (see the paragraph “Cluster Analysis and Shannon’s Entropy” for additional details). Consistent with our qualitative observations, both apo-MOR-OLD and apo-MOR-Euclidean exhibit the same global behavior: entropy reaches its minimum at the terminal milestones, where the conformational space is tightly focused around well-defined inactive and active structures and rises sharply toward the center of the path (see FigureA,B). Remarkably, in both formulations, the entropy in the M 4–M 5 interval increases by nearly 3 orders of magnitude relative to the end points, quantitatively confirming that the intermediate milestones host a vastly broader and more diverse ensemble of receptor conformations. This information-theoretic analysis reinforces the picture emerging from clustering and structural projections: the EPATH naturally captures the physical widening of the conformational landscape in the transition region, while maintaining strong definition at the two end points.

In conclusion, GPCRs are central regulators of cellular signaling and remain the targets of a large fraction of therapeutic compounds. A key determinant of their pharmacological action is the balance between inactive and active conformational states, which underlies the agonist–antagonist axis. Obtaining reliable free-energy differences between inactive and active reference states is therefore a pivotal step toward comparing receptor activation thermodynamics and assessing ligand effects. However, the inherent complexity of GPCR conformational landscapes poses major challenges for molecular simulations, requiring strategies that are accurate, practical, and transportable. ?,?

In this work, we introduced a streamlined protocol for building PATH CVs within the OneOPES enhanced sampling framework. By eliminating the tedious trial-and-error procedure of the standard path definition, the new approach reduces the user effort required to set up simulations while maintaining the accuracy of the computed free-energy landscapes. Its application to two prototypical class A GPCRs,? ADRB1 and MOR, demonstrates that this approach reproduces the thermodynamic results of the traditional method with a significantly faster convergence. An in-depth analysis of MOR further confirmed that the mechanistic details of activation, including side-chain microswitches and hydration changes, are faithfully captured. Additional analyses reported in the Supporting Information further show that alternative PATH definitions, including a straight Euclidean path directly connecting inactive and active reference states, lead to consistent but less efficient sampling, supporting the robustness of the proposed construction (see Figure S13).

An important question is whether providing additional information about the intermediate region of the activation landscape could further improve the approach. In our view, a balance must be struck between the ease of use and computational efficiency. For in-depth mechanistic studies of a single receptor, more elaborate CV refinement strategies may indeed offer advantages. However, when the goal is to analyze multiple GPCRs and to reliably estimate activation free energies without extensive system-specific optimization, the Euclidean PATH approach presented in this paper provides a robust, easy-to-build solution that delivers reliable sampling between end states.

Nonetheless, it is also important to acknowledge the limitations of RMSD-based metrics. As previously stated, at large RMSD values, the number of distinct conformations grows rapidly, making such regions difficult to sample and ineffective to bias. The role of these restraints and their impact on free-energy convergence is demonstrated in the Supporting Information through simulations that employ looser or absent microswitch restraints. Moreover, the real reaction coordinate involves both the position of the helices (macroswitches) tracked by the RMSD and the state of the microswitches, which is not directly included in the geometric path. The efficiency of the present approach stems largely from the constrained conformational space imposed on the intermediate states of the microswitches by mild restraining potentials. This restricts the exploration to the “relevant” microswitch states, thereby accelerating the convergence of the free energies. The role of these restraints and their impact on free-energy convergence is demonstrated in the Supporting Information through simulations that employ looser or absent microswitch restraints (see Figures S14 and S15). Systems that experience broader structural rearrangement embedded in a less constrained environment, e.g., a globular protein in water solution, may demand additional care, as the effective size of RMSD space increases and sampling may become a major challenge. Another point to bear in mind is that the estimated barriers under these conditions will not be as reliable as the difference between the end-point (inactive and active) states, while the shape of the 1D free energy profile is clearly affected by the definition of the path.

Overall, this method is more general and easier to use than previous proposals and therefore represents an important step toward routinely using enhanced sampling approaches to study the activation of class A GPCRs. Our approach enhances efficiency and accessibility, paving the way for future investigations that will quantify the effects of ligands or mutations on GPCR conformational equilibria, offering valuable insights for drug discovery and pharmacological design.

Supplementary Material

Bibliography64

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Rask-Andersen M.Almén M. S.Schiöth H. B.Trends in the exploitation of novel drug targets Nat. Rev. Drug Discovery 20111057959010.1038/nrd 347821804595 · doi ↗ · pubmed ↗
2Katritch V.Cherezov V.Stevens R. C.Structure-function of the G protein–coupled receptor superfamily Annual review of pharmacology and toxicology 20135353155610.1146/annurev-pharmtox-032112-135923 PMC 354014923140243 · doi ↗ · pubmed ↗
3Sriram K.Insel P. A.G protein-coupled receptors as targets for approved drugs: how many targets and how many drugs?Molecular pharmacology 20189325125810.1124/mol.117.11106229298813 PMC 5820538 · doi ↗ · pubmed ↗
4Weis W. I.Kobilka B. K.The molecular basis of G protein–coupled receptor activation Annual review of biochemistry 20188789791910.1146/annurev-biochem-060614-033910 PMC 653533729925258 · doi ↗ · pubmed ↗
5Venkatakrishnan A.Deupi X.Lebon G.Tate C. G.Schertler G. F.Babu M. M.Molecular signatures of G-protein-coupled receptors Nature 201349418519410.1038/nature 1189623407534 · doi ↗ · pubmed ↗
6Dror R. O.Green H. F.Valant C.Borhani D. W.Valcourt J. R.Pan A. C.Arlow D. H.Canals M.Lane J. R.Rahmani R.Structural basis for modulation of a G-protein-coupled receptor by allosteric drugs Nature 201350329529910.1038/nature 1259524121438 · doi ↗ · pubmed ↗
7Huang W.Manglik A.Venkatakrishnan A. J.Laeremans T.Feinberg E. N.Sanborn A. L.Kato H. E.Livingston K. E.Thorsen T. S.Kling R. C.Structural insights into μ -opioid receptor activation Nature 201552431532110.1038/nature 1488626245379 PMC 4639397 · doi ↗ · pubmed ↗
8Latorraca N. R.Venkatakrishnan A. J.Dror R. O.GPCR Dynamics: Structures in Motion Chem. Rev.201711713915510.1021/acs.chemrev.6b 0017727622975 · doi ↗ · pubmed ↗