Universality of fold-encoded localized vibrations in enzymes

Yann Chalopin; Francesco Piazza; Svitlana Mayboroda; Claude Weisbuch; Marcel Filoche

arXiv:1902.09939·physics.bio-ph·February 27, 2019

TL;DR

This study reveals that enzymes universally utilize localized, structure-encoded vibrations in the picosecond range to facilitate catalysis, providing a microscopic understanding of enzyme efficiency across diverse structures.

Contribution

The paper introduces a mathematical framework demonstrating the universality of fold-encoded localized vibrations in enzymes, linking structure to catalytic function.

Findings

01

Localized vibrations are optimally coupled to reaction coordinates.

02

Universality demonstrated across over 900 enzyme structures.

03

Provides microscopic rationale for active site compactness.

Abstract

Enzymes speed up biochemical reactions at the core of life by as much as 15 orders of magnitude. Yet, despite considerable advances, the fine dynamical determinants at the microscopic level of their catalytic proficiency are still elusive. In this work, we use a powerful mathematical approach to show that rate-promoting vibrations in the picosecond range, specifically encoded in the 3D protein structure, are localized vibrations optimally coupled to the chemical reaction coordinates at the active site. The universality of these features is demonstrated on a pool of more than 900 enzyme structures, comprising a total of more than 10,000 experimentally annotated catalytic sites. Our theory provides a natural microscopic rationale for the known subtle structural compactness of active sites in enzymes.

Tables

Table 1: Comparison between normal mode analysis (NMA) and localization landscape (LL) analysis.

# of d.o.f.	NMA CPU time [s]	LL CPU time [s]	Ratio NMA/LL
500	0.72	0.0032	22
1000	4.6	0.17	27
2000	40	1	40
5000	840	18	47
10000	6600	132	50
20000	54000	571	100

Equations

(1) k_{t} \propto e^{-\beta (\Delta G + \lambda)^{2}/4\lambda} \int e^{-S_{G}(R)/2\hbar} \, \mathcal{P}_{e}(R) \, dR

(2) V = \frac{1}{2} \sum_{i>j} K_{ij} (r_{ij} - R_{ij})^{2},

(3) K_{ij} = k \, c_{ij}

(4) V = \frac{1}{2} \sum_{ij} \sum_{\alpha\beta} \mathbb{H}_{ij}^{\alpha\beta} u_{i\alpha} u_{j\beta} + \mathcal{O}(u^{3})

(5) \mathbb{H}_{ij}^{\alpha\beta}

(6) \widetilde{\mathbb{H}} = M^{-1/2} \, \mathbb{H} \, M^{-1/2}

(7) M_{i} \ddot{u}_{i\alpha} = - \sum_{j\beta} \mathbb{H}_{ij}^{\alpha\beta} u_{j\beta}

(8) \ddot{\bf X} = - \widetilde{\mathbb{H}} {\bf X}

(9) \widetilde{\mathbb{H}} {\bf Y}^{n} = \omega_{n}^{2} {\bf Y}^{n}

(10) u_{i\alpha}(t) = \frac{X_{i\alpha}(t)}{\sqrt{M_{i}}} = \frac{1}{\sqrt{M_{i}}} \sum_{n=1}^{3N} \alpha_{n} Y_{i\alpha}^{n} e^{-j\omega_{n}t}.

(11) \widetilde{\mathbb{H}}_{c} {\bf U} = 1,

(12) \widetilde{\mathbb{H}}_{c,ij}^{\alpha\beta} = \begin{cases} c - \widetilde{\mathbb{H}}_{ij}^{\alpha\beta} & \text{if } i=j,\ \alpha=\beta \\ \widetilde{\mathbb{H}}_{ij}^{\alpha\beta} & \text{otherwise.} \end{cases}

(13) \mathcal{U}_{i} = \left( \sum_{\alpha \in \{x,y,z\}} U_{i\alpha} U_{i\alpha} \right)^{1/2}

(14) \widetilde{\mathbb{H}} {\bf Y}^{n} = \omega_{n}^{2} {\bf Y}^{n}

(15) \hat{L} {\bf U} = 1,

(16) \mathcal{C}_{i} = \frac{1}{N_{\mathcal{S}} c_{i}} \sum_{n \in \mathcal{S}} \sum_{j} c_{ij} \left[ R_{ij} - \left( \sum_{\alpha=x,y,z} \left( R_{ij}^{\alpha} + a (Y_{i\alpha}^{n} - Y_{j\alpha}^{n}) \right)^{2} \right)^{1/2} \right],

Full text

Universality of fold-encoded localized vibrations in enzymes

Yann Chalopin

Laboratoire d’Energétique Macroscopique et Moléculaire, Combustion (EM2C), CentraleSupélec, CNRS, 91190 Gif-sur-Yvette, France

Francesco Piazza

Centre de Biophysique Moléculaire (CBM) CNRS UPR4301 $\&$ Université d’Orléans, Orléans 45071, France

Svitlana Mayboroda

School of Mathematics, University of Minnesota, Minneapolis, Minnesota 55455, USA

Claude Weisbuch

Laboratoire de Physique de la Matière Condensée, Ecole Polytechnique, CNRS, 91128 Palaiseau, France

Materials Department, University of California, Santa Barbara, California 93106, USA

Marcel Filoche

Laboratoire de Physique de la Matière Condensée, Ecole Polytechnique, CNRS, 91128 Palaiseau, France

I Introduction

The intricate networks of metabolic cascades that power living organisms ultimately rest on the exquisite ability of enzymes to increase the rate of chemical reactions by many orders of magnitude. However, despite a large body of evidence accumulated over the past two decades in favor of the highly dynamical nature of proteins, the question of whether protein motions, such as conformational changes and finer (and faster) reorganization dynamics, play a role in enzyme catalysis remains widely debated Nagel and Klinman (2009).

Although many molecular machines contain intrinsically disordered domains Oldfield and Dunker (2014), the 3D fold is central to enzyme functioning. In particular, increasing evidence is accumulating in the literature in favor of the existence of specific fold-encoded motions believed to be optimally coupled to the chemical reaction coordinate(s) Zinovjev and Tuñón (2017); Kale et al. (2008); Agarwal (2005); Antoniou and Schwartz (2001); Hay and Scrutton (2012); Luk et al. (2013); Nagel and Klinman (2009). These motions typically correspond to localized vibrations of the protein scaffold that contribute to the catalytic reaction, i.e., modes that, if impeded, would lead to a deterioration of the catalytic efficiency Nagel and Klinman (2009). The existence and importance of such localized, shape-specific motions, coined rate-promoting vibrations (RPV) Antoniou and Schwartz (2001), is backed by many computational and experimental studies Kale et al. (2008); Pudney et al. (2009); Heyes et al. (2009, 2011); Henzler-Wildman et al. (2018); Saen-Oon et al. (2008); Masterson et al. (2010); Agarwal et al. (2002), beginning with the pioneering ideas of McClare on the functional role of non-equilibrium localized motions in muscle contraction McClare (1972). The role of RPVs in enzymes has been highlighted for the tunneling reaction coordinate in lactate dehydrogenase (LDH) Chen and Schwartz (2018); Dzierlenga and Schwartz (2016); Quaytman and Schwartz (2007). Promoting modes in purine nucleoside phosphorylase (PNP) have also been explored more recently Harijan et al. (2017). Interestingly, evidence for the existence of promoting vibrations coupling directly to the reaction coordinate in enzyme-catalyzed hydrogen transfer reactions has also been gathered from the temperature dependence of the kinetic isotope effect (KIE) Arcus and Pudney (2015).
More generally, the key rate-promoting role of fluctuations in the region of the active site was established on rigorous quantum-mechanical grounds in the 1990s by Bruno and Bialek for enzymatic hydrogen transfer Bruno and Bialek (1992). Yet, despite the broad set of evidence for specific dynamical effects in enzyme-catalyzed reactions, a universal demonstration of the existence of RPVs in enzymes that could explain how specific vibrations at the active site contribute to increasing the reaction rate is still lacking.

To assess the role of vibrations in the catalytic efficiency of enzymes, it is essential to understand that protein motions in general play a rather diverse and subtle role over a wide range of timescales and distances McCammon and Harvey (1987). The longest times, which correspond to conformational changes of the protein, are in the ms-s range Wolf-Watz et al. (2004) and are generally believed not to be directly coupled to the enzymatic catalytic step, as most enzymes have turnover rates in the $10^{3}$ s$^{-1}$ ballpark Nagel and Klinman (2009). The matter is subtler for allosteric transitions (i.e., action at a distance) Changeux and Edelstein (2005) and for slow conformational sampling, which also occurs on the ms-s timescale, with many studies advocating a variable degree of coupling of those motions to the chemical step Hammes (2002); Gerhart and Schachman (1968), including the key advances brought about by single-molecule enzymology English et al. (2006); Lu, H. P. Luying Xun (1998). Faster conformational sampling in the ns-ms range and even faster reorganization motions of the active sites in the ps-ns range are commonly accepted to play an important role in shaping the kinetic behavior of many enzymes, such as alcohol dehydrogenase Liang et al. (2004) and methylamine dehydrogenase Basran et al. (1999), as most clearly revealed by the pioneering studies on the role of protein motions in hydrogen tunneling in soybean lipoxygenase-1 Liang et al. (2004); Knapp et al. (2002).

Quantum-mechanical tunneling in hydrogen transfer at room temperature was first demonstrated in 1989 in a seminal paper on alcohol dehydrogenase Cha et al. (1989). In particular, this discovery revealed the tremendous power of kinetic isotope effect studies to investigate the direct coupling of fast vibrational modes localized at the active site to the catalytic step Klinman and Kohen (2013). The general and surprising finding is that the KIE is largely temperature-independent in many native enzyme systems Knapp et al. (2002); Pudney et al. (2009); Basran et al. (1999). This is usually interpreted as the signature of an optimal structural compactness at the active site, where reaction partners are held tightly in the optimal geometry that underlies the catalytically competent atomic arrangement. This is fully consistent with the known reports that active sites tend to lie in the stiffest regions of enzyme structures Sacquin-Mora et al. (2007); Juanico et al. (2007); Aubailly and Piazza (2015) and that a subtle balance between rigidity and some specific flexibility is involved in enzyme catalysis Kamal et al. (2012); Guo et al. (2012).

Taken together, the above facts lead to an emerging picture where enzymes feature highly compact, pre-organized active sites. These represent structurally competent catalytic precursors that are generically modulated through slow conformational sampling at the level of the whole structure, but more finely and specifically regulated by specific rate-promoting vibrations that couple directly to the reaction coordinate(s). Hydrogen tunneling kinetics provides the perfect grounds for illustrating these ideas. There is now a wide consensus that donor-acceptor distances (DAD) at the active site for enzymes that catalyze the transfer of some hydrogen species are modulated with sampling frequencies in the $50\text{-}300\ \mathrm{cm}^{-1}$ range Klinman and Kohen (2013), corresponding to motions in the ps-ns range. These RPVs provide optimal compression along the DAD, thus enhancing the tunneling rate through a vibrationally assisted mechanism Klinman and Kohen (2013); Bruno and Bialek (1992). In other words, fast conformational sampling along the DAD is optimal in the substrate-bound conformation, which generates active-site compression leading to favorable close approach between donor and acceptor atoms Klinman and Kohen (2013) on timescales slower than tunneling times (fs).

In this paper, we take one step forward and show that fold-specific, localized vibrations enforcing dynamical compression at the active site are a universal feature of enzymes. This suggests that enzyme structures have evolved as optimally designed mechanical transducers of vibrational energy mediated by RPV patterns Heyes et al. (2011). The article is organized as follows: first, we introduce the localization landscape (LL), a novel and powerful mathematical tool that we use here to predict the spatial distribution of energy in proteins modeled by the Elastic Network Model (ENM) Atilgan et al. (2001). The implementation of the LL is illustrated for a specific example, the well-documented case of LDH, before reporting the results of a systematic study of the correlation between active sites and localized vibrations on a sample of about 1,000 enzymes (corresponding to more than 10,000 annotated active sites).

II Materials and Methods

In order to investigate the topological origin of vibrational modes related to the active-site reorganization in the specific timescale of interest ($100\ \mathrm{cm}^{-1}$), we adopt a coarse-grained elastic-network model (ENM) Tirion (1996); Atilgan et al. (2001); Juanico et al. (2007) (see Appendix A). This model reduces each protein to a collection of beads and springs that interact according to a unique, fold-encoded connectivity pattern. In our case, the beads correspond to the amino acids centered at the $C_{\alpha}$ carbon of the tertiary structure (Fig. 1A). Enzymes are therefore seen as a set of coupled harmonic oscillators (Fig. 1B). As is well known Bahar and Cui (2005), the local connectivity of each amino acid is reflected in the sparsity pattern of the force constant matrix (Fig. 1C). This connectivity controls the localization pattern of high-frequency modes ($\mathcal{O}(10^{2})$ cm$^{-1}$ in $C_{\alpha}$-based ENM schemes).

The main idea of this paper is to use a novel mathematical tool, coined localization landscape (LL), to decipher the subtle structure-dynamics-function relation in enzymes. The LL, which rests on a universal theory of wave localization, unveils the localization pattern of standing waves in complex or disordered media Filoche and Mayboroda (2012), and is extended here to the case of protein vibrations (see Appendix A). Bypassing the need to compute the full set of normal modes, the LL is a real-valued function computed at each site of the ENM network by solving a simple linear system based on the force constant matrix (see Appendix B). This LL provides the essential information about the interplay between the complex protein shape and the propagation of microscopic vibrations. In particular, the “valleys” of the LL delineate the main regions of existence of the large-amplitude localized vibrations, thus yielding an effective functional partition of the molecular structure. In addition, the local maxima of the LL identify the most localized vibrating areas or “hot spots”, while the corresponding values of the LL at these hot spots are very good predictors of the associated vibration frequencies Lefebvre et al. (2016); Arnold et al. (2018) (see also Appendix B and more specifically Fig. 9). We emphasize here that the LL is about 50 times faster to compute than solving the full eigenvalue problem (see Table 1 in Appendix C).

The LL reveals that the molecular architecture of enzymes appears designed to concentrate high-frequency vibrations within a few domains, as has been pointed out in previous studies Aubailly and Piazza (2015); Yang and Bahar (2005); Lyra et al. (2015); Sacquin-Mora et al. (2007). Moreover, the LL affords considerable new insight into how the localization pattern also segments the molecular scaffold into nearly vibrationally independent (i.e., uncoupled) clusters of amino acids. Although this work focuses on enzymes, the localization property appears to hold for proteins in general.

III Results

The case of LDH is presented here as a paradigmatic example to illustrate the insight offered by our method. As a comparison, we first compute all the normal modes (NM) by brute-force diagonalization of the dynamical matrix. The patterns of the highest-frequency NMs (Fig. 2A) reveal that they are highly confined to some very specific residues. We then compute the high-frequency LL of the enzyme (indicated by $u$ in Fig. 2B). The most interesting property of the LL appears when comparing it to the catalytic structure of the enzyme, characterized by the locations of the known active sites of LDH (VAL-31, GLY-32, MET-33, LEU-65, GLN-66). Clearly, catalysis in LDH takes place in the regions where the fast vibrations of amino acids are preferentially concentrated.

The structure of the localization pattern appears even more clearly when color-coding it onto the 3D conformations (Fig. 3), thus identifying unmistakably two distinct regions in the molecule where fast vibrations are concentrated. We observe here that peaks (hot spots) of the localization landscape that appear distant when plotted along the backbone chain (Fig. 2B) are found around the same spatial locations (here, the two red spots in Fig. 3). We also find that the few peaks of the localization landscape that do not seem to correspond to any active site are in fact found in the same regions, once the backbone chain is folded into its tertiary structure. This observation applies very generally to all LLs computed for a very large set of enzymes (see Fig. 7 in the following).

A careful analysis of the spatial structure of localized modes reveals that high-frequency localized vibrations are compressive motions. Hence, at hot spots, amino acids tend to become close-packed. This feature is demonstrated here by computing at each residue the reduction of the mean distance between nearest neighbors induced by the highest-frequency modes (see Appendix D). Figure 4 displays the result of this computation in the case of LDH: we clearly see that localization hot spots match almost exactly the regions subjected to compressive motions of large magnitude. A more detailed analysis of a localized mode is presented in Fig. 5 (the example shown in the figure is eigenvector #10).

An important additional feature revealed by the LL analysis of LDH is that the enzyme structure appears to be partitioned into large-scale domains, i.e., contiguous sets of sites separated by deep minima of the landscape (Fig. 6A). These domains comprise a few hundred amino acids associated with the oligomeric complexes (monomer, dimer, trimer, etc.). Each of these domains exhibits a sub-structure comprising 2 to 4 regions of a few tens of sites that harbor the most localized vibrations. From the LL, we can define each domain as comprising a hot spot and extending to the two lowest local minima on both sides along the chain. Each of them can be understood as a nearly independent vibrational region (see Fig. 6B), weakly coupled to its neighbors. This representation offers a new functional view of the protein and also paves the way for a new understanding of allosteric processes Yan et al. (2018). This aspect will be addressed further in the Discussion section.
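The domain definition above can be illustrated with a short sketch. It assumes one plausible reading of the rule: the boundary between two consecutive hot spots is placed at the lowest value of the landscape found between them along the chain. The function name `partition_by_hot_spots` and the toy landscape are hypothetical and are not taken from the paper.

```python
def partition_by_hot_spots(landscape, hot_spots):
    """Split residue indices 0..N-1 into one chain segment per hot spot.

    landscape : list of LL values, one per residue along the backbone.
    hot_spots : residue indices of the selected LL maxima.

    ASSUMPTION: the cut between two consecutive hot spots is placed at the
    lowest landscape value strictly between them.
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    peaks = sorted(hot_spots)
    if not peaks:
        return [(0, len(landscape))]
    boundaries = [0]
    for left, right in zip(peaks, peaks[1:]):
        between = range(left + 1, right)
        if between:
            cut = min(between, key=lambda i: landscape[i])
        else:
            cut = right  # adjacent hot spots: cut directly between them
        boundaries.append(cut)
    boundaries.append(len(landscape))
    return list(zip(boundaries, boundaries[1:]))


# Toy landscape with two hot spots (indices 1 and 5) and a deep minimum at 3.
toy_landscape = [0.2, 0.9, 0.4, 0.1, 0.3, 0.8, 0.2]
domains = partition_by_hot_spots(toy_landscape, [1, 5])
```

In this toy case the chain is cut at residue 3, so each segment contains exactly one hot spot. For a real enzyme the segments would still need to be mapped onto the 3D fold, since hot spots that are distant along the chain can be close in space.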

The subtle connection unveiled above between localization of vibrational energy and compressive reorganization of the active site is by no means an isolated case. This has emerged clearly from the systematic study of a set of 933 enzymes from the Catalytic Site Atlas Porter et al. (2004), comprising a total of 10,566 experimentally annotated catalytic sites. For each enzyme, we have computed the LL and located its highest maxima (examples of 3D representations of LLs for several enzymes are displayed in Fig. 7, left column, while the right column displays the partitioning of each enzyme into independently vibrating domains, obtained from the LL using the procedure illustrated in Fig. 6).

Then, for each known catalytic site of the enzyme, we have computed the distance to the nearest maximum of the LL, expressed as a percentage of the total length of the backbone chain (see Fig. 8A). Figure 8B displays a histogram of these relative distances, computed over all enzymes and all catalytic sites. The dotted curve plotted on top of the histogram represents the cumulative score. In 95% of the cases, a catalytic site is found within 0.2% of the total chain length from a localization hot spot. By comparison, the distance along the chain between a site picked at random and the nearest localized vibration site would be on average 10% of the chain length, i.e., about 200 times farther away! This striking concordance clearly indicates that vibrational energy localization, as dictated by the 3D scaffold, must play a key role in the design of enzyme function: in 95% of the cases, catalytic sites are located in domains where residues exhibit fast compressive motions.
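The distance measure described above can be sketched in a few lines. This is a minimal illustration on toy data: the function names `local_maxima` and `relative_distance` and the `min_height` threshold used to keep only the highest maxima are assumptions introduced here, and the distance is taken as a difference of residue indices along the chain.

```python
def local_maxima(landscape, min_height=0.0):
    """Return indices where the landscape exceeds both chain neighbours.

    min_height is a hypothetical threshold for keeping only the highest
    maxima; chain ends are compared with their single neighbour.
    """
    n = len(landscape)
    peaks = []
    for i, value in enumerate(landscape):
        left = landscape[i - 1] if i > 0 else float("-inf")
        right = landscape[i + 1] if i < n - 1 else float("-inf")
        if value > left and value > right and value >= min_height:
            peaks.append(i)
    return peaks


def relative_distance(site, peaks, chain_length):
    """Chain distance from a site to the nearest peak, in % of chain length."""
    nearest = min(abs(site - p) for p in peaks)
    return 100.0 * nearest / chain_length


# Toy landscape for a 10-residue chain (values are illustrative only).
toy = [0.1, 0.5, 0.2, 0.1, 0.9, 0.3, 0.2, 0.1, 0.05, 0.02]
peaks = local_maxima(toy)
percent = relative_distance(3, peaks, len(toy))
```

Applied to every annotated catalytic site of every enzyme, the resulting percentages would give a histogram of the kind described for Fig. 8B.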

IV Discussion

Localization of vibrations is a general feature of the scaffold of proteins. The LL is a novel theoretical tool that allows one to capture quickly and efficiently the fundamental relationship between the 3D structure and the spatial pattern of localized vibrations, first by predicting their locations and second by showing how the complex and irregular shape of the macromolecule can be partitioned (segmented) into a few weakly coupled clusters of vibrations. These are identified by highly localized vibrations involving a few specific residues, with periods of the order of $2\text{-}4\ \mathrm{ps}$, that systematically take the form of compressive motions. Channeling thermal (or non-equilibrium) vibrational energy along such specific localized eigenvectors could be crucial for optimal enzyme functioning, e.g., in reducing the transfer distance associated with transition-state barriers or modulating donor-acceptor distances along specific directions, thus accelerating the chemical reaction step. Our analysis through the LL, performed on 933 enzymes, has confirmed that the overwhelming majority of their catalytic sites are located at hot spots and are hence at the core of specific, fold-rooted compressive motions.

These considerations can be given additional physical meaning in the context of a phenomenological modified Marcus-like tunneling theory that is used with success to interpret experimental data on enzyme-catalyzed H-transfer reactions Meyer and Klinman (2005). According to such theoretical scheme, the overall tunneling rate can be written as

$$ k_{t} \propto e^{-\beta (\Delta G + \lambda)^{2}/4\lambda} \int e^{-S_{G}(R)/2\hbar} \, \mathcal{P}_{e}(R) \, dR. \tag{1} $$

In the above expression, $\beta = 1/k_{B}T$, $\Delta G$ denotes the free energy barrier associated with the global transition between reactant and product in the multi-dimensional space of heavy nuclear coordinates, and $\lambda$ is the corresponding reorganization energy, both associated with the slow conformational sampling needed to reach the tunneling-ready state (TRS). The effect of rate-promoting vibrations is to weigh H tunneling from the ground state, here expressed in the WKB approximation through the ground-state action $S_{G}(R)$, which is a function of the donor-acceptor distance $R$. The rate-promoting vibration(s) specifically couple to the DAD coordinate, providing a slow modulation (compared to tunneling times) of the donor-acceptor potential energy, represented by the equilibrium probability density $\mathcal{P}_{e}(R)$ corresponding to optimal compression through RPV motions along the DAD at the active site.

The characteristic times for thermally activated barrier crossing and/or tunneling in an enzymatic reaction are fast compared to the period of typical rate-promoting vibrations associated with the local reorganization of the active site (ps-ns), which are themselves swift compared to the timescales of slow conformational sampling and conformational changes (ms-s). This hierarchy of timescales allows localized motions to slowly modulate (with respect to the actual transition step) the energy landscapes associated with chemical reactions. However, such modulations occur millions of times per second while the 3D conformation of the protein appears frozen, as the free energy landscape associated with the global reactant-product equilibrium is essentially static at the scale of the transition state lifetime. The striking and universal correspondence between the enzymatic active sites and the localization hot spots strongly suggests that such ps-ns local, time-modulated compressions are a basic feature of enzymes that is likely the product of evolutionary optimization.

Another intriguing logical consequence of our analysis is that resonance mechanisms (i.e., the fact that clusters may eventually communicate through common vibrations) between distant localization sites may promote energy transfer across the molecular structure without affecting the sites located in between. By spatially confining vibrations at very specific places, wave localization may in principle allow distant sites to be “fed” with energy. Long-range communication would occur through specific protein paths associated with each specific frequency, without involving the rest of the structure (i.e., preventing resonant leakage of energy to other modes). Therefore, localized vibrations may have a key role in allosteric effects, as pointed out in Piazza (2014).

In summary, investigating localized vibrations that control the active site reorganization in enzymes allows one to gain fundamental insight into the dynamical determinants of their functioning. The discovery of the related localization landscape sheds light on the subtle link between the geography of fast compressive motions within an enzyme and its catalytic activity. Localized vibrations involving residues at or close to the active site correspond to motions that are typically compatible with the accepted timescales of rate-promoting vibrations ($50\text{-}300\ \mathrm{cm}^{-1}$) Schwartz and Schramm (2009); Klinman and Kohen (2013) and typically favor the shortening of transfer distances at molecular contact. Our analysis framework also offers an intriguing rationale for controlling fast dynamical effects at catalytic sites: any change in dynamical properties (interactions or mass) can be monitored with an extremely fast computational approach, allowing direct comparison with experiments, such as kinetic isotope effect measurements Nagel and Klinman (2009).

Appendix A Elastic network model of protein dynamics

Elastic network models (ENM) of protein dynamics were introduced by M. Tirion in 1996 Tirion (1996) and later reformulated in a coarse-grained version by Bahar and co-workers under the name of anisotropic network model (ANM) Bahar and Cui (2005). In the ANM, a given protein comprising $N$ residues is represented by an ensemble of $N$ fictitious particles, the mass of each particle being concentrated at the location of the corresponding $\alpha$-carbon. By definition, the equilibrium configuration of the system is taken to coincide with the experimentally solved structure (i.e., from X-ray diffraction or as an average over several NMR conformers). All particles are taken to have the same mass, which we set equal to the average amino acid mass $M=110$ a.m.u., and each particle interacts with its neighboring particles through a central harmonic force. Let us denote by $\boldsymbol{r}_{i}(t)$ and $\boldsymbol{R}_{i}$ the instantaneous and the equilibrium position vectors of the $i$-th residue, respectively. The total potential energy of the system is that of a network of beads and central springs, that is,

$$ V = \frac{1}{2} \sum_{i>j} K_{ij} (r_{ij} - R_{ij})^{2}, \tag{2} $$

where $K_{ij}$ is the force constant of the spring connecting the residues $i$ and $j$, while $r_{ij}=|\boldsymbol{r}_{i}-\boldsymbol{r}_{j}|$ and $R_{ij}=|\boldsymbol{R}_{i}-\boldsymbol{R}_{j}|$ are the instantaneous and equilibrium Euclidean distances between the pair $(i,j)$. The matrix of force constants can be specified in several ways. Here, in line with the original ideas of the ENM modeling strategy, we use a single stiffness $k$ for all springs and identify the set of interacting pairs through a connectivity matrix, that is,

$$ K_{ij} = k \, c_{ij}, \tag{3} $$

where $c_{ij}=\{1\ \text{for}\ R_{ij}\leq R_{c}\text{ and }0\ \text{otherwise}\}$. Following previous studies Juanico et al. (2007), we set $k=5$ kcal/mol/Å$^{2}$ and choose a cutoff $R_{c}=10$ Å. In order to compute the localization landscape of a protein, we consider the harmonic approximation of the ANM, which corresponds to

$$ V = \frac{1}{2} \sum_{ij} \sum_{\alpha\beta} \mathbb{H}_{ij}^{\alpha\beta} u_{i\alpha} u_{j\beta} + \mathcal{O}(u^{3}), \tag{4} $$

where $u_{i\alpha}=r_{i\alpha}-R_{i\alpha}$ ( $\alpha=x,y,z$ ) are the Cartesian components of the displacement vector of residue $i$ . The Hessian matrix $\mathbb{H}$ is directly derived from the total potential energy through

[TABLE]

where $s^{\alpha}_{ij}=R^{\alpha}_{ij}/R_{ij}$ are the Cartesian components of the unit equilibrium inter-particle vectors. The normal modes (NM) of a system of interacting particles, such as the residues in an elastic network, are the eigenvectors of the mass-weighted Hessian matrix (also known as dynamical matrix),

$$ \widetilde{\mathbb{H}} = M^{-1/2} \, \mathbb{H} \, M^{-1/2}, \tag{6} $$

where $M$ is the diagonal mass matrix. It is well known that the high-frequency NMs of vibrations of protein structures are strongly localized in space, which is a result of the spatial quenched disorder of their equilibrium structures Bahar and Cui (2005). This is still true in our coarse-grained model where the highest frequencies are of the order of $100\ \mathrm{cm}^{-1}$ and the corresponding displacement vector fields are localized in regions of the size of one coordination shell, i.e., $\mathcal{O}(R_{c})$.
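As an illustration of how the Hessian of such a network can be assembled, the sketch below assumes the standard ANM form, in which the off-diagonal block for a connected pair is $-K_{ij}s^{\alpha}_{ij}s^{\beta}_{ij}$ and each diagonal block is minus the sum of the off-diagonal blocks in its row. That explicit form is an assumption stated here, not a transcription of the paper's Hessian equation. The function name `anm_hessian` is hypothetical; the defaults $k=5$ and $R_{c}=10$ Å are the values quoted in the text.

```python
import math


def anm_hessian(coords, k=5.0, cutoff=10.0):
    """Assemble a 3N x 3N ANM Hessian as a list of lists.

    coords : list of (x, y, z) C-alpha positions in angstroms.
    k      : single spring stiffness (kcal/mol/A^2), value quoted in the text.
    cutoff : connectivity cutoff R_c (angstroms), value quoted in the text.

    ASSUMPTION: standard ANM blocks, -k * s_a * s_b off the diagonal for
    pairs within the cutoff, with diagonal blocks balancing each row.
    Coincident points (zero distance) are not handled.
    """
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][a] - coords[j][a] for a in range(3)]
            dist = math.sqrt(sum(x * x for x in d))
            if dist > cutoff:
                continue  # c_ij = 0: no spring between i and j
            s = [x / dist for x in d]  # unit equilibrium inter-particle vector
            for a in range(3):
                for b in range(3):
                    block = k * s[a] * s[b]
                    hess[3 * i + a][3 * j + b] -= block
                    hess[3 * j + a][3 * i + b] -= block
                    hess[3 * i + a][3 * i + b] += block
                    hess[3 * j + a][3 * j + b] += block
    return hess


# Four toy positions; the last one lies beyond the cutoff of all the others.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 12.0)]
H = anm_hessian(toy_coords)
```

With equal masses, the dynamical matrix differs from this Hessian only by the constant factor $1/M$, so its sparsity pattern directly mirrors the connectivity matrix $c_{ij}$.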

Appendix B The localization landscape of thermal phonons

B.1 Calculation of the localization landscape

Within the ANM framework, the equations of motion read

$$ M_{i} \ddot{u}_{i\alpha} = - \sum_{j\beta} \mathbb{H}_{ij}^{\alpha\beta} u_{j\beta}. \tag{7} $$

By introducing the mass-weighted coordinates $X_{i\alpha}=\sqrt{M_{i}}u_{i\alpha}$ , this set of equations can be put into the following vector form:

$$ \ddot{\bf X} = - \widetilde{\mathbb{H}} {\bf X}. \tag{8} $$

We look for solutions to Eq. (8) in the form ${\bf X}={\bf Y}e^{-j\omega t}$ , which amounts to solving the related eigenvalue problem, i.e. finding the eigenvectors ${\bf Y}^{n}$ and frequencies $\omega_{n}$ such that

$$ \widetilde{\mathbb{H}} {\bf Y}^{n} = \omega_{n}^{2} {\bf Y}^{n}. \tag{9} $$

The displacement of residue $i$ can be decomposed into the contributions along each eigenvector ${\bf Y}^{n}$ , that is,

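$$ u_{i\alpha}(t) = \frac{X_{i\alpha}(t)}{\sqrt{M_{i}}} = \frac{1}{\sqrt{M_{i}}} \sum_{n=1}^{3N} \alpha_{n} Y_{i\alpha}^{n} e^{-j\omega_{n}t}. \tag{10} $$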

Ref. Filoche and Mayboroda (2012) introduces a mathematical function called localization landscape (LL) for predicting low-frequency localization. Yet, in the case of an inhomogeneous discrete system, high-frequency eigenvectors also correspond to localized, short-wavelength vibrations. According to a procedure similar to the one developed in Lyra et al. (2015), a high-frequency LL can also be computed as the solution $\bf U$ to the following linear system

$$ \widetilde{\mathbb{H}}_{c} {\bf U} = 1, \tag{11} $$

where

$$ \widetilde{\mathbb{H}}_{c,ij}^{\alpha\beta} = \begin{cases} c - \widetilde{\mathbb{H}}_{ij}^{\alpha\beta} & \text{if } i=j,\ \alpha=\beta \\ \widetilde{\mathbb{H}}_{ij}^{\alpha\beta} & \text{otherwise.} \end{cases} \tag{12} $$

Here, $c$ is a small real positive constant such that all eigenvalues of the matrix $\widetilde{\mathbb{H}}_{c}$ are positive. The physical idea behind this (see Lyra et al. (2015)) is to look for localized modes of wave vector close to $k=\pi/a$, where $a\simeq 3.83$ Å is the equilibrium distance between consecutive $\alpha$-carbons along the protein primary structure. This is the only 1D path belonging to the connectivity graph that ensures translational invariance along the chain. Finally, the localization landscape $\mathcal{U}$ used in this paper to rationalize the location of catalytic sites in enzymes is defined as the Euclidean norm of the three Cartesian components of ${\bf U}$ at each residue, namely

$$ \mathcal{U}_{i} = \left( \sum_{\alpha \in \{x,y,z\}} U_{i\alpha} U_{i\alpha} \right)^{1/2}. \tag{13} $$
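The procedure of this subsection (build the shifted matrix, solve one linear system with a right-hand side of ones, then combine the Cartesian components per residue) can be sketched as follows. This is a toy illustration under stated assumptions: the constant $c$ is chosen from a Gershgorin-type row-sum bound so that the shifted matrix is strictly diagonally dominant, which is one convenient way to guarantee positive eigenvalues and is not necessarily the choice made by the authors. The function names `solve_linear`, `shifted_matrix` and `high_frequency_landscape` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def shifted_matrix(h, c):
    """Diagonal entries become c - h_ii; off-diagonal entries are kept."""
    n = len(h)
    return [[c - h[i][j] if i == j else h[i][j] for j in range(n)]
            for i in range(n)]


def high_frequency_landscape(h, margin=1e-3):
    """Return (U, per-residue norm of U, c) for a symmetric 3N x 3N matrix h.

    ASSUMPTION: c = largest absolute row sum + margin, which makes the
    shifted matrix strictly diagonally dominant and hence positive definite.
    """
    c = max(sum(abs(v) for v in row) for row in h) + margin
    u = solve_linear(shifted_matrix(h, c), [1.0] * len(h))
    per_residue = [math.sqrt(sum(u[3 * i + a] ** 2 for a in range(3)))
                   for i in range(len(h) // 3)]
    return u, per_residue, c


# Toy symmetric 6 x 6 matrix standing in for a two-residue dynamical matrix.
toy_h = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(6)] for i in range(6)]
U, landscape, c_used = high_frequency_landscape(toy_h)
```

For realistic sizes a sparse solver would replace the dense elimination used here; the point of the sketch is only that a single linear solve stands in for a full diagonalization.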

Appendix C Computing Efficiency of the Method

Another important aspect of this approach is its remarkable computational efficiency. The study of protein motions is usually conducted through an analysis of the normal modes. This requires solving the eigenvalue problem (see Eq. (9) in Appendix B)

$$ \widetilde{\mathbb{H}} {\bf Y}^{n} = \omega_{n}^{2} {\bf Y}^{n}, \tag{14} $$

where ${\bf Y}^{n}$ and $\omega_{n}$ correspond to the normal modes and eigenfrequencies, respectively. Retrieving these quantities from normal mode analysis (NMA) can be a computational issue for large macromolecules (number of residues $N>10{,}000$), especially when long-range interactions are accounted for, as they considerably reduce the sparsity of the matrix $\widetilde{\mathbb{H}}$. By contrast, the localization landscape is obtained by solving a simple linear system of algebraic equations

$$ \hat{L} {\bf U} = 1, \tag{15} $$

where $\hat{L}$ stands for a self-adjoint operator constructed from the dynamical matrix (see Eq. (11) in Appendix B). Table 1 compares the computational cost of the two aforementioned approaches by reporting the required CPU time as a function of the number of degrees of freedom (d.o.f.). The ratio between the CPU times required by the two methods is displayed in the last column.

The LL approach is roughly 50 times more efficient for the typical protein size encountered in this study, although we have restricted this analysis to the case of tridiagonal matrices: in practice, the computational gap between the two methods is even more substantial in realistic systems. This performance offers a clear advantage for a systematic analysis of large sets of protein data.

Appendix D Calculation of the local compression factor

The compression factor $\mathcal{C}_{i}$ measures the average level of local compression at a given site. For a given pair $(i,j)$, this amounts to evaluating the change in Euclidean distance along a given normal mode with respect to the equilibrium distance $R_{ij}$. In mathematical terms, $\mathcal{C}_{i}$ reads

$$ \mathcal{C}_{i} = \frac{1}{N_{\mathcal{S}} c_{i}} \sum_{n \in \mathcal{S}} \sum_{j} c_{ij} \left[ R_{ij} - \left( \sum_{\alpha=x,y,z} \left( R_{ij}^{\alpha} + a (Y_{i\alpha}^{n} - Y_{j\alpha}^{n}) \right)^{2} \right)^{1/2} \right], \tag{16} $$

where $\mathcal{S}$ is the set comprising the $N_{\mathcal{S}}$ highest-frequency normal modes, $c_{i}=\sum_{j}c_{ij}$ is the connectivity of residue $i$, and $a$ is an arbitrary displacement in Å. In our calculation we chose $a=1$ Å, smaller than half the shortest inter-residue distance $R_{ij}\simeq 3.8$ Å. This ensures that the $\mathcal{C}_{i}$ are positive quantities, in agreement with the physical requirement that relative displacements cannot exceed equilibrium inter-distances.
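A minimal sketch of this compression measure is given below, for a handful of residues and a user-supplied list of modes. It assumes that $R_{ij}^{\alpha}$ denotes the components of $\boldsymbol{R}_{i}-\boldsymbol{R}_{j}$ and that self-pairs ($j=i$) are excluded from the connectivity; both are assumptions made here for illustration. The function name `compression_factors` is hypothetical; the defaults $a=1$ Å and $R_{c}=10$ Å are the values quoted in the text.

```python
import math


def compression_factors(coords, modes, cutoff=10.0, a=1.0):
    """Average shortening of neighbour distances induced by a set of modes.

    coords : list of (x, y, z) equilibrium positions in angstroms.
    modes  : list of modes, each a list of N (x, y, z) displacement vectors.
    cutoff : connectivity cutoff R_c; a : displacement amplitude (angstroms).

    ASSUMPTIONS: R_ij is taken as R_i - R_j and self-pairs are skipped.
    """
    n = len(coords)
    factors = []
    for i in range(n):
        total = 0.0
        neighbours = 0
        for j in range(n):
            if j == i:
                continue
            r = [coords[i][k] - coords[j][k] for k in range(3)]
            r_eq = math.sqrt(sum(x * x for x in r))
            if r_eq > cutoff:
                continue  # not connected: c_ij = 0
            neighbours += 1
            for mode in modes:
                moved = [r[k] + a * (mode[i][k] - mode[j][k]) for k in range(3)]
                total += r_eq - math.sqrt(sum(x * x for x in moved))
        factors.append(total / (len(modes) * neighbours) if neighbours else 0.0)
    return factors


# Two residues 3.8 A apart and one normalized mode moving them together.
s = 1.0 / math.sqrt(2.0)
pair = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
approach = [[(s, 0.0, 0.0), (-s, 0.0, 0.0)]]
factors = compression_factors(pair, approach)
```

Because an eigenvector and its negative describe the same normal mode, the sign of each term depends on the phase in which the mode is supplied; the sketch takes the modes as given and does not fix a phase convention.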

Acknowledgements.

S. M. is funded by an NSF INSPIRE grant and a Simons fellowship. S. M., C. W., and M. F. are funded by a grant from the Simons Foundation (563916, SM; 601954, CW; and 601944, MF).

Bibliography

1. Nagel and Klinman (2009). Z. D. Nagel and J. P. Klinman, Nature Chemical Biology 5, 543 (2009).
2. Oldfield and Dunker (2014). C. J. Oldfield and A. K. Dunker, Annual Review of Biochemistry 83, 553 (2014).
3. Zinovjev and Tuñón (2017). K. Zinovjev and I. Tuñón, Proceedings of the National Academy of Sciences 114, 12390 (2017).
4. Kale et al. (2008). S. Kale, G. Ulas, J. Song, G. W. Brudvig, W. Furey, and F. Jordan, Proceedings of the National Academy of Sciences 105, 1158 (2008).
5. Agarwal (2005). P. K. Agarwal, Journal of the American Chemical Society 127, 15248 (2005).
6. Antoniou and Schwartz (2001). D. Antoniou and S. D. Schwartz, The Journal of Physical Chemistry B 105, 5553 (2001).
7. Hay and Scrutton (2012). S. Hay and N. S. Scrutton, Nature Chemistry 4, 161 (2012).
8. Luk et al. (2013). L. Y. P. Luk, J. Javier Ruiz-Pernía, W. M. Dawson, M. Roca, E. J. Loveridge, D. R. Glowacki, J. N. Harvey, A. J. Mulholland, I. Tuñón, V. Moliner, and R. K. Allemann, Proceedings of the National Academy of Sciences 110, 16344 (2013).