The Spatial Distribution of Local Mobility in Folded Proteins
S. Rackovsky

TL;DR
This paper introduces a method to study how amino acid mobility is distributed in proteins, revealing that mobility patterns may influence protein dynamics and evolution.
Contribution
A new method to analyze spatial mobility distribution in proteins using bioinformatics and signal processing tools.
Findings
Mobility is nonuniformly distributed in a significant subset of protein structures.
Nonuniform mobility distribution is not linked to specific protein fold classes.
Differences in local mobility correlate with differences in total mobility.
Abstract
A quantitative approach is developed to the study of the spatial distribution of amino acid mobility in protein structures. This method, which is based on bioinformatic and signal processing tools, makes it possible to study very large databases of structures simultaneously, and to search for the existence of domains within proteins which are defined by mobility effects, rather than by static structural considerations. It is shown that mobility is distributed nonuniformly in a substantial subset of structures in a large database; that nonuniform mobility distribution does not select for fold class; and that differences in local mobility distribution are correlated with differences in total mobility. Analyzed in light of previous results, these findings suggest that the dynamics of proteins with nonuniform distributions of mobility may exhibit dynamics dominated by local modes, rather…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1|
| |
|---|---|
| ALA | 26.27 |
| ASP | 29.89 |
| CYS | 26.27 |
| GLU | 31.92 |
| PHE | 25.31 |
| GLY | 27.45 |
| HIS | 26.98 |
| ILE | 25.86 |
| LYS | 31.30 |
| LEU | 26.88 |
| MET | 27.65 |
| ASN | 28.52 |
| PRO | 28.77 |
| GLN | 29.54 |
| ARG | 29.47 |
| SER | 28.32 |
| THR | 27.06 |
| VAL | 25.67 |
| TRP | 23.98 |
| TYR | 25.07 |
| C | all | asymmetic | symmetric |
|---|---|---|---|
| 1 | 5445 | 1315 | 4130 |
| 2 | 4973 | 1210 | 3763 |
| 3 | 11860 | 2896 | 8964 |
|
| –0.009 |
|
| –0.036 |
|
| 0.011 |
|
| –0.011 |
|
| 0.003 |
|
| –0.06 |
|
| 0.001 |
|
| 0.01 |
| C | β0 (symmetric) | β0 (asymmetric) |
|
|
|---|---|---|---|---|
| 1 | 27.995 | 27.925 | 6.42 | ≪0.001 |
| 2 | 27.815 | 27.806 | 1.32 | 0.19 |
| 3 | 27.833 | 27.843 | –2.44 | 0.02 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsProtein Structure and Dynamics · Machine Learning in Bioinformatics · Origins and Evolution of Life
Introduction
It has long been understood that the mechanisms of folding and function of a protein are critically linked to the dynamic characteristics of the molecule. For this reason, it is of central importance to gain an understanding of the global dynamic organization of proteins. This requires a comprehensive bioinformatic analysis, akin to the many studies which underlie our understanding of the sequences and structures of folded proteins. Until now, however, tools have not been available to carry out such an analysis. Computational studies of protein dynamics have been limited to simulation by molecular dynamics (MD) or by elastic network (EN) models. These approaches, which provide detailed information on atomic length scales for individual molecules, are not suited to the simultaneous analysis of all the proteins in a very large database.
In recent work, ?−? ? ? ? ? ? we have developed a bioinformatic view of protein dynamics, based on an average, residue-specific α carbon B factor. We denote the average B factor for amino acid X as ⟨B(X)⟩. (Values of this parameter are given in Table.) Any sequence can then be written as a numerical string of ⟨B(X)⟩ values, in a form which we refer to as the dynamic sequence of the protein. Techniques of signal processing- particularly Fourier analysis- are then applied to the set of dynamic sequences of interest. This representation makes it possible to characterize the mobility distributions of very large groups of structures simultaneously, with extremely modest computational (hardware and runtime) requirements. We have applied this methodology to a unified study of structure and mobility relationships in a large set of folded proteins, and have demonstrated the power of this representation in studying dynamic properties of proteins which are not accessible by any other approach.
1: Residue-Specific Average B Factors
In the course of that work, we demonstrated? that the connection between three-dimensional structure and mobility is encoded in the total mobility of the sequence, which we denote by β_0_
It was shown that β_0_ is correlated with several large-scale properties of proteins:
- β_0_ was shown to act as a controlling variable driving a transition between dynamic regimes in folded proteins, a transition which occurs in differing ways in different folding classes;?
- A plausible connection between β_0_ and fold switching was demonstrated;?
- The basic determination as to whether a given sequence folds or remains disordered was shown to be controlled by β_0_ and other lowest-k Fourier coefficients of the dynamic sequence;?
- It was shown that there are significant differences in the distributions of β_0_ between fold classes, and between folded and intrinsically disordered proteins.?
In the present work, we begin to study spatial, as distinct from sequential, aspects of mobility. It is clear that spatially localized mobility must be connected to local flexibility, and therefore to the mechanisms of folding and function of folded proteins. The results embodied herein represent a first step in studying that connection. We have, in fact, demonstrated in previous work? that the choice between folding classes is encoded in intermediate-k Fourier coefficients of the dynamic sequence- i.e., by the distribution of mobility along a sequence at intermediate length scales.
We will be concerned herein with several questions concerning the 3-dimensional spatial distribution of mobility, the answers to which are not accessible by conventional approaches:
- To what extent is the mobility of individual residues distributed nonuniformly in the structures of folded proteins?
- If nonuniform distributions of mobility are observed, do they select for particular fold classes?
- Is the three-dimensional distribution of individual amino acid mobility correlated with the total mobility of the protein?
These results will enable us to address a point of fundamental interest, hitherto uninvestigated- the possibility that proteins exhibit domain-like segregation defined by the occurrence of regions with high residue mobility- a property which we refer to as dynamic asymmetry- rather than by purely static structural characteristics. The ability to detect regions of enhanced flexibility- i.e., high localized mobility- in proteins would be an important tool, and has received recent attention? from a simulation point of view. It has been hitherto impossible to search for such domains on a large scale, due to the unsuitability of standard methods for the simultaneous investigation of large data sets of protein structures.
These questions go to the heart of protein behavior. It is reasonable to suppose, for example, that pairs of proteins with similar structures and residue mobility distributions share folding and functional characteristics. It is also reasonable to suppose that pairs of proteins with similar structures but differing distributions of residue mobility fold differently, function differently, or both. With what frequency do such pairs of proteins occur? We have demonstrated, in previous work,? the existence of pairs of proteins which have similar structures and different dynamic behavior. That work was carried out on a small set of preselected proteins. There has been, heretofore, no method available for studying the mobility differences of proteins en masse. More particularly, there has been no method available for studying inhomogeneity of mobility within single domains, which are traditionally defined by static geometric properties.
The methods we develop herein lay the groundwork for investigating this and other questions. They represent the first step in an ongoing investigation, the object of which is to connect the rigorously defined spatial distribution of residue mobility to the folding behavior and function of proteins. As a necessary preliminary, a quantitative approach to describing the spatial distribution of amino acid mobility is developed. This approach has the following features:
- 1.It is based on 3-dimensional, rather than 1-dimensional considerations.
- 2.We introduce, as a coarse-grained measure of dynamic inhomogeneity, a dynamic dipole. The definition of this directional parameter also makes it possible to consistently define a fixed, dynamics-based coordinate system within any structure.
- 3.The fine-grained distribution of dynamic characteristics within a given structure is then described, based on that coordinate system, using a spherical harmonic expansion of single-residue < B(X)> values.
- 4.The statistical significance of results can be established analytically.
- 5.Extremely large groups of structures can be analyzed simultaneously.
- 6.The hardware and runtime requirements of this approach are orders of magnitude less than traditional methods for studying dynamic properties.
The following results are then demonstrated:
- Local mobility is distributed in a statistically significant nonuniform manner in roughly one-quarter of the proteins in a large data set.
- Nonuniform distribution of mobility is not observed to select for fold class.
- Differences in local mobility distribution are, however, correlated with differences in total mobility. The magnitude and sign of this correlation vary by fold class, and these results, together with previous observations, suggest that proteins with regions of dynamic asymmetry tend to be characterized by dynamic modes which are local in character, rather than by those arising from large scale architecture. This observation is in consonance with physical intuition.
Methods
We require a quantitative measure of the homogeneity, or lack thereof, of the distribution of mobility within a specified structure. We begin by define a characteristic which unites the structural and dynamic properties of a protein in a single quantitative parameter- the mobility dipole. This is calculated as follows:
- 1.The α carbon center of mass of the structure, R _ 0 _, is calculated in straightforward fashion.
- 2.The mobility-weighted C_α_ center of mass, R _ β _, is calculated.
- 3.From these, the mobility dipole vector D = (R _ β _-R _ 0 _) is calculated.
The mobility dipole vector will be very small for proteins which exhibit a uniform distribution of the residue-dependent mobility ⟨B(X)⟩ throughout the structure. As the distribution of ⟨B(X)⟩ becomes less homogeneous, the vector will increase.
The C_α_ center of mass is given by
and the mobility weighted C_α_ center of mass by
where R * i
- is the position, and β* i
- (= ⟨B(X* i )⟩) is the mobility, of the ith* α carbon.
The significance of the vector D lies in the fact that it defines a direction within the structure determined by the distribution of mobilities. A complete Euclidean coordinate system based on D can be constructed from this vector as follows. We first define the unit vector in the z direction as
In order to establish a direction for the unit vector in the x direction, we identify the residue within the structure whose α carbon is farthest from R _ 0 . The position vector of that C_α, in the coordinate system centered on R _ 0 _, is denoted R _ max _. We then define the unit vector in the x direction by the equation
and the unit vector in the y direction by
This coordinate system provides a basis for defining the spatial distribution of mobility within the structure.
We characterize that distribution by performing a spherical harmonic transform, which can be thought of as the 3-dimensional generalization of a Fourier transform. Spherical harmonics are special functions which carry two indices, l and m, which together describe the angular distribution of the quantity of interest. They define a complete, orthonormal basis over the sphere. Therefore, any real-valued function f(ω) of spherical variables may be expanded as a linear combination as follows
where Φ(l,m) is the spherical harmonic with indices l and m, and the expansion coefficients Γ(l,m) can be computed from a projection integral, as
In the case of discrete variables, which will be of interest herein, this becomes
a form which is particularly convenient because the spherical harmonics Φ(l,m) can be expressed in Cartesian form. The first nine harmonics, which will concern us in the present work, are
The discrete variable of interest in the present case is the mobility of the residue at position r, which we denote by β(r). Then the spherical harmonic coefficients become
where we include the spatial variable r in the argument of the spherical harmonic to indicate that we use the Cartesian representation.
It is necessary to know the statistical significance of the resulting coefficients. The relevant question is whether these coefficients differ from those which would be obtained from a similar calculation on a randomly permuted version of the original sequence, with the same structure. To this end, we compare the computed spherical harmonic coefficients of the wild-type sequence to the ensemble of coefficients arising from all possible randomly permuted sequences. In previous work? it was shown that the permuted-sequence ensemble averages of coefficients arising from orthonormal expansions of numerically represented sequences, and the associated standard deviations, can be calculated analytically, obviating the need for computationally intensive, and necessarily approximate, estimates resulting from simulations. We adopt this approach in the present work. This makes it possible to represent the coefficients as Z functions in the following form
where the brackets in the numerator represent the average over all possible sequence permutations.
When the coefficients are represented in this form, their statistical significance is automatically reflected in their numerical values, in standard fashion.
For the present calculations, we use a database of 22278 proteins with known sequence and high-quality structures, with pairwise sequence identity no greater than 60%, derived from the CATH? database. It should be remarked, with regard to computational requirements, that a complete analysis of this database requires approximately 5 min on a MacBook Pro.
Results and Discussion
We begin by addressing the distribution of mobility in protein structures. It should be noted that, in this work, we consider the properties of single domains. We wish to know the prevalence of statistically significant dynamic asymmetry. Using the measures outlined in the Methods section, and a standard statistical criterion, we define structures with significant dynamic asymmetry as those with at least one spherical harmonic coefficient for which Z(Γ)≥1.96. We find that 5421 domains- 24.3% of the data set- exhibit significant dynamic asymmetry. In Table, we present the populations of the full data set, and of two subsets of the data set, comprising dynamically symmetric and asymmetric domains, for three values of the CATH label C. [It should be remembered that C = 1 labels all-helical proteins, C = 2 labels sheet/barrel proteins, and C = 3 labels mixed α/β molecules.] We can ask whether the relative populations of the three folding classes differ between the subset of proteins which exhibit significant asymmetry and the complementary, symmetric subset. We address this question by applying a χ^2^ test to the two right-hand columns of the table. We find that there is no significant difference in the relative populations of folding classes between the displaced and undisplaced subsets. There is no selection for asymmetry by fold class.
2: Populations of the Database and Subsets by Folding Class
We next ask whether the members of the asymmetric subset differ significantly from dynamically symmetric proteins with respect to total mobility. We find that the average value of β_0_ for the set of proteins with dynamic asymmetry is 27.855, while that for the set of dynamically symmetric proteins is 27.869. This difference, although small, is statistically significant, with p = 0.00014. Another way of investigating this question is to determine the degree to which the spherical harmonic coefficients are correlated with β_0_. In Table, we show the relevant correlation coefficients. These are all small, but the correlation coefficients for Z(1,0) and Z(2,0) are noticeably larger in magnitude than others, and negative in sign. It should be noted that, as shown in Figure, Z(1,0) and Z(2,0) are significantly larger in magnitude and/or range than the other coefficients. It will be seen from eqs and ? that these two coefficients depend only on i _ z _, the direction of the dynamic dipole, and therefore reflect most directly the degree to which asymmetry is present. The negative sign of the correlations corresponds to our observation that the average mobility of the dynamically symmetric subset is larger than that of the subset of asymmetric proteins.
Side-by-side Boxplot of the Distributions of Spherical Harmonic Z-functions.
3: Correlation of Spherical Harmonic Coefficients With β0
How does this difference in mobility arise? Is it correlated with fold class? In Table we compare the average values of β_0_ of dynamically symmetric and asymmetric proteins, as a function of the CATH index C. It will be seen that there is a highly significant difference in average β_0_ values between the two groups of helical proteins, but that the difference in average β_0_ values between the two groups of sheet/barrel proteins is not statistically meaningful. The mixed proteins show a smaller, but statistically significant, difference, in the opposite direction to the helical proteins. We have demonstrated in previous work? that, in helical proteins, as β_0_ increases, dynamics are more likely controlled by global structure, rather than local modes, whereas lower β_0_ values are likely associated with dynamics controlled by local modes. It was likewise shown that, in sheet/barrel and α/β proteins, the opposite tendency is observed. These observations suggest that the proteins in the asymmetric group, as a whole, are more likely to exhibit locally controlled dynamics.
4: Comparison of Mobilities by Fold Class
Conclusions
We have demonstrated the following results
- Nonuniform distribution of residue mobility is observed in a large subset of proteins.
- Nonuniform mobility is not correlated with fold class.
- It is, however, correlated with differences in total mobility. The characteristics of this correlation suggest that proteins with regions of dynamic asymmetry are likely to be characterized by dynamic modes which are local in character, rather than by those arising from large scale architecture.
These results indicate that there is indeed high-mobility substructure in a substantial subset of single domains. The connection between that substructure and folding, and various protein functions, such as substrate binding and catalysis, is the subject of ongoing investigation.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Scheraga H. A.Rackovsky S.Sequence-specific dynamic information in proteins Proteins: Struct., Funct., Bioinf.20198779980410.1002/prot.25747 PMC 671832631134683 · doi ↗ · pubmed ↗
- 2Rackovsky S.Scheraga H. A.The structure of protein dynamic space Proc. Natl. Acad. Sci. U.S.A.2020117199381994210.1073/pnas.200887311732759212 PMC 7443874 · doi ↗ · pubmed ↗
- 3Scheraga H. A.Rackovsky S.Dynamic and conformational switching in proteins Biopolymers 2021112 e 2341110.1002/bip.2341133270217 PMC 8172660 · doi ↗ · pubmed ↗
- 4Konkankit C.Rackovsky S.The dynamic basis of structural order in proteins Proteins: Struct., Funct., Bioinf.2022901115111810.1002/prot.26296 PMC 900781734981860 · doi ↗ · pubmed ↗
- 5Rackovsky S.Structure class encoding in protein dynamic bioinformatics J. Phys. Chem. B 20221265730573410.1021/acs.jpcb.2c 0250235900129 · doi ↗ · pubmed ↗
- 6Konkankit C. C.Rackovsky S.Global survey of protein dynamic properties J. Phys. Chem. B 20231276073607710.1021/acs.jpcb.3c 0260937368985 · doi ↗ · pubmed ↗
- 7Kombo D. C.La Marche M. J.Konkankit C. C.Rackovsky S.Applications of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences Proteins: Struct., Funct., Bioinf.2024921234124110.1002/prot.26704 PMC 1151164938808365 · doi ↗ · pubmed ↗
- 8DziadekŁJ.Sieradzan A. K.Czaplewski C.Zalewski M.Banas F.Toczek M.Nisterenko W.Grudinin S.Liwo A.Gieldon A.Assessment of four theoretical approaches to predict protein flexibility in the crystal phase and in solution J. Chem. Theory Comput.20247667768110.1021/acs.jctc.4c 0075439171852 PMC 11391579 · doi ↗ · pubmed ↗
