Proteins: the physics of amorphous evolving matter
Jean-Pierre Eckmann, Jacques Rougemont, Tsvi Tlusty

TL;DR
This paper explores a unified physical and evolutionary framework for understanding proteins, modeling them as evolvable condensed matter and analyzing how genetic mutations influence protein structure and function.
Contribution
It introduces a mechanical model treating proteins as evolvable matter and uses Green's functions to connect genetic mutations with physical interactions.
Findings
Mutations cause localized perturbations in proteins.
Green's functions link genetic epistasis to amino acid interactions.
The framework aids in understanding protein evolution and design.
Jean-Pierre Eckmann 1,2, Jacques Rougemont 1, Tsvi Tlusty 3,4
1 Département de Physique Théorique, Université de Genève, CH-1211, Geneva 4, Switzerland
2 Section de Mathématiques, Université de Genève, CH-1211, Geneva 4, Switzerland
3 Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, Korea
4 Department of Physics, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
Abstract
Proteins are a matter of dual nature. As a physical object, a protein molecule is a folded chain of amino acids with multifarious biochemistry. But it is also an instantiation along an evolutionary trajectory determined by the function performed by the protein within a hierarchy of interwoven interaction networks of the cell, the organism and the population. A physical theory of proteins therefore needs to unify both aspects, the biophysical and the evolutionary. Specifically, it should provide a model of how the DNA gene is mapped into the functional phenotype of the protein.
We review several physical approaches to the protein problem, focusing on a mechanical framework which treats proteins as evolvable condensed matter: Mutations introduce localized perturbations in the gene, which are translated to localized perturbations in the protein matter. A natural tool to examine how mutations shape the phenotype are Green’s functions. They map the evolutionary linkage among mutations in the gene (termed epistasis) to cooperative physical interactions among the amino acids in the protein. We discuss how the mechanistic view can be applied to examine basic questions of protein evolution and design.
Sections marked with ∗ contain more technical material and can be omitted at first reading.
I The protein problem: a theoretical physics perspective
The macromolecules that make living matter – lipids, hydrocarbons, nucleic acids, and in particular proteins – are among the most studied objects of Nature. Proteins comprise the central nano-machinery of the cell, whose numerous functions include forming structural elements, catalyzing metabolic reactions and conveying biochemical signals alberts1998cell; Fersht1999; Howard2001; Goodsell2009; Whitford2013a. Because of their significance in life, proteins and the genes that encode them have been extensively investigated using various experimental methods, such as crystallography, biochemical assays, mass spectrometry, fluorescence imaging, electron microscopy, directed evolution and deep sequencing Rambo2013; Cohen2001; Collins2011; Mandala2018; Barrera2011; Mehmood2015; Ha2012; Mardis2013; Chapman2011; Fernandez-Leiro2016. In parallel, sophisticated computational models, such as molecular dynamics, have been developed to predict the structure, function and folding of proteins Karplus2002; Karplus2005; Adcock2006; Dror2018; Scheraga2018; Isralewitz2001.
These experiments and simulations provide valuable data on protein structure, dynamics and genetics. However, there remain two inherent challenges: (i) Sparsity of data – the protein is the outcome of long evolutionary search in a high-dimensional space of gene sequences, which is impossible to sample, even by high-throughput experiments. (ii) Complexity of interactions – The function of a protein arises from collective many-body interactions in the heterogeneous amino acid matter, which are hard to probe and model.
In light of these challenges, we focus in this colloquium on a complementary theoretical approach that links the protein problem to the realm of condensed matter physics. Rather than using realistic simulations predicting the dynamics and function of concrete proteins, we shall discuss minimal models that allow one, under several simplifying assumptions, to examine basic questions of protein evolution, especially how the collective physical interactions within the protein direct its evolution. (Footnote 1: The main body of this colloquium is based on, and expands, ideas from papers Tlusty2007; Eckmann2008; Tlusty2008a; Tlusty2010; Tlusty2016; Tlusty2017; Dutta2018.)
The structure of many proteins is known at a resolution of a few angstroms and there are detailed computational models of the forces between the amino acids. Here, however, the protein will be examined at a coarse-grained level, in the spirit of lattice Lau1989; Shakhnovich1991 and network Chennubhotla2005 models. The protein will be described as a connected network whose nodes represent the amino acids. Furthermore, from this conceptual point of view, it suffices to assume that there are only two types of amino acids instead of the usual twenty (for example, in the classical HP model Lau1989 used in Sec. LABEL:sec:pnasmodel, amino acids can only be either hydrophobic (H) or polar (P)). (Footnote 2: 21, when counting the rare pyrrolysine Hao2002; Srinivasan2002.) In a real protein, forces between the amino acids are a complicated combination depending, for example, on their polarity, hydrophobicity, charge and shape. At the coarse-grained level, it again suffices to consider instead just simple springs between pairs of neighboring sites. This is akin to using harmonic approximations in mechanics, which provide a generic understanding and a good physical insight.
With simplifications of this kind, one can translate certain questions of biology to analogous questions in the physics of amorphous networks. Among the rich set of methods in this classical subject of physics, some tools seem particularly well adapted to the protein problem. The approach is based on the dual nature of the protein; it is a physical object whose formation and physical interactions are also represented in the ‘dual’ gene, a sequence of symbols from a four-letter alphabet of the DNA bases, ‘A’, ‘C’, ‘G’, ‘T’. Evolution progresses by introducing mutations, that is, permanent modifications of this sequence. There are local mutations (nucleotide substitutions, short insertions and deletions) besides larger scale modifications (e.g., translocations, inversions, duplications). A natural approach to study protein evolution is to model the effect of mutations on the physical properties of the amino acid network.
Local mutations amount to short jumps between neighboring sequences in the genotype space, differing by one letter only, while large-scale mutations are equivalent to longer jumps. Both classes of mutations can be described in terms of alterations of the mechanical properties of the amino acid network. However, we shall focus on the class of local mutations. Practically, local mutations are easy to treat with classical techniques of condensed matter, for instance via Green’s functions, since they induce localized perturbations in the spring network. More importantly, it is possible to statistically sample the genotype space with continuous trajectories progressing by consecutive local mutations. This will be the main axis of this colloquium. Along the evolutionary trajectory, mutations come in three flavors: those that lead to some sort of functional catastrophe or significant disadvantage, and are therefore eliminated by selection; those that improve the properties of a protein; and finally, the large ‘neutral’ majority, which do not induce any significant change in the function of the protein Kimura1983; Neher2011. In this manner, the ‘learning’ evolutionary process reduces the problem of improving a protein from an exhaustive combinatorial search into a biased random walk. This drastically reduces the dimension of the space which one needs to explore.
The condensed-matter approach to the protein problem may be viewed as an example of a potentially general framework that may be used to examine other strongly-coupled biological systems. For example, one may analyze metabolic and genetic networks in terms of localized perturbations and Green’s functions. Such analysis may suggest common underlying principles. It might as well turn out that biology is more contingent and depends on the history of the evolutionary process, but at least the few examples we describe give us hope that a rational approach, based on the laws of physics, may be useful in some cases.
Biological molecules are far from being the spring networks we use as a model. Still, similar abstractions proved successful in many areas of physics. For example, the dynamical systems of the so-called Axiom A class Eckmann1985 are systems with a very special, yet simple, structure. And although most systems do not belong to the Axiom A class, it proved very useful to consider that they behave ‘as if’.
There is a long history of studies in a similar spirit of abstraction and simplification, starting with the conformal maps of D’Arcy Thompson Thompson1942, through the morphogenetic studies based on the theory of catastrophes by René Thom Thom2018. In the 21st century researchers have much more data available on biological systems. This allows one to test hypotheses against measurements, infer from the data other questions to investigate, and suggest possible experiments to confirm or refute the theory. We close the introduction with two quotations which reflect the general outlook of this colloquium:
Misha Gromov, in [Gromov2013, Abstract]
When you read a textbook on molecular/cellular biology you are enchanted by the logical beauty of biological structures. You want to share your excitement with your colleagues, but…you find out you are unable to do it: there is no language in the 21st century mathematics that can express this beauty. You feel there must be a new world of mathematical structures shadowing what we see in Life, a new language we do not know yet, something in the spirit the ‘language’ of calculus we use when describing physical systems.
Giovanni Jona-Lasinio, in Jonalasinio2012:
Theoretical physics was recognized as an independent field of research only at the end of the 19th century, shortly before the great conceptual revolutions of relativity and quantum mechanics. Today theoretical physics has multiple facets. I think that the time has come for a more precise characterization of the research field of theoretical biology, and for an assessment of its scope. [Translated from Italian]
We are convinced that such outlooks are important and our work should be viewed as an attempt in this general direction, in the hope that readers will be encouraged to proceed along this path.
II Biology as a challenge to theorists
Biological research has been extremely active in the past decades and experimental results have flourished to vastly improve our understanding of living matter. The challenge for theorists is to find subtopics which are at a stage where theoretical abstraction can be fruitful.
Here we focus on the relation between genes and the functions of proteins: genes (in DNA) code for amino acid chains that fold into the three-dimensional configurations of functional proteins. This sequence-to-function map is hard to decrypt since it links the collective physical interactions inside the protein to the corresponding evolutionary forces acting on the genome Koonin2002; Xia2004; Dill2012; Zeldovich2008; Liberles2012. Furthermore, evolution selects the tiny fraction of functional sequences in an enormous, high-dimensional space Povolotskaya2010; Keefe2001; Koehl2002, which implies that proteins form non-generic, information-rich matter, outside the scope of standard statistical methods. Therefore, although the structure and physical forces within a protein have been extensively studied, the fundamental question of how a functional protein originates from a linear DNA sequence still provides research challenges, in particular how functionality constrains the accessible DNA sequences.
To examine the geometry of the sequence-to-function map, we devise below a mechanical model of proteins as amorphous evolving matter. (Footnote 3: In his book “What is Life?” schrodinger1944, Schrödinger uses the term ‘aperiodic crystal’ to describe material which contains genetic information. This is of course a very interesting forethought, but since the advent of quasiperiodic crystals, the term ‘amorphous’ leads to a more precise classification.) Rather than simulating concrete proteins, we construct models which describe the hallmarks of the genotype-to-phenotype map (the translation of the gene to the protein). These models are sufficiently simple so that large-scale simulations can be performed, which allow averaging over the stochastic noise inherent to evolutionary dynamics. Furthermore, we restrict our approach to models in which the function of a protein arises from large-scale conformational changes, where big chunks of the protein move with respect to each other. These motions are central to certain functions Koshland1958; Henzler-Wildman2007; Savir2007; Schmeing2009; Savir2010a; Huse2002; Savir2013. For example, allosteric proteins are a type of ‘mechanical transducers’ that transmit regulatory signals between distant sites Perutz1970; Goodey2008; Lockless1999; Ferreon2013.
We end this section by mentioning a few papers which have dealt with similar issues, and which highlight the increasing interest in connecting biological questions with methods from solid state physics.
Common to these studies is a mechanical perspective on protein function. The motivation originates from many observations of proteins whose functions involve collective patterns of forces and coordinated displacements of their amino acids Daniel2003; Bustamante2004; Hammes-Schiffer2006; Boehr2006; Karplus2002; Henzler-Wildman2007; Huse2002; Eisenmesser2005; Goodey2008; Savir2010a. In particular, the mechanisms of allostery Monod1965; Perutz1970; Cui2008; Daily2008; Motlagh2014; Thirumalai2018; Koshland1966, induced fit Koshland1958, and conformational selection Grant2010 often involve global conformational changes by hinge-like rotations, twists or shear-like sliding of protein subdomains Gerstein1994; Mitchell2016; Mitchell2017.
A now-standard approach to examine the link between function and motion is to model proteins as elastic networks of amino acids connected by spring-like bonds. Early studies that apply this class of models are from the 1980s and 90s Levitt1985; Tirion1996, and in the last two decades the methods have been further developed and applied to many proteins Chennubhotla2005; Bahar2010; Lopez-Blanco2016. Decomposing the dynamics of the network into normal modes revealed that low-frequency ‘soft’ modes capture functionally relevant large-scale motion Tama2001; Bahar2010a; Haliloglu2015, especially in allosteric proteins Ming2005; Zheng2006; Hawkins2006; Arora2007; Tehver2009; Wrabl2011; Greener2015.
Recent work associates the soft modes of protein conformations with the emergence of weakly connected regions as described above, but also ‘cracks’ Miyashita2003, ‘shear bands’ or ‘channels’ Mitchell2016; Mitchell2017; Tlusty2016; Tlusty2017; Dutta2018; Rocks2019 that enable low-energy viscoelastic motion Qu2013; Joseph2014. Such contiguous domains evolve in models of allosteric proteins Hemery2015; Flechsig2017; Tlusty2017.
A source of inspiration for linking proteins to the physics of amorphous matter are the papers by the late Shlomo Alexander, especially Alexander1998; Alexander1982. In these works, Alexander highlighted the essential role of ‘floppy modes’ in the mechanical spectrum of amorphous solids. Also relevant are studies by Thorpe and Phillips on constraint theory and rigidity percolation in glasses, such as Thorpe1985; Phillips1985. Those works highlighted the ability to control the rigidity and accessible zero-energy modes of mechanical networks by balancing the number of degrees of freedom and the number of constraints, as was noted by Maxwell in 1864 Maxwell1864.
The link between the dynamical spectra of proteins and amorphous matter has been further explored in a recent series of works on mechanical metamaterials. The emergence of long-range allosteric response was used in Rocks2017 as a design principle for ‘programmable’ metamaterial made of amorphous spring networks Rocks2018. A similar random network approach was applied in Yan2017 to design elastic materials with tailored mechanical response. These works suggest that tunable amorphous materials have the flexibility required to produce elaborate designs, as recently demonstrated by mimicking the cyclical conformational motion of protein motors Flechsig2018. These promising approaches to metamaterial design are discussed elsewhere, for example in Ronellenfitsch2018; Kim2018; Rocklin2017; Baardink2018.
The present Colloquium focuses on a different aspect: understanding fundamental properties of protein evolution – in particular the genotype-to-phenotype map – within the framework of condensed matter theory.
III Proteins as information machines
The building plan of a protein is determined by its corresponding gene, via the genetic code. The gene is a 1-dimensional string in an alphabet of 4 letters: the nucleotides ‘A’ (adenine), ‘C’ (cytosine), ‘G’ (guanine), and ‘T’ (thymine) (see Alberts2017, Ch. 6). The protein is a (folded) chain of amino acids (AA) which is translated from the gene according to the genetic code: every three successive letters (each non-overlapping triplet, called a codon) map to a single AA. In principle, this would allow for $4^3 = 64$ possibilities, but in general there are only 20 different AAs, making the code redundant, as we shall discuss in Sec. III.1.
We view the gene, i.e., the 1-dimensional string of letters, as the tape of a Turing machine Turing1936; Herken1992; Condon2018. Since any alphabet can be recoded in binary (for example, each of the 4 nucleotides can be recoded as a 2-bit number), one can always think of it as a string of ‘0’s and ‘1’s. The proteins (and the transcription-translation machinery, which is itself made of proteins) would be the computer, which is able to read and interpret the string.
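The binary recoding of the tape can be illustrated with a minimal sketch. The particular assignment of 2-bit words to nucleotides below is an arbitrary assumption (the text only states that such a recoding exists), and the helper names `gene_to_bits`, `bits_to_gene` and `flip_bit` are hypothetical.

```python
# Assumed (arbitrary) 2-bit word for each nucleotide; any bijection would do.
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in ENCODE.items()}


def gene_to_bits(gene: str) -> str:
    """Recode a DNA string as a string of '0's and '1's, two bits per nucleotide."""
    return "".join(ENCODE[base] for base in gene)


def bits_to_gene(bits: str) -> str:
    """Inverse recoding: read the binary tape two bits at a time."""
    return "".join(DECODE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def flip_bit(bits: str, k: int) -> str:
    """Flip the k-th bit of the tape (a toy stand-in for a point mutation)."""
    flipped = "1" if bits[k] == "0" else "0"
    return bits[:k] + flipped + bits[k + 1:]


tape = gene_to_bits("ATGGCA")
mutant = bits_to_gene(flip_bit(tape, 5))  # bit 5 belongs to the third nucleotide
```

With this encoding a single bit flip always changes exactly one nucleotide, which is the sense in which a point mutation can be thought of as a bit flip on the tape.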
This particular machine is an example of a self-reproducing Turing machine Neumann1966, since the replication of the genome can be achieved by genome-encoded proteins. In addition, these machines are evolving when the genes are mutated. In other words, the machine can modify its own tape (see also Tlusty2016). A further study in this direction is Dyson1970, but there are many more, see e.g., Freitag2004.
III.1 Handling reading errors
Translation of the gene into its corresponding string of amino acids requires a specialized machinery, which includes the ribosome Alberts2017. (Footnote 4: In addition to the ribosome, the machinery includes two sets of molecules, tRNAs, which carry the amino acids, and aminoacyl-tRNA synthetases, which charge the tRNAs with amino acids. The translation is preceded by a transcription step in which the DNA gene (a segment of the genome) is copied into a mRNA (a single molecule).) The translation machinery ‘reads’ the code through chemical affinity, and might therefore mis-read the tape. Most amino acids are encoded by more than one codon, and this hard-coded redundancy of the genetic code helps to reduce the impact of such misreadings (see Tlusty2007; Tlusty2008; Tlusty2008a; Tlusty2008b; Tlusty2010 for a theoretical study and Eckmann2008 for an illustration).
As noted above, this system allows for $4^3 = 64$ different codons (the number of triplets from an alphabet of 4 nucleotides), but they generate only 21 different symbols. (Footnote 5: Some terminology: The individual symbols (A, C, G, T) refer to nucleotides. The triplets of 3 nucleotides form the 64 codons. The 64 codons code for 20 AA and the stop symbol (which does not generate an AA). One of the AA is Methionine (codon ATG) which marks the start of a protein.) The geometric aspects of this arrangement of 21 among 64 possibilities can be understood in graph-theoretical terms: One presents the 64 codons as the nodes of a codon-graph, and two nodes are connected by a link if the corresponding codons differ in only one symbol. Note that swapping ‘C’ and ‘T’ in the codon’s third position always results in the same AA (Fig. 1) and we can therefore reduce the graph to 48 nodes. (Footnote 6: This graph is difficult to draw, as each node has 8 neighbors which differ in exactly one position. So a representation would have to be in 8 dimensions. Recall that in a cube in 3 dimensions, every corner has 3 neighbors.) In the codon-graph, each amino acid is coded as a simply connected region, as shown in Fig. 1, with the exception of Serine (ser) (Arginine (arg) is disconnected in the 2D table, but not in the graph). Such an arrangement minimizes the ratio of boundary to area for each region. This reduces the probability of coding the wrong AA, under the assumption that most reading errors involve only one-letter differences.
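The counting in this paragraph can be checked with a short sketch. It assumes the standard genetic code, written as the usual 64-letter table with bases ordered T, C, A, G in each position and `*` for the stop symbol; the function names and the label `Y` for a merged C/T third position are hypothetical choices made for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO)}


def one_letter_neighbors(codon):
    """All codons that differ from `codon` in exactly one position."""
    return [codon[:p] + b + codon[p + 1:]
            for p in range(3) for b in BASES if b != codon[p]]


def merge_ct(codon):
    """Identify C and T in the third position (written here as 'Y')."""
    return codon[:2] + ("Y" if codon[2] in "CT" else codon[2])


# Reduced codon-graph: nodes are merged codons, links are one-letter differences.
reduced = {}
for codon in CODE:
    node = merge_ct(codon)
    reduced.setdefault(node, set())
    for other in one_letter_neighbors(codon):
        if merge_ct(other) != node:
            reduced[node].add(merge_ct(other))


def is_connected(codons):
    """True if the codons form one connected region of the 64-node codon-graph."""
    codons = set(codons)
    stack = [next(iter(codons))]
    seen = set(stack)
    while stack:
        for other in one_letter_neighbors(stack.pop()):
            if other in codons and other not in seen:
                seen.add(other)
                stack.append(other)
    return seen == codons


split_symbols = sorted(
    aa for aa in set(AMINO)
    if not is_connected(c for c in CODE if CODE[c] == aa)
)
```

Under these assumptions the table has 21 distinct symbols, the reduced graph has 48 nodes with 8 neighbors each, and Serine (`S`) is the only symbol whose codons do not form a connected region.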
Additionally, amino acids with similar chemical properties (for example polarity) tend to be neighbors in this graph. This can be visualized by plotting the measured polarity as a function of the codon, which produces a relatively smooth landscape. The smoothness manifests the chemical similarity between neighboring amino acids, and implies that most misreadings change the polarity of AA only moderately. We note that, unlike the 2D landscape of Fig. 1, an ideal representation should wrap the surface so that each AA would have 8 neighbors (and can therefore be embedded only in high dimension).
For the connection between the numbers 21 and 48, an inequality can be given in terms of the genus of the codon-graph Tlusty2007; Tlusty2007a; Tlusty2010 (this uses results from Colin1993; Banchoff1965). Without going into further detail we conclude: The optimal code must balance contradicting needs for tolerance to errors (with the smoothness of the mapping between codons and chemical space) and chemical diversity, which is essential for the versatility of protein function.
III.2 Folding
Having translated the gene into a linear chain of amino acids (the backbone, see Fig. 2) via the genetic code (and modulo translation errors), this chain will spontaneously fold into a 3-dimensional shape which gives rise to its function. How this folding proceeds is an important and difficult question, which we shall not address here. Instead we will assume that a certain folding pattern is preserved (see Petsko2004 for a discussion of these issues). This assumption is practical, as we shall be mostly interested in how the function of the protein changes under point mutations of the gene, i.e., bit flips of the code in the tape. Such mutations often do not seriously affect the overall shape of the protein (see also Bussemaker1997).
We can next model the function of this folded amino acid chain and we will show that there is yet another level of redundancy besides the redundancy of the genetic code and the robustness of the folding. We shall see that there are many mutations which have no effect on performance. Namely, there is high redundancy in the AA sequences that are mapped to the same or similar enough protein function. We shall quantify this property in terms of dimension Grassberger1983; Eckmann1992.
IV Mechanical views on protein evolution
Consider a protein interacting with a small molecule. Presence of the latter often induces a conformational change at some distance from the interaction site. One important example is the class of allosteric proteins for which an active site is regulated by binding at another site, resulting in a reconfiguration of the active site. More specifically, we shall examine the role of large-scale, functionally-relevant dynamical modes, and their link to long-range genetic correlations.
Before reviewing the literature on this issue, we illustrate such a mechanical effect on a particular example: human glucokinase (which is involved in sugar metabolism), see Fig. 3. The data were obtained from the crystallographic structures of two conformations of that protein: the first (PDB accession 1v4s) corresponds to the binding of glucose to its active site and is compared to the conformation in the absence of glucose (PDB 1v4t) [Kamata2004]. (Footnote 7: PDB = Protein Data Bank, https://www.rcsb.org.)
The backbone, see Sec. LABEL:sec:backbone, is shown as a light blue curled tube, and the arrows indicate the displacement from one shape to the other (any Galilean motion between the two is eliminated). The color of the arrows indicates up/down motion relative to a horizontal plane. The red coloring in the twisted tube shows the high shear region separating two low-shear domains that move as rigid bodies (shear calculated by the method of Mitchell2016; Rougemont2099).
On a conceptual level, one can simplify the figure as shown in Fig. 4. The protein seems to have a central shear band and two external flaps which perform a rotating motion when a ligand attaches to the protein. This kind of mechanical phenomenology is accessible to the language of physics.
Large-scale motions take part in several basic biological functions and mechanisms. For example, in the induced fit Koshland1958 and conformational selection Bahar2007; Grant2010 mechanisms, the presence of a substrate induces reshaping of the enzyme to properly align the catalytic groups in the active site. Such reshaping is a dynamic mechanism of specific recognition that allows the selection of a target ligand among similar competing molecules Savir2007; Savir2013. In allostery, reconfiguration of the active site is regulated by binding at a secondary, allosteric site, often via long-range mechanical interactions Motlagh2014; Thirumalai2018. In this Colloquium, we describe simple physical models for the emergence of these mechanisms via evolutionary tuning of the protein’s mechanical response.
Like their dynamic phenotypes, proteins’ genotypes (their gene sequences), as explained in Sec. III, are remarkably collective. The history of protein evolution can be traced by gathering evolutionarily related proteins in different species (homologous proteins) and aligning their sequences. Genes of these proteins sometimes display long-range correlations Goebel1994; Marks2011; Jones2012; Lockless1999; Suel2003; Hopf2017; Poelwijk2017; Halabi2009; Tesileanu2015; Juan2013. The correlations indicate epistasis, the compensatory mutations that take place among residues linked by physical forces or common function. As an example Rougemont2099, consider again glucokinase. We aligned about 120 variants of this molecule and asked where along the gene mutations have preferentially occurred (Fig. 5).
Still, the relationship between sequence correlation, epistasis and selection pressure is not fully understood. As discussed in Sec. I, the two main challenges are the intricacy of the physical forces among the amino acids, and the high dimensionality of the genotype-to-phenotype map Koonin2002; Povolotskaya2010; Liberles2012. These inherent difficulties motivated the development of complementary approaches which utilize simplified coarse-grained models, such as lattice proteins Lau1989; Shakhnovich1991 or elastic networks Chennubhotla2005. Network and lattice models have been recently used to study the evolution of allostery in proteins and in biologically-inspired allosteric matter Tlusty2016; Tlusty2017; Hemery2015; Flechsig2017; Rocks2017; Yan2017. Our aim here is different: to construct a simplified condensed-matter model that describes how the mechanical interactions within the protein shape its evolution.
V Condensed-matter theory of proteins
This section will review a theory of proteins in terms of evolvable condensed matter. We shall discuss the conceptual roots of this approach in the physics of amorphous matter (mainly glasses) and spectral theory. We will introduce the basic setting of modeling proteins as evolving amino acid networks. The emergence of function is associated with the evolution of a weakly connected region, which enables a low-energy ‘floppy’ mode to appear. This minimal network approach allows one to examine basic questions of protein evolution.
We shall discuss two different models in this review. One will be called the ‘cylinder-model’ and the other the ‘HP-model’. The first model is simpler, but the second comes somewhat closer to biological reality. Before distinguishing the two models, we describe their common features.
V.1 Lattice models
Our protein is modeled by a finite (regular) lattice in 2 (or 3) dimensions. We assume that the lattice forms a cylinder (periodic boundary conditions) or an open rectangle (open boundary conditions) of width $w$ and height $h$ (see the examples in Sec. LABEL:sec:cylinder–LABEL:sec:pnasmodel).
It is important to note that $w$ and $h$ are finite while otherwise quite arbitrary. This is so because the protein should not be viewed as a problem of thermodynamic limits, but rather in the context of small amorphous objects. This being said, other aspects of the geometry seem less important. The points may also be chosen as lying on small perturbations from the regular lattice to avoid effects of lattice symmetry on the spectrum. The number of AAs should typically be in the range 200–2000, corresponding to the typical size of the protein.
Amino acids interact via electrostatic forces, van der Waals forces, hydrogen bonds, disulfide bonds and hydrophobicity Petsko2004; Fersht1999. All these are short-range interactions, which amount to local coupling between lattice points. We shall therefore assume that each AA interacts with its nearest and next-nearest neighbors. For example, on a hexagonal lattice with nearest and next-nearest neighbors linked, the number of connections (the node’s degree) is at most 12; all nodes in the protein interior have 12 links (i.e., bonds) while those at the boundary have fewer (but at least 3), see Fig. LABEL:fig:1.
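The degree count can be reproduced with a small sketch. It assumes that the hexagonal lattice is indexed by axial coordinates $(q, r)$ and that the patch is a $w \times h$ parallelogram in these coordinates, which need not match the exact shape used in the figures; the function name `lattice_degrees` and the offset lists are constructed for this illustration.

```python
# Axial-coordinate offsets on a hexagonal (triangular) lattice.
# Nearest neighbors sit at distance 1, next-nearest at distance sqrt(3).
NEAREST = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]
NEXT_NEAREST = [(1, 1), (2, -1), (-1, 2), (-1, -1), (-2, 1), (1, -2)]


def lattice_degrees(width, height, periodic=False):
    """Degree of every node when nearest and next-nearest neighbors are linked.

    periodic=True wraps the q direction (a cylinder); it assumes width >= 5
    so that wrapped neighbors stay distinct. The r direction is always open.
    """
    degrees = {}
    for q in range(width):
        for r in range(height):
            count = 0
            for dq, dr in NEAREST + NEXT_NEAREST:
                qq, rr = q + dq, r + dr
                if periodic:
                    qq %= width
                if 0 <= qq < width and 0 <= rr < height:
                    count += 1
            degrees[(q, r)] = count
    return degrees


open_patch = lattice_degrees(6, 6)
cylinder = lattice_degrees(6, 6, periodic=True)
```

For the open patch assumed here, interior nodes have degree 12 and the two acute corners have degree 3, consistent with the bounds quoted above; on the cylinder only the two open edges lose links.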
Finally, the coupling itself is modeled by harmonic springs carried by each graph link Born1954; Alexander1998; Chennubhotla2005; Tirion1996. Its strength is determined only by the types of AAs at each end of the link.
V.2 The lattice Laplacian
The lattice and its links may be viewed as an abstract graph. This means that one can define gradients and Laplacians Biggs1993; Chung1997. (Footnote 8: The first book is more combinatorial, and the second introduces more spectral concepts.) In the graph, there are $n_a$ amino acid nodes, indexed by Roman letters, and $n_b$ bonds, indexed by Greek letters. (Footnote 9: If a bond $\alpha$ connects nodes $i$ and $j$, we also write $\alpha = (i, j)$.)
First, one endows every bond in the graph with an arbitrary but fixed orientation, and then the incidence matrix of a graph is the $n_b \times n_a$ matrix $\nabla$ defined by
$$\nabla_{\alpha i} = \begin{cases} -1 & \text{if bond } \alpha \text{ starts at node } i, \\ +1 & \text{if bond } \alpha \text{ ends at node } i, \\ 0 & \text{otherwise.} \end{cases}$$
Note that for any function $f$ on the vertices, the map $f \mapsto \nabla f$ is the co-boundary mapping of the graph, namely
$$(\nabla f)(\alpha) = f(j) - f(i),$$
where $\alpha$ is the link connecting $i$ to $j$.
As in the continuum case, the Laplace operator $\Delta$ is the product $\Delta = \nabla^{\mathsf{T}} \nabla$, where ${}^{\mathsf{T}}$ denotes the adjoint. The non-diagonal elements $\Delta_{ij}$ are $-1$ if $i$ and $j$ are connected and $0$ otherwise. The diagonal part of $\Delta$ is the degree $\Delta_{ii} = z_i$, i.e., the number of nodes connected to $i$. Note that this is a discrete graph Laplacian, and no coordinates are involved so far.
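The construction of the incidence matrix and the graph Laplacian can be sketched on a toy graph. The sign convention (entry $-1$ at the start of an oriented bond, $+1$ at its end), the four-node example and the function names are assumptions made for this illustration; plain Python lists are used so that the sketch has no dependencies.

```python
def incidence_matrix(n_nodes, bonds):
    """Rows are oriented bonds (i, j): entry -1 at the start i and +1 at the end j."""
    grad = [[0] * n_nodes for _ in bonds]
    for alpha, (i, j) in enumerate(bonds):
        grad[alpha][i] = -1
        grad[alpha][j] = +1
    return grad


def laplacian(grad):
    """Graph Laplacian as the product (transpose of grad) times grad."""
    n_bonds, n_nodes = len(grad), len(grad[0])
    return [[sum(grad[a][i] * grad[a][j] for a in range(n_bonds))
             for j in range(n_nodes)]
            for i in range(n_nodes)]


# Toy graph: a square 0-1-2-3 with one diagonal bond 0-2.
bonds = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
grad = incidence_matrix(4, bonds)
lap = laplacian(grad)

# Co-boundary map: the gradient of a node function is a difference across each bond.
f = [0.0, 1.0, 4.0, 9.0]
grad_f = [sum(grad[a][i] * f[i] for i in range(4)) for a in range(len(bonds))]
```

The diagonal of `lap` holds the node degrees, the off-diagonal entries are $-1$ for linked pairs and $0$ otherwise, and every row sums to zero, independently of the orientation chosen for the bonds.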
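We next embed the graph in a Euclidean space $\mathbb{E}^d$ ($d = 2, 3$), by assigning positions $\mathbf{r}_i \in \mathbb{E}^d$ to each AA, i.e., to each lattice point $i$. This is coded as a real $n_a \times d$ matrix $\mathbf{r}$, where $n_a$ is the number of nodes. Finally, to each bond $\alpha$ we assign a spring with constant $k_\alpha$, which we view as the diagonal elements of an $n_b \times n_b$ matrix $K$, where $n_b$ is the number of bonds:
$$K = \mathrm{diag}(k_1, \dots, k_{n_b}).$$
This defines a deformable spring network which has an internal energy and an equilibrium configuration.
To account for the energy cost of deformations in the lattice protein, one considers the elastic tensor (or Hamiltonian) $H$, which we now describe in detail [Chung1992, pp. 618–619]. The quantity is a tensor because the deformations are not scalars, but vectors in $\mathbb{E}^d$. We first denote by $\mathbf{n}_\alpha$ the (normalized) direction vectors for each bond $\alpha = (i, j)$: $\mathbf{n}_\alpha = (\mathbf{r}_j - \mathbf{r}_i)/|\mathbf{r}_j - \mathbf{r}_i|$. Then, we define the ‘embedded’ gradient tensor $D$ (of size $n_b \times d\,n_a$), which is obtained by multiplying each element $\nabla_{\alpha i}$ of the graph gradient by the corresponding vector $\mathbf{n}_\alpha$:
$$D_{\alpha i} = \nabla_{\alpha i}\, \mathbf{n}_\alpha.$$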
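A minimal sketch of the embedded gradient on a toy network may help fix ideas. The layout of the tensor as a matrix with $d$ consecutive columns per node, the orientation of each bond from its first to its second node, the triangle example and the function names are all assumptions of this illustration rather than the construction used in the models.

```python
import math


def embedded_gradient(positions, bonds):
    """Embedded gradient as a list of rows, one per bond.

    Node i occupies columns d*i .. d*i + d - 1. For a bond (i, j) the row holds
    minus the unit bond vector in the columns of i and plus it in those of j.
    """
    n_nodes, d = len(positions), len(positions[0])
    rows = []
    for i, j in bonds:
        delta = [positions[j][k] - positions[i][k] for k in range(d)]
        length = math.sqrt(sum(c * c for c in delta))
        unit = [c / length for c in delta]
        row = [0.0] * (d * n_nodes)
        for k in range(d):
            row[d * i + k] = -unit[k]
            row[d * j + k] = +unit[k]
        rows.append(row)
    return rows


def bond_extensions(grad, displacements):
    """First-order change of each bond length under the given node displacements."""
    flat = [c for u in displacements for c in u]
    return [sum(g * x for g, x in zip(row, flat)) for row in grad]


# Toy network: an equilateral triangle with unit-length bonds.
positions = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
bonds = [(0, 1), (1, 2), (0, 2)]
D = embedded_gradient(positions, bonds)

translation = [(0.3, -0.7)] * 3                # rigid shift of all nodes
rotation = [(-y, x) for x, y in positions]     # infinitesimal rigid rotation
dilation = [(x, y) for x, y in positions]      # uniform expansion about the origin

ext_translation = bond_extensions(D, translation)
ext_rotation = bond_extensions(D, rotation)
ext_dilation = bond_extensions(D, dilation)
```

Rigid translations and infinitesimal rotations leave all bond lengths unchanged to first order, so they cost no spring energy, whereas the uniform dilation extends each unit-length bond by one unit.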
