Randomly cross-linked polymer models
Ofir Shukron, David Holcman

TL;DR
This paper introduces a randomly cross-linked polymer model for chromatin, providing analytical formulas for key properties and validating them with simulations, enabling inference of cross-linking from experimental data.
Contribution
The paper presents a novel RCL polymer model with analytical formulas for chromatin properties, validated by simulations, and applicable to experimental data analysis.
Findings
Derived asymptotic formulas for polymer properties.
Validated formulas with Brownian simulations.
Enabled estimation of cross-links from data.
Abstract
Polymer models are used to describe chromatin, which can be folded at different spatial scales by binding molecules. By folding, chromatin generates loops of various sizes. We present here a randomly cross-linked (RCL) polymer model, where monomer pairs are connected randomly. We obtain asymptotic formulas for the steady-state variance, encounter probability, the radius of gyration, instantaneous displacement and the mean first encounter time between any two monomers. The analytical results are confirmed by Brownian simulations. Finally, the present results can be used to extract the minimum number of cross-links in a chromatin region from conformation capture data.
Institute of Biology, Ecole Normale Supérieure, 46 rue d’Ulm 75005 Paris, France.
DNA in the nucleus is constantly remodeled by regulatory factors, and compacted genomic regions form transient and stable loops Nora et al. (2012); Tark-Dame et al. (2014). Looping is thus a key event in chromatin regulation: it is rare for a single polymer but frequent in a population of hierarchically folded genomes. Genome organization is now probed by chromatin Conformation Capture (CC) techniques Dekker et al. (2002); Simonis et al. (2006); Lieberman-Aiden et al. (2009), which simultaneously give access to looping events in an ensemble of millions of chromatin segments. This experimental approach provides contact frequency matrices at various scales, from a few kilo- to mega-base-pairs. Analysis of these matrices remains difficult, but has revealed that mammalian genomes contain "blocks" of up to a few Mbp in size, called Topologically Associating Domains (TADs) Nora et al. (2012); Dixon et al. (2012). The role and organization of TADs remain unclear, although they are involved in gene regulation Nora et al. (2012); Simonis et al. (2006) and replication. TADs appear by averaging encounters over an ensemble of millions of samples Lieberman-Aiden et al. (2009) and represent steady-state looping frequencies, but they contain no direct information about either the size of the folded genomic section or any transient genomic encounter times.
To reconstruct chromatin at a given scale and explore its transient properties, polymer models are used as a coarse-grained representation. The Rouse model Doi and Edwards (1986), characterized by nearest-neighbor interactions, predicts an encounter probability (EP) that decays as $|m-n|^{-3/2}$ between monomers $m$ and $n$, but cannot account for the long-range interactions observed inside TADs in CC data Shukron and Holcman (2017); Nora et al. (2012). Other polymer models include attractive and repulsive forces between monomers Sokolov (2003); Bohn et al. (2007); Bohn and Heermann (2010); Heermann (2011); Amitai and Holcman (2013); Langowski and Heermann (2007) to account for long-range interactions, and have been used to probe the heterogeneous steady-state organization of the chromatin, characterized by loops Jost et al. (2014).
We study here a randomly cross-linked (RCL) polymer model used in Shukron and Holcman (2017) to describe the ensemble of chromatin conformations, where random cross-links are formed by binding molecules Nora et al. (2012). The space of RCL polymer configurations has so far been explored only numerically Heermann (2011); Jespersen et al. (2000), and no analytical formulas have been derived for studying the steady-state or transient properties and thus exploring the large parameter space. We derive here novel analytical formulas for the encounter probability (EP), variance, and radius of gyration of the RCL polymer, which we use to study the polymer dynamics. The present model can be used to determine, from the empirical EP of CC data, the minimal number of cross-links, which is not directly accessible from CC experiments. We further study the mean first encounter time between any two monomers, which plays a key role in gene regulation Kadauke and Blobel (2009). Most of the asymptotic derivations are confirmed by Brownian simulations.
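The RCL polymer model. We start with a linear polymer in dimension $d$ consisting of $N$ monomers with positions $\boldsymbol{R}=[\boldsymbol{r}_1,\boldsymbol{r}_2,\dots,\boldsymbol{r}_N]^T$, connected sequentially by harmonic springs Doi and Edwards (1986), and we add $N_c$ connectors between random non-nearest-neighbor (non-NN) monomer pairs (Fig. 1A). The potential energy of the RCL polymer is the sum of the spring potential of the linear backbone and that of the random connectors
$$\phi_{\mathcal{G}}(\boldsymbol{R})=\frac{\kappa}{2}\sum_{j=2}^{N}\left(\boldsymbol{r}_j-\boldsymbol{r}_{j-1}\right)^2+\frac{\kappa}{2}\sum_{(m,n)\in\mathcal{G}}\left(\boldsymbol{r}_m-\boldsymbol{r}_n\right)^2, \tag{1}$$
where $\kappa=dk_BT/b^2$ is the spring constant, $b$ the standard deviation of the connector between connected monomers, $k_B$ Boltzmann's constant and $T$ the temperature. The ensemble $\mathcal{G}$ is composed of $N_c$ randomly chosen index pairs $(m,n)$ among the non-NN monomers. We define the connectivity fraction $\xi$ as the ratio of the number of connectors $N_c$ to the number of possible non-NN pairs,
$$\xi=\frac{2N_c}{(N-1)(N-2)}. \tag{2}$$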
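As a minimal sketch of how one realization of the connector set could be drawn, the Python snippet below enumerates the non-NN pairs, samples connectors uniformly without replacement, and evaluates the connectivity fraction. It assumes that the connectivity fraction is the number of added connectors divided by the number of possible non-NN pairs, (N-1)(N-2)/2. The function names and the 0-based indexing are illustrative choices, not taken from the paper.

```python
import itertools
import random


def non_nn_pairs(n_monomers):
    """All index pairs (m, n), 0-based, whose monomers are not nearest neighbors."""
    return [(m, n)
            for m, n in itertools.combinations(range(n_monomers), 2)
            if n - m > 1]


def connectivity_fraction(n_monomers, n_connectors):
    """Assumed definition: connectors divided by the number of non-NN pairs."""
    n_pairs = (n_monomers - 1) * (n_monomers - 2) // 2
    return n_connectors / n_pairs


def sample_connectors(n_monomers, n_connectors, seed=None):
    """Draw n_connectors distinct non-NN pairs uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(non_nn_pairs(n_monomers), n_connectors)
```

For example, `sample_connectors(100, 25, seed=0)` returns 25 distinct pairs for a 100-monomer chain, and `connectivity_fraction(100, 25)` gives the corresponding fraction.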
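For each polymer realization, we choose $N_c$ pairs from the $N_L=(N-1)(N-2)/2$ possible non-NN monomer pairs. The dynamics of the resulting polymer monomers (vector $\boldsymbol{R}$) is driven by Brownian motion and the force field derived from the potential energy (Eq. 1), leading to the stochastic description
$$\frac{d\boldsymbol{R}}{dt}=-\frac{dD}{b^2}\left(M+B^{\mathcal{G}}(\xi)\right)\boldsymbol{R}+\sqrt{2D}\,\frac{d\boldsymbol{\omega}}{dt}, \tag{3}$$
where $D=k_BT/\zeta$ is the diffusion constant, $\zeta$ is the friction coefficient, $d\boldsymbol{\omega}/dt$ are independent white noises with mean 0 and variance 1, and $M$ is the Rouse matrix Doi and Edwards (1986)
$$M_{m,n}=\begin{cases}-1, & |m-n|=1;\\ -\sum_{j\neq m}M_{m,j}, & m=n;\\ 0, & \text{otherwise}.\end{cases} \tag{4}$$
For a given $\xi$, the square symmetric matrix $B^{\mathcal{G}}(\xi)$ with random connectivity is defined by
$$B^{\mathcal{G}}_{m,n}(\xi)=\begin{cases}-1, & |m-n|>1 \text{ and } \boldsymbol{r}_m,\boldsymbol{r}_n \text{ are connected};\\ -\sum_{j\neq m}B^{\mathcal{G}}_{m,j}(\xi), & m=n;\\ 0, & \text{otherwise}.\end{cases} \tag{5}$$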
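The construction above can be sketched numerically. The snippet below builds the Rouse matrix and a connector matrix as graph Laplacians, then integrates the stochastic equation with a plain Euler-Maruyama scheme. It assumes a drift of the form -(dD/b²)(M + B)R and a noise amplitude of sqrt(2D); the time step, the random-walk initial condition, and all function names are illustrative assumptions rather than the authors' simulation code.

```python
import numpy as np


def rouse_matrix(n):
    """Laplacian of a linear chain: -1 between neighbors, rows sum to zero."""
    m = np.zeros((n, n))
    i = np.arange(n - 1)
    m[i, i + 1] = m[i + 1, i] = -1.0
    m[np.diag_indices(n)] = -m.sum(axis=1)
    return m


def connector_matrix(n, pairs):
    """Laplacian of the added connectors: -1 for each connected non-NN pair."""
    b = np.zeros((n, n))
    for p, q in pairs:
        b[p, q] = b[q, p] = -1.0
    b[np.diag_indices(n)] = -b.sum(axis=1)
    return b


def simulate(n, pairs, steps, dt=0.01, dim=3, diff=1.0, b=1.0, seed=0):
    """Euler-Maruyama integration of the assumed RCL dynamics."""
    rng = np.random.default_rng(seed)
    lap = rouse_matrix(n) + connector_matrix(n, pairs)
    drift = dim * diff / b**2
    # Start from a random walk with mean square step length b**2 (assumption).
    r = np.cumsum(rng.normal(scale=b / np.sqrt(dim), size=(n, dim)), axis=0)
    for _ in range(steps):
        noise = rng.normal(size=(n, dim))
        r = r - drift * dt * (lap @ r) + np.sqrt(2.0 * diff * dt) * noise
    return r
```

The explicit scheme is only stable when `dt` is small compared with the fastest relaxation time, so `dt` should be reduced as more connectors are added.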
To derive the steady-state properties of an ensemble of RCL polymers, we adopt a mean-field model where we replace the matrix $B^{\mathcal{G}}(\xi)$ in Eq. 3 by its average $\langle B^{\mathcal{G}}(\xi)\rangle$ (averaging over all configurations of non-NN connected monomer pairs). We thus construct $\langle B^{\mathcal{G}}(\xi)\rangle$ using the probability density of the monomer connectivity. For a fixed number of connectors $N_c$, the probability that monomer $m$ has $k$ non-NN connections is obtained by choosing $k$ positions in row $m$ of the matrix $B^{\mathcal{G}}(\xi)$ (excluding the diagonal and the super- and sub-diagonals), and the remaining $N_c-k$ connectors in any other row or column:
[TABLE]
where the binomial coefficient is . This probability is the hyper-geometric distribution for the number of connections for monomer . The mean number of connectors for each monomer is therefore
[TABLE]
Using the mean values in 6, we obtain the expression for the matrix , with entries
[TABLE]
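which can be decomposed as the sum
$$\langle B^{\mathcal{G}}(\xi)\rangle=\xi\left(N\,I-M-\boldsymbol{1}_N\right), \tag{8}$$
where $I$ is the $N\times N$ identity matrix, $M$ is the Rouse matrix, and $\boldsymbol{1}_N$ is an $N\times N$ matrix of ones. To study the mean properties of the RCL polymer, we study the stochastic process (Eq. 3) using the average matrix $\langle B^{\mathcal{G}}(\xi)\rangle$.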
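The averaging step can be checked exactly on a small chain. The sketch below enumerates every way of placing the connectors among the non-NN pairs, averages the resulting connector matrices, and compares the result with a decomposition of the form ξ(N·I − M − 1), where M is the Rouse matrix and 1 the matrix of ones. That specific form is an assumption inferred from the ingredients named in the text (identity matrix, matrix of ones, commutation with the Rouse matrix), and the function names are hypothetical.

```python
import itertools

import numpy as np


def laplacian(n, pairs):
    """Symmetric matrix with -1 for each listed pair and rows summing to zero."""
    a = np.zeros((n, n))
    for p, q in pairs:
        a[p, q] = a[q, p] = -1.0
    a[np.diag_indices(n)] = -a.sum(axis=1)
    return a


def mean_connector_matrix(n, n_c):
    """Exact average of the connector matrix over all connector placements."""
    non_nn = [(p, q) for p, q in itertools.combinations(range(n), 2) if q - p > 1]
    configs = list(itertools.combinations(non_nn, n_c))
    return sum(laplacian(n, c) for c in configs) / len(configs)


def decomposition(n, n_c):
    """Assumed closed form: xi * (N * I - M - ones)."""
    xi = 2.0 * n_c / ((n - 1) * (n - 2))
    rouse = laplacian(n, [(j, j + 1) for j in range(n - 1)])
    return xi * (n * np.eye(n) - rouse - np.ones((n, n)))
```

The enumeration grows combinatorially, so this check is only practical for a handful of monomers, for example `mean_connector_matrix(6, 3)` against `decomposition(6, 3)`.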
Eigenvalues of the RCL polymer. To study the steady-state properties of system 3, we diagonalize the averaged connectivity matrix . Using Rouse normal coordinates Doi and Edwards (1986), defined as
[TABLE]
where
[TABLE]
is the Rouse orthonormal basis Doi and Edwards (1986), which diagonalizes :
[TABLE]
where
$$\lambda_p=4\sin^2\left(\frac{p\pi}{2N}\right),\quad p=0,1,\dots,N-1, \tag{12}$$
are the eigenvalues of the Rouse matrix. We obtain from 3 the mean-field equations
[TABLE]
where $\boldsymbol{\eta}=V\boldsymbol{\omega}$ are independent white noises with mean 0 and variance 1. From 8, the matrix $\langle B^{\mathcal{G}}(\xi)\rangle$ commutes with the Rouse matrix $M$ and is therefore diagonalizable using the same orthonormal basis $V$:
[TABLE]
Using 8 and 14, we obtain the eigenvalues
[TABLE]
Finally, the eigenvalues of system 13 are the sum of the eigenvalues of the Rouse matrix $M$ and of the average connectivity matrix $\langle B^{\mathcal{G}}(\xi)\rangle$:
$$\chi_p=\begin{cases}0, & p=0;\\ N\xi+4(1-\xi)\sin^2\left(\frac{p\pi}{2N}\right), & 1\le p\le N-1.\end{cases} \tag{16}$$
The system 13 is decoupled and consists of an ensemble of independent equations. For $\xi=0$, we recover the Rouse polymer Doi and Edwards (1986), whereas for $\xi=1$ we obtain a fully connected polymer, for which all eigenvalues are equal to $N$ except for the first, vanishing one. Using 16, the potential energy of the RCL polymer is written in the form
[TABLE]
The statistics of the RCL system (relation 3), can be recovered from 13 in the diagonalized form (expression 17), by scaling with the ratio of mean number of random connectors to the mean of total number of connectors:
[TABLE]
We plotted in Fig.1B the eigenvalues 16 for RCL polymers, for monomers, and =5, 25 and 50 added random connectors.
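The eigenvalue structure can be checked numerically. The sketch below assumes that the mean-field connectivity has weight 1 between nearest neighbors and weight ξ between all other pairs, and compares the numerical spectrum with the closed form Nξ + 4(1 − ξ) sin²(pπ/2N) for p ≥ 1, with a vanishing first eigenvalue. Both the matrix construction and the closed form are assumptions, consistent with the two limits stated in the text (the Rouse spectrum for ξ = 0, all non-zero eigenvalues equal to N for ξ = 1); the names are illustrative.

```python
import numpy as np


def mean_field_matrix(n, xi):
    """Mean-field Laplacian: -1 between neighbors, -xi between non-NN pairs."""
    a = np.full((n, n), -float(xi))
    i = np.arange(n - 1)
    a[i, i + 1] = a[i + 1, i] = -1.0
    np.fill_diagonal(a, 0.0)
    np.fill_diagonal(a, -a.sum(axis=1))  # rows sum to zero
    return a


def closed_form_eigenvalues(n, xi):
    """Assumed spectrum: 0 for p = 0, N*xi + (1 - xi)*lambda_p otherwise."""
    p = np.arange(n)
    chi = n * xi + 4.0 * (1.0 - xi) * np.sin(p * np.pi / (2 * n)) ** 2
    chi[0] = 0.0
    return chi


def numerical_eigenvalues(n, xi):
    """Sorted eigenvalues of the mean-field matrix."""
    return np.sort(np.linalg.eigvalsh(mean_field_matrix(n, xi)))
```

Plotting `closed_form_eigenvalues(n, xi)` for a few values of `xi` reproduces the qualitative picture of a spectrum shifted upward by N·ξ as connectors are added.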
Encounter probability (EP) between monomers of the RCL polymer. The RCL polymer belongs to the class of generalized Gaussian chain models Sokolov (2003); Gurtovenko and Blumen (2005); Jespersen et al. (2000), for which the EP between any two monomers and at equilibrium is given by
[TABLE]
To compute expression 19, we estimate the variance in normal coordinates (Eq. 9):
[TABLE]
From the Ornstein-Uhlenbeck equations 13 Schuss (2009), we obtain the time-dependent variance of the normal coordinates
[TABLE]
We define the hierarchy of relaxation times , with
[TABLE]
where the slowest time corresponds to the diffusion of the center of mass. At steady-state,
[TABLE]
where
[TABLE]
Replacing 10 and 22 into the expression above, we get
[TABLE]
For , the sum above is computed in the complex plane using the contour of the unit disk parameterized by
[TABLE]
where
[TABLE]
When $\xi=0$, we recover the variance of the Rouse chain () Doi and Edwards (1986). The integrand of the contour integral above is symmetric in and and has a pole of order at and simple poles at . Because , we have , which is outside of the unit disk , and for all N , . The pole is not inside the disk and does not contribute to the calculation of the residues of the contour integral. For , we obtain an exact expression for the variance
[TABLE]
For , the variance 25 is asymptotically given by
[TABLE]
Using 25 and 19, we obtain a novel expression for the steady-state encounter probability between any two monomers. We compare the EP obtained from Brownian simulations of RCL polymer for with the analytical formula 19 for connectors (Fig. 2(a)), which shows a very good agreement.
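A numerical counterpart of this step is sketched below. It assumes the standard Gaussian-chain relation in which the encounter probability between two monomers is (d / (2π σ²))^(d/2), with σ² the mean square distance between them, and it obtains σ² from the pseudo-inverse of a mean-field connectivity matrix (weight 1 between nearest neighbors, weight ξ between all other pairs) rather than from the paper's closed-form variance. These choices and the function names are assumptions made for illustration.

```python
import numpy as np


def mean_field_matrix(n, xi):
    """Mean-field Laplacian: -1 between neighbors, -xi between non-NN pairs."""
    a = np.full((n, n), -float(xi))
    i = np.arange(n - 1)
    a[i, i + 1] = a[i + 1, i] = -1.0
    np.fill_diagonal(a, 0.0)
    np.fill_diagonal(a, -a.sum(axis=1))
    return a


def variance_matrix(n, xi, b=1.0):
    """Mean square distance between every monomer pair at steady state."""
    g = np.linalg.pinv(mean_field_matrix(n, xi))
    diag = np.diag(g)
    return b**2 * (diag[:, None] + diag[None, :] - 2.0 * g)


def encounter_probability(n, xi, b=1.0, dim=3):
    """Assumed Gaussian-chain EP; the diagonal is left at zero."""
    var = variance_matrix(n, xi, b)
    ep = np.zeros_like(var)
    off = ~np.eye(n, dtype=bool)
    ep[off] = (dim / (2.0 * np.pi * var[off])) ** (dim / 2.0)
    return ep
```

With `xi = 0` the variance reduces to b²|m − n| and the EP decays with exponent -3/2 in three dimensions; any positive `xi` lowers the variance and raises the EP of distant monomers.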
Mean square radius of gyration (MSRG) of the RCL polymer. The MSRG characterizes the size of the RCL polymer and can be computed from the variance 25 as
[TABLE]
Using 25 and 27, with the notations , we obtain
[TABLE]
When , we obtain the asymptotic expansion
[TABLE]
In Fig. 1(d), we compare computed from Brownian simulations of , and 100 monomers, with added random connectors, with the asymptotic formula 28; the two agree.
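The relation between the pairwise variance and the MSRG can be illustrated numerically. The sketch below uses the standard identity that the mean square radius of gyration equals the sum of the mean square distances over all ordered monomer pairs divided by 2N², and evaluates the distances from the pseudo-inverse of an assumed mean-field connectivity matrix. It does not implement the paper's closed-form expression, and the function names are hypothetical.

```python
import numpy as np


def mean_field_matrix(n, xi):
    """Mean-field Laplacian: -1 between neighbors, -xi between non-NN pairs."""
    a = np.full((n, n), -float(xi))
    i = np.arange(n - 1)
    a[i, i + 1] = a[i + 1, i] = -1.0
    np.fill_diagonal(a, 0.0)
    np.fill_diagonal(a, -a.sum(axis=1))
    return a


def mean_square_radius_of_gyration(n, xi, b=1.0):
    """Sum of pairwise mean square distances divided by 2 * N**2."""
    g = np.linalg.pinv(mean_field_matrix(n, xi))
    diag = np.diag(g)
    var = b**2 * (diag[:, None] + diag[None, :] - 2.0 * g)
    return var.sum() / (2.0 * n**2)
```

For `xi = 0` this reduces to the Rouse value b²(N² − 1)/(6N), and the value decreases monotonically as `xi` grows, reflecting the compaction produced by the added connectors.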
Mean Square Displacement (MSD) of a single monomer of the RCL polymer. Using the normal coordinates 9 in dimension the MSD of monomers in the RCL polymer is
[TABLE]
where . Averaging over all monomers and approximating the sum in 30 by an integral for , we obtain
[TABLE]
where is the error function. Equation 31 characterizes the MSD for intermediate time scale . For short time scale , the MSD is given by
[TABLE]
For , the MSD behaves like
[TABLE]
We conclude that the homogeneous behavior of MSD for the RCL polymer model gives an anomalous exponent , similar to the Rouse model. Finally, for long time scales (), (slow diffusion of the polymer’s center of mass), the error function in 31 is almost constant and therefore
[TABLE]
Mean First Encounter Time (MFET) between monomers of the RCL polymer. We compute here the mean time for two monomers of the RCL polymer to enter for the first time a ball of radius , within which they can interact to form a chemical bond (Fig. 3(a)). The MFET for both the Rouse and beta Amitai and Holcman (2013) polymers was computed (see Amitai et al. (2012)) from the first eigenvalue of the Fokker-Planck operator associated with the stochastic equation 13, so that
[TABLE]
The first order approximation in is given by Amitai et al. (2012)
[TABLE]
where is the diagonalized potential 17, is the integral over the entire RCL configuration space
[TABLE]
The integral over in 36 is computed over the space of closed RCL polymer ensemble, with fixed connector between monomers and and additional random connectors. A direct computation gives
[TABLE]
Using relations 37 and 38 in 35, we obtain the MFET between any two monomers and of the RCL polymer for a given connectivity fraction in dimension :
[TABLE]
Using 26 into 39, we obtain the approximation
[TABLE]
where , and . The analytical formula 39 agrees with Brownian simulations of the MFET for RCL polymer (Eq. 3) with and 100 monomers, and added random connectors (Fig. 3(b)).
Applications of the RCL polymer model. We derived here several analytical formulas for the steady-state variance, encounter probability, radius of gyration, mean-square displacement and mean first encounter time of the RCL polymer model. These formulas can be used to extract parameters from chromatin conformation in CC experiments Dixon et al. (2012); Nora et al. (2012). In particular, using formula 19, it is possible to fit the empirical encounter probability obtained from experimental data to extract the connectivity fraction $\xi$. This parameter has a direct interpretation: it represents the mean number of cross-links, which can be mediated by CTCF molecules present in a genomic region. The parameter $\xi$ depends on the coarse-grained scale (see Shukron and Holcman (2017)). The extracted parameter can then be used to estimate the radius of gyration (Eq. 28) of any region of interest. This radius characterizes the size of the genomic region, at least relative to other genomic segments, and thus provides information about the local organization of the chromatin in the cell nucleus.
To demonstrate our methodology, we coarse-grained the 5C data reported in Nora et al. (2012) of male neuronal progenitor NPC-E14 cells, replicate 1, TAD H, containing 679 kbp, at a scale of 3 kbp, resulting in $N=226$ monomers. We fit the EP (Eq. 19) to the 5C data of each of the 226 monomers and obtain the average connectivity , corresponding to added connectors. We use the persistence length of . Substituting in Eq. 28, we compute the radius of gyration to be 43 nm for TAD H. Thus, the 679 kbp TAD H is compacted in a sphere of volume of approximately $3.3\times10^{5}$ nm$^3$ (2 bp per nm$^3$).
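The arithmetic behind these figures can be retraced in a few lines. The sketch below uses only the numbers quoted in the text (679 kbp, 3 kbp bins, 43 nm radius of gyration) and treats the region as a sphere whose radius equals the radius of gyration; the variable names are illustrative.

```python
import math

tad_length_bp = 679_000   # size of TAD H quoted in the text
bin_size_bp = 3_000       # coarse-graining scale quoted in the text
radius_nm = 43.0          # radius of gyration quoted in the text

# Number of monomers after coarse-graining.
n_monomers = round(tad_length_bp / bin_size_bp)

# Volume of a sphere whose radius is the radius of gyration.
volume_nm3 = 4.0 / 3.0 * math.pi * radius_nm**3

# Average base-pair density inside that sphere.
density_bp_per_nm3 = tad_length_bp / volume_nm3

print(n_monomers, f"{volume_nm3:.3g}", f"{density_bp_per_nm3:.2f}")
```

The result is consistent with the 226 monomers and the density of about 2 bp per cubic nanometer stated above.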
The structural information extracted from the static CC maps using the RCL polymer model was recently used to interpret the dynamics of single particle trajectories (SPT) Gasser (2016); Lassadi et al. (2015); Amitai et al. (2017). By fitting the MSD (Eq. 31) to SPT data and extracting the degree of connectivity $\xi$, we interpret the mean deviation of the locus dynamics from pure diffusion as the confined dynamics of the locus in a cross-linked genomic environment Shukron and Holcman (2017); Weber et al. (2010a, b). We provided a direct formula to extract $\xi$, so that the simulations of Amitai et al. (2017) can now be bypassed. Finally, once the connectivity fraction is extracted, the mean first encounter time between any two monomers can be computed using formula 39. Encounter times are key for understanding processes such as mammalian X chromosome inactivation Nora et al. (2012) or non-homologous end joining after a DNA double-strand break Amitai et al. (2017, 2012).
References
- [1] Nora et al. (2012): E. P. Nora, B. R. Lajoie, E. G. Schulz, L. Giorgetti, I. Okamoto, N. Servant, T. Piolot, N. L. van Berkum, J. Meisig, J. Sedat, et al., Nature 485, 381 (2012).
- [2] Tark-Dame et al. (2014): M. Tark-Dame, H. Jerabek, E. M. Manders, D. W. Heermann, and R. van Driel, PLoS Computational Biology 10, e1003877 (2014).
- [3] Dekker et al. (2002): J. Dekker, K. Rippe, M. Dekker, and N. Kleckner, Science 295, 1306 (2002).
- [4] Simonis et al. (2006): M. Simonis, P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. De Wit, B. Van Steensel, and W. De Laat, Nature Genetics 38, 1348 (2006).
- [5] Lieberman-Aiden et al. (2009): E. Lieberman-Aiden, N. L. Van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, et al., Science 326, 289 (2009).
- [6] Dixon et al. (2012): J. R. Dixon, S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren, Nature 485, 376 (2012).
- [7] Doi and Edwards (1986): M. Doi and S. Edwards, The Theory of Polymer Dynamics (Clarendon Press, Oxford, 1986).
- [8] Shukron and Holcman (2017): O. Shukron and D. Holcman, PLoS Computational Biology 13, e1005469 (2017).
