Conformational Heterogeneity and FRET Data Interpretation for Dimensions of Unfolded Proteins
Jianhui Song, Gregory-Neal Gomes, Tongfei Shi, Claudiu C. Gradinaru, and Hue Sun Chan

TL;DR
This paper develops a framework to interpret smFRET data for unfolded proteins, revealing that conformational heterogeneity can cause significant ambiguity in inferred dimensions, and addressing discrepancies between smFRET and SAXS measurements.
Contribution
It introduces a logical framework to quantify conformational heterogeneity effects on smFRET data interpretation for unfolded proteins.
Findings
smFRET data can be consistent with diverse conformational states
heterogeneity explains discrepancies between smFRET and SAXS results
additional experimental probes are necessary to resolve conformational ambiguity
July 31, 2017
**Conformational Heterogeneity and FRET Data Interpretation for Dimensions of Unfolded Proteins**
Jianhui SONG,1,2 Gregory-Neal GOMES,3
Tongfei SHI,4 Claudiu C. GRADINARU,3 and Hue Sun CHAN2,∗
1 School of Polymer Science and Engineering, Qingdao University of
Science and Technology, 53 Zhengzhou Road, Qingdao 266042, China;
2 Departments of Biochemistry and Molecular Genetics,
University of Toronto, Toronto, Ontario M5S 1A8, Canada;
3 Department of Chemical and Physical Sciences,
University of Toronto Mississauga, Mississauga, Ontario L5L 1C6 Canada; and
Department of Physics, University of Toronto, Toronto, Ontario M5S 1A7, Canada;
4 State Key Laboratory of Polymer Physics and Chemistry,
Changchun Institute of Applied Chemistry, Chinese Academy of Sciences,
Changchun 130022, China
∗ Corresponding author.
∗ Hue Sun Chan. E-mail: [email protected]
To appear in Biophysical Journal, accepted for publication
———————————————————————————————————————
Submitted to Biophys J (arXiv: v1): May 16, 2017
First decision (Minor Revision): June 14, 2017
Revised manuscript submitted (arXiv: v2): July 26, 2017
Final acceptance (arXiv: v3 - content same as v2): July 31, 2017
**Abstract**
A mathematico-physically valid formulation is required to infer properties of disordered protein conformations from single-molecule Förster resonance energy transfer (smFRET). Conformational dimensions inferred by conventional approaches that presume a homogeneous conformational ensemble can be unphysical. When all possible conformational distributions, heterogeneous as well as homogeneous, are taken into account without prejudgement, a single value of average transfer efficiency $\langle E\rangle$ between dyes at two chain ends is generally consistent with highly diverse, multiple values of the average radius of gyration $\langle R_g\rangle$. Here we utilize unbiased conformational statistics from a coarse-grained explicit-chain model to establish a general logical framework to quantify this fundamental ambiguity in smFRET inference. As an application, we address the long-standing controversy regarding the denaturant dependence of $\langle R_g\rangle$ of unfolded proteins, focusing on Protein L as an example. Conventional smFRET inference concluded that $\langle R_g\rangle$ of unfolded Protein L is highly sensitive to [GuHCl], but data from small-angle X-ray scattering (SAXS) suggested a near-constant $\langle R_g\rangle$ irrespective of [GuHCl]. Strikingly, the present analysis indicates that although the reported $\langle E\rangle$ values for Protein L at [GuHCl] = 1 M and 7 M are very different at 0.75 and 0.45, respectively, the Bayesian radius-of-gyration distributions consistent with these two $\langle E\rangle$ values overlap to a large extent. Our findings suggest, in general, that the smFRET-SAXS discrepancy regarding unfolded protein dimensions likely arises from highly heterogeneous conformational ensembles at low or zero denaturant, and that additional experimental probes are needed to ascertain the nature of this heterogeneity.
**Introduction**
Single-molecule Förster resonance energy transfer (smFRET) is an important, increasingly utilized experimental technique haran2012 ; schulerCOSB2013 ; gruebele2014 ; blanchard2014 ; rhoades2014 ; deniz2014 ; arash2015 ; rhoades2016 ; schulerAnnuRev2016 for studying protein disordered states, especially those of intrinsically disordered proteins (IDPs) uversky08 ; tompa12 ; julie13 ; zhirongRev14 ; cosb15 ; rohit2015 . Applications of smFRET to infer conformational dimensions of unfolded states of globular proteins haran2006 ; eaton2007 ; claudiu2016 and IDPs schuler10 ; baoxu2014 ; Songetal2015 ; claudiu2017 have provided insights into fundamental protein biophysics including, for example, folding stability and cooperativity fersht2009 ; chanetal2011 ; munoz2012 ; tobin_forcefields ; zhirong2016a , transition paths eaton2013 ; zhang12 , and compactness of IDP conformations baoxu2014 ; Songetal2015 involved in fuzzy complexes borg07 ; tanja2010 ; fuzzy12 ; Veronika2017 . Single-molecule conformational dimensions likely bear as well on biologically functional liquid-liquid IDP phase separation chongjulie2016 because the amino acid sequence-dependent single-chain compactness of charged IDPs muthu96 ; pappu13 ; kings2015 is predicted by theory linPRL to be closely correlated with these polyampholytic proteins’ tendency to undergo multiple-chain phase separation lin2017 .
Basically, inference from smFRET data on measures of conformational dimensions such as radius of gyration $R_g$ entails matching the experimental average energy transfer efficiency $\langle E\rangle_{\rm exp}$ with a simulated (or analytically calculated) transfer efficiency predicted by a chosen polymer model. Using a Gaussian chain model or an augmented Sanchez mean-field theory, conventional smFRET inference procedures presume a homogeneous conformational ensemble that expands or contracts uniformly eaton2007 ; haranJACS2009 ; SchulerJCP2013 in response to changes in solvent conditions such as denaturant concentration kiefhaber2006 . Such an interpretation of smFRET data stipulated a significant collapse of unfolded-state conformations, as quantified by a substantial decrease in $\langle R_g\rangle$, upon changing solvent conditions from strongly unfolding to folding by lowering denaturant concentration haran2006 ; eaton2007 . This smFRET prediction has led to a long-standing puzzle for Protein L haran2012 ; tobinkevinJMB2012 ; tobinkevinPNAS2015 ; DanRohit2017 because for this two-state folder Plaxco1999 , an apparently more direct measurement of $\langle R_g\rangle$ by small-angle X-ray scattering (SAXS) indicated that the average compactness of its unfolded-state conformational ensemble does not vary much with denaturant haran2012 ; tobinkevinJMB2012 . Similar behaviors have also been observed in SAXS experiments on other proteins tobin2004 .
Although the smFRET-SAXS puzzle remains to be fully resolved, several advances since the discrepancy was first noted haran2006 have contributed to clarifying the pertinent issues. A study using explicit-chain models questioned the general validity of conventional “standard” smFRET interpretation by showcasing that it incurs substantial errors in inferred dt2009 . A systematic analysis of subensembles of self-avoiding chains pinpointed the conventional procedure’s basic shortcoming in always presuming a homogeneous ensemble, an assumption positing particular forms of one-to-one mapping between average and end-to-end distance that lead to grossly overestimated ’s for small values Songetal2015 . In reality, however, as should be obvious from polymer theory and explicit-chain simulations of polymers, there is no general one-to-one mapping between and if a homogeneous ensemble is not assumed, because there are significant scatters in the – relationship (see, e.g., Fig. 2 of Ref. Songetal2015 ). Therefore, cannot be a proxy for in general. When conformational heterogeneity is recognized, as it is clearly observed in a number of smFRET experiments kellner2014 ; claudiu2016 , our subensemble analysis prescribes a “most probable” radius of gyration, , for any given Songetal2015 . The same analysis shows that can also correspond to the of a distribution of consistent with the given (Fig.5F of Ref. Songetal2015 ). When applied to an N-terminal IDP fragment of the Cdk inhibitor Sic1 borg07 ; tanja2010 ; Veronika2017 , the subensemble-inferred, denaturant-dependent is in good agreement with SAXS-determined and NMR measurement of hydrodynamic radius, in contrast to conventional procedures that produced unphysical results Songetal2015 .
In line with this conceptual framework that emphasizes conformational heterogeneity and polymer excluded volume, two other recent explicit-chain simulation studies also concluded that conventional smFRET inference of is inadequate Reddy2016 ; zhirong2016b . Notably, the coarse-grained model simulation in ref. Reddy2016 predicted an Å contraction of average for Protein L upon diluting GuHCl from 7.5 M to 1.0 M. The authors surmised that Å is “close to the statistical uncertainties” of SAXS-measured values, and therefore a resolution of the smFRET-SAXS discrepancy for Protein L might be within reach Reddy2016 . More recently, an extensive experimental-computational study of a destabilized mutant of spectrin domain R17 and the IDP ACTR also underscored the importance of explicit-chain simulations in the interpretation of smFRET data. Denaturant-dependent expansion of conformational dimensions was consistently observed for these proteins from multiple experimental methods as well as in all-atom explicit-water molecular dynamics simulations best2016 ; schuler2016 . Protein L, however, was not the subject of this investigation.
In view of recent results that apparently affirm an appreciable denaturant-dependent $\langle R_g\rangle$ for unfolded proteins (albeit not as sharp as posited by conventional smFRET interpretation), is an essentially denaturant-independent unfolded-state $\langle R_g\rangle$, as envisioned in the usual picture of cooperative protein folding, tenable? To address this question, we determined computationally the distribution of $R_g$ consistent with any given $\langle E\rangle$ and the derived probabilities that different $\langle E\rangle$’s are consistent with the same $R_g$’s. Taking an agnostic view as to the merits of various experimental techniques, we invoked minimal theoretical assumptions so as to let experimental data speak for themselves. For simplicity, we do not consider kinetic effects in smFRET measurements steinberg1978 ; nau2013 ; hilser2015 . Accordingly, our coarse-grained model incorporates only the most rudimentary geometry of polypeptide chains, without any detailed force field such as those applied in recent smFRET-related simulations dt2009 ; Reddy2016 ; best2016 . By this very construction, our analysis is unaffected by any known or potential limitations of current coarse-grained and atomic force fields cosb15 ; DavidShaw2 ; TaoPCCP ; sarah15 ; sarah17 ; best2017 ; shea2017 . As detailed below, we found that simple conformational statistics dictates a broad distribution of $R_g$ for most $\langle E\rangle$’s. Among such conditional (Bayesian presse2017 ) distributions $P(R_g|\langle E\rangle)$ for different $\langle E\rangle$ values, large overlaps exist even for significantly different $\langle E\rangle$’s. These results suggest that, even if published experimental data are taken at face value, conceivably the smFRET-SAXS discrepancy can be resolved provided sufficient denaturant-dependent conformational heterogeneity in the unfolded state is encoded by the amino acid sequence of the protein. Our analysis thus establishes a physical perimeter within which future experimental and theoretical smFRET analyses may proceed.
**Methods**
The Cα protein model and the sampling algorithm used here are the same as those in our previous study Songetal2015 . The protein is represented by a sequence of beads connected by Cα–Cα virtual bonds of length Å. The potential energy , where , is the virtual bond angle at bead , is the reference that corresponds to the most populated virtual bond angle in the Protein Data Bank levitt1976 , is the Boltzmann constant, is the absolute temperature, is the model protein’s self-avoiding excluded-volume repulsion strength, and is the distance between beads , wherein is the position vector for bead . The excluded-volume term is set to zero for Å. As in many protein folding simulations chanetal2011 , we use a hard-core repulsion distance Å for most of the analysis presented below, while some results for Å or Å Songetal2015 are also utilized to assess the robustness of our conclusions.
We conducted Monte Carlo sampling by applying the Metropolis criterion MC at K using an algorithm described previously song13 that assigns equal a priori probability for pivot and kink jumps stockmayer1962 ; Lal69 . The acceptance rate for the attempted chain moves was . The first equilibrating attempted moves of each simulation were excluded from the tabulation of statistics. Subsequently, moves were attempted for each chain length we studied to sample conformations for further analysis. Values of radius of gyration (where ) and end-to-end distance were computed for the sampled conformations to determine the distribution of populations centered at various with only narrow ranges of variations (bins) around the given and values.
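As a minimal illustration of the two quantities tabulated for each sampled conformation, the Python sketch below computes the radius of gyration and the end-to-end distance from a list of bead coordinates. It assumes the standard equal-mass definition of the radius of gyration (root-mean-square distance of the beads from their centroid); the function names and the 3.8 Å bond length in the example are illustrative assumptions, not values taken from the text.

```python
import math

def end_to_end_distance(coords):
    """Distance between the first and last bead of the chain."""
    return math.dist(coords[0], coords[-1])

def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid (equal masses assumed)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    sq_sum = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(sq_sum / n)

# Hypothetical test conformation: a fully extended 4-bead chain with 3.8 Å bonds
rod = [(3.8 * i, 0.0, 0.0) for i in range(4)]
r_ee = end_to_end_distance(rod)
r_g = radius_of_gyration(rod)
print(f"R_EE = {r_ee:.2f} Å, R_g = {r_g:.2f} Å")
```

In a sampling run, these two numbers would be accumulated into a two-dimensional histogram whose bins are the narrow ranges referred to above.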
We focus here only on cases in which the dyes are attached to the two ends of the protein chain. FRET efficiency for a given conformation in the model with end-to-end distance is then calculated by the formula
$$E(R_{EE}) = \frac{R_0^6}{R_0^6 + R_{EE}^6} \qquad (1)$$
where is the Förster radius of the dye. Based on the values of Å given by Sherman and Haran haran2006 and Å provided by Merchant et al. eaton2007 for the Alexa 488 and Alexa 594 dyes employed in their Protein L experiments, we set Å in most of the computation for Protein L below. For any given distribution , the average FRET efficiency is given by . The subscripts in the above expressions and are omitted hereafter for notational simplicity when the meaning of the average is clear from the textual context. Protein L is a 64-residue protein. To account for the added effective chain length due to the two dye linkers, we used chains to model the unfolded-state conformations of Protein L. This prescription for the linkers is similar to the ten plaxco2005 or eight eaton2007 extra residues used before. In addition to the exemplary computation for Protein L, simulations were also conducted for several other representative chain lengths (, , , and ) and Förster radii (, , and Å) for future applications to other disordered protein conformational ensembles.
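The following Python sketch illustrates the per-conformation transfer efficiency and its population average. It assumes the standard Förster relation $E = 1/[1 + (R_{EE}/R_0)^6]$ as the form of Eq. (1); the Förster radius of 55 Å and the two-state mixture are purely illustrative choices, not parameters from the text.

```python
def fret_efficiency(r_ee, r0):
    """Transfer efficiency of one conformation with end-to-end distance r_ee (same units as r0)."""
    return 1.0 / (1.0 + (r_ee / r0) ** 6)

def mean_efficiency(distances, weights, r0):
    """Population-weighted average <E> over a discrete distribution of end-to-end distances."""
    total = sum(weights)
    return sum(w * fret_efficiency(r, r0) for r, w in zip(distances, weights)) / total

R0 = 55.0  # Å; illustrative value only

# A homogeneous ensemble with every conformation at R_EE = R0 ...
e_homogeneous = mean_efficiency([R0], [1.0], R0)
# ... and a 50/50 mixture of compact (R0/2) and expanded (2*R0) conformations
e_mixture = mean_efficiency([0.5 * R0, 2.0 * R0], [0.5, 0.5], R0)
print(e_homogeneous, e_mixture)
```

Both ensembles give an average efficiency of one half (up to floating-point rounding), even though their distance distributions are entirely different. This toy case shows why a single average efficiency cannot by itself identify the underlying conformational distribution.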
**Results**
Physicality of a subensemble approach to smFRET inference. To ensure that smFRET inference takes into account only physically realizable conformations, we recently introduced a systematic methodology to infer a most probable radius of gyration $R_g^0$ from an experimental $\langle E\rangle_{\rm exp}$ by considering subensembles of self-avoiding walk (SAW) conformations with narrow ranges of $R_g$ simulated using an explicit-chain model. For any such range (bin) centered around an $R_g$, the method provides a conditional distribution $P(R_{EE}|R_g)$ for the end-to-end distance $R_{EE}$. An average FRET efficiency $\langle E\rangle(R_g)$ is then calculated. The most probable $R_g^0$ is determined by matching $\langle E\rangle(R_g)$ with $\langle E\rangle_{\rm exp}$, viz., by solving the equation
$$\langle E\rangle(R_g^0) = \langle E\rangle_{\rm exp} \qquad (2)$$
for $R_g^0$ to arrive at $R_g^0(\langle E\rangle)$ (wherein the “exp” is dropped from the average), which is the inverse function of $\langle E\rangle(R_g^0)$. As documented before Songetal2015 ; claudiu2016 and outlined above, by explicitly allowing for unfolded-state conformational heterogeneity, which is expected physically cosb15 ; rohit2015 , the subensemble SAW method circumvents the limitations of conventional smFRET inferences that presuppose a homogeneous conformational ensemble haran2006 ; eaton2007 ; SchulerJCP2013 .
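The matching step described above amounts to inverting a tabulated, monotonically decreasing relation between radius of gyration and average efficiency. A minimal Python sketch, assuming linear interpolation between bins, could look as follows. The table values are hypothetical placeholders rather than simulation results, and `invert_monotone` is an illustrative name.

```python
def invert_monotone(table, e_exp):
    """Return the radius of gyration whose tabulated mean efficiency equals e_exp.

    `table` is a list of (r_g, mean_E) pairs sorted by increasing r_g, with
    mean_E strictly decreasing; linear interpolation is used between entries.
    """
    for (r_lo, e_hi), (r_hi, e_lo) in zip(table, table[1:]):
        if e_lo <= e_exp <= e_hi:
            t = (e_hi - e_exp) / (e_hi - e_lo)
            return r_lo + t * (r_hi - r_lo)
    raise ValueError("e_exp lies outside the tabulated range")

# Hypothetical (R_g in Å, <E>) table, for illustration only
table = [(15.0, 0.90), (20.0, 0.70), (25.0, 0.50), (30.0, 0.30)]
rg0 = invert_monotone(table, 0.60)
print(f"most probable R_g for <E> = 0.60: {rg0:.2f} Å")
```

Interpolation is only one possible design choice; with fine enough bins, picking the bin whose average efficiency is closest to the experimental value would serve equally well.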
Based on the same conceptual framework, here we approach the question of smFRET inference from a complementary angle. Instead of starting from subensembles with a narrow range of to derive , then and then , here we start from subensembles with a narrow range of (smallest bin size Å, see below), and hence a narrow variation of (i.e., via Eq. (1), the values in a narrow range may be taken as a single value), to derive distribution conditioned upon . While is related to by Bayes’ theorem, is of interest because it quantifies directly the possible variation in conformational dimensions when only a single value is known. This is because for every single FRET efficiency , the quantity is sufficient to provide the conditional distribution . Then, based on these derived distributions for all individual values, the distribution conditioned upon any value of averaged from any underlying distribution of can be readily obtained.
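The binning procedure described in this paragraph can be sketched in a few lines of Python. As a loudly stated simplification, the sketch samples ideal freely jointed chains without excluded volume, which is not the self-avoiding model used in this work; the chain length, bond length, bin width, and sample size are likewise illustrative assumptions. It groups sampled conformations by end-to-end distance bin and reports the spread of radius of gyration within each bin.

```python
import math
import random
from collections import defaultdict

def random_chain(n_beads, bond=3.8):
    """Ideal freely jointed chain (no excluded volume): a stand-in, NOT the SAW model."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        z = random.uniform(-1.0, 1.0)              # uniform direction on the unit sphere
        phi = random.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + bond * s * math.cos(phi),
                       y0 + bond * s * math.sin(phi),
                       z0 + bond * z))
    return coords

def rg_and_ree(coords):
    """Return (radius of gyration, end-to-end distance) of one conformation."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    rg2 = sum(sum((p[k] - c[k]) ** 2 for k in range(3)) for p in coords) / n
    return math.sqrt(rg2), math.dist(coords[0], coords[-1])

def conditional_rg(samples, bin_width):
    """Map each end-to-end distance bin index to the R_g values observed in that bin."""
    groups = defaultdict(list)
    for r_g, r_ee in samples:
        groups[int(r_ee // bin_width)].append(r_g)
    return groups

random.seed(0)
samples = [rg_and_ree(random_chain(30)) for _ in range(2000)]
groups = conditional_rg(samples, bin_width=5.0)

for b in sorted(groups):
    vals = groups[b]
    if len(vals) >= 50:                            # report only well-populated bins
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        print(f"R_EE in [{5 * b}, {5 * b + 5}) Å: n={len(vals)}, "
              f"<R_g>={mean:.1f} Å, sd={sd:.1f} Å")
```

Because each narrow distance bin corresponds to a nearly single efficiency value, the list of radii in a bin is a sampled estimate of the conditional distribution for that efficiency.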
Estimation of conformational dimensions from FRET efficiency is highly model dependent because of insufficient structural constraint. As an exemplary case, we applied this formulation to Protein L. Figure 1 shows considerable discrepancies between SAXS- (squares) and smFRET-deduced (diamonds) $\langle R_g\rangle$’s, and that different smFRET inference approaches lead to very different pictures of how $\langle R_g\rangle$ of this protein varies with denaturant concentration. For a change in [GuHCl] from M to M, conventional inference (diamonds) yielded large decreases of Å (filled diamonds, ref. haran2006 ) or Å (open diamonds, ref. eaton2007 ). In contrast, subensemble SAW methods (circles) stipulate a much milder variation with respect to [GuHCl]. For the same [GuHCl] change, the most probable value decreases by Å (open circles) whereas the change in root-mean-square conditioned upon the published experimental data is even smaller: it decreases by Å (filled circles). When [GuHCl] is reduced further from 2 M to 0 M, the total decrease over the entire [GuHCl] range is Å for but merely Å for . We computed distributions of and here because these quantities are determined by SAXS tobin2004 ; saxs2015 . Our results are essentially unchanged if is considered instead (see below).
For every data point we considered for Protein L using subensemble analysis, significant diversity in values that are nonetheless consistent with the given is observed (Fig. 1, error bars for filled circles). In other words, the present method can infer the full Bayesian distribution of for a given and hence a rigorous error bar can be provided (whereas error bars are not provided for because it represents a narrow range of ’s that lead to a distribution of ’s which in turn average to an Songetal2015 ). Figure 1 shows clearly that the large variations in inferred values and the large overlaps of the ranges of these variations at different [GuHCl]’s imply that significant fractions of the unfolded conformational ensembles of Protein L at different [GuHCl]’s can encompass conformations with very similar ’s. Notably, the average expected of a fully unfolded protein in good solvent of the same length as Protein L with dye linkers (horizontal dashed line, ref. plaxco2004 ) is within the error bars for [GuHCl] as low as 3 M. Even at zero denaturant, the Å value (upper error bar), at one standard deviation from the mean, , is only Å from the average expected of a fully unfolded conformational ensemble.
Conformations consistent with a given FRET efficiency generally have highly diverse radii of gyration. The diversity in values that are consistent with a given (and therefore a given ) is further illustrated in Fig. 2. For our Protein L model, the square root of the standard deviation in , , is substantial for the entire range of : It increases steadily from Å for to Å for Å (Fig. 2b). Therefore, although of the conformations consistent with a given increases monotonically from to Å over the range in Fig. 2a, knowledge of alone can barely narrow down the wide range of possible values and vice versa (Fig. 2c–f).
A panoramic view of the logic of smFRET inference on conformational dimensions is provided by Fig. 3, wherein is converted to by Eq. (1). Using our model for unfolded Protein L as an example, the landscape in Fig. 3a shows clearly that the – scatter is wide, with the most populated (red) region elongated mainly along the axis with a small negative incline. Consistent with Fig. 1, this population distribution implies that even large variations in do not necessitate much change in the distribution. This feature of the – space is demonstrated more specifically by the curve in Fig. 3b (red solid curve; the dependence of on is essentially identical, blue solid curve), wherein an overwhelming majority of values are seen to be consistent with values between 20 Å and 27 Å that are within one standard deviation of (red dashed curves). In contrast, conventional smFRET inference procedures—which are demonstrably unphysical in some situations Songetal2015 —posit a much more sensitive dependence of inferred on (Fig. S1). It is noteworthy that, for most values, the variation of is milder than that of ; i.e., . In fact, this trend is already evident in Fig. 1 from the milder [GuHCl] dependence of (filled circles) than that of (open circles).
Conformations sharing similar radii of gyration can have very different FRET efficiencies. In light of the large diversity in values conditioned upon a given and the very mild variation of and with (Fig. 3), one expects that conformations consistent with even very different values share highly overlapping values. We now characterize this overlap quantitatively by first considering two sharply defined representative values in Fig. 4a (vertical bars depicting δ-function-like distributions) that correspond, by virtue of Eq. (1), to two sharply defined values and (Fig. 4b). These values are representative because they coincide with the experimental for Protein L at [GuHCl] = 7 M and 1 M, respectively haran2006 . The conditional distributions for and overlap significantly, with the overlapping area (Fig. 4c). By definition, this area is the overlapping coefficient, OVL, used in statistical analysis for measuring similarity between distributions ovl1989 . OVL between two distributions is generally given by
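$$\mathrm{OVL} = \int dx \, \min\left[ P_1(x),\, P_2(x) \right] \qquad (3)$$
where $P_1(x)$ and $P_2(x)$ are two normalized distributions of variable $x$. In Fig. 4c, $P_1$ and $P_2$ are the two conditional distributions being compared.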
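The overlapping coefficient is straightforward to evaluate numerically once two normalized distributions are tabulated on a common grid. The Python sketch below does so for two unit-width Gaussians, which are hypothetical stand-ins rather than the conditional distributions of Fig. 4c, and checks the result against the closed form available for that special case.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def overlap_coefficient(p1, p2, dx):
    """OVL = integral of min(p1, p2), approximated on a uniform grid of spacing dx."""
    return sum(min(a, b) for a, b in zip(p1, p2)) * dx

dx = 0.01
grid = [-10.0 + i * dx for i in range(2101)]       # covers [-10, 11]
p1 = [normal_pdf(x, 0.0, 1.0) for x in grid]
p2 = [normal_pdf(x, 1.0, 1.0) for x in grid]
ovl = overlap_coefficient(p1, p2, dx)

# Closed form for equal-width Gaussians whose means differ by d: 2 * Phi(-d / (2 * sigma))
exact = 1.0 + math.erf(-0.5 / math.sqrt(2.0))
print(f"numerical OVL = {ovl:.4f}, closed form = {exact:.4f}")
```

OVL equals 1 for identical distributions and 0 for distributions with disjoint support, and it is unchanged by any invertible change of variable, so it does not matter whether the distributions are expressed in terms of the radius of gyration or its square.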
Because experimentally determined values are often averages, not sharply defined haran2006 ; eaton2007 , it is necessary to address more realistic distributions of on smFRET inference. We do so here by considering hypothetical broad Gaussian distributions for centered around the two sharply defined values (Fig. 4a, curves, standard deviation Å), resulting in broad distributions in averaging to and (Fig. 4b, curves), which are essentially equal to the sharply defined values of and . Modifying the two sharply defined values to two broad distributions of has very little impact on either the individual distributions [] or the overlap of the two distributions (Fig. 4d). The overlapping coefficient remains .
Although the distributions in Fig. 4c and 4d are very similar, there is a basic difference between two sharply defined values and two broad distributions of in regard to the conformations in the distributions. When the values are sharply defined, there is no overlap in the actual conformations in the two distributions because the conformational ensembles consistent with two sharply defined values are disjoint. However, when the two sets of values are broadly distributed with overlapping and values (Fig. 4a, b; curves), some of the conformations from the two different distributions that contribute to the overlapping region in Fig. 4d can be identical.
The distribution of radius of gyration consistent with a given single FRET efficiency is very similar to that consistent with a symmetric distribution of FRET efficiencies centered around it. This insensitivity of the distribution of (and therefore also of ) conditioned upon given values to variations in the width of Gaussian-like distribution of is not difficult to fathom. Given the mild variation of and with respect to (Fig. 3b) and the tendency for effects from values on opposite sides of the average of a symmetric distribution to cancel each other, averaging over a range of values centered around a given () is not expected to result in an overall average and overall distribution width that are substantially different from those for a sharply defined . For the sake of testing the robustness of this insensitivity, here we have used a large standard deviation, , for the hypothetical Gaussian distributions in Fig. 4a. This is equal to the standard deviation of the distribution for the full conformational ensemble (with the mean, Å). Besides the and distributions in Fig. 4, we performed additional calculations using Gaussian distributions of centered at different averages, with different standard deviations that equal , , , and . These constructs beget distributions of with different values. In all cases we considered, the resulting distribution for the given is essentially the same across the different standard deviations as well as for the case with a sharply defined . This finding suggests that the – dependence in Fig. 3b is not strictly limited to sharply defined values. An essentially identical relationship should also be applicable to the and associated conditioned upon reasonably symmetric distributions of with mean value . In other words, in Fig. 3, which was originally constructed for sharply defined values, is also expected to be a good approximation of for essentially symmetric distributions of .
More generally, the for any distribution of , symmetric or otherwise, can be calculated readily as by using the values from Fig. 3.
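The averaging over a distribution of efficiencies discussed in the last two paragraphs can be summarized by standard probability identities. In the sketch below, $P(E)$ denotes an arbitrary normalized distribution of single-conformation efficiencies and $P(R_g \mid E)$ the conditional distribution for a sharply defined efficiency; this notation is an assumption made for illustration and may differ from the symbols used in the original equations.

```latex
\begin{align}
P(R_g) &= \int_0^1 dE \; P(E)\, P(R_g \mid E), \\
\langle R_g^2 \rangle &= \int_0^1 dE \; P(E)\, \langle R_g^2 \rangle(E), \\
\mathrm{Var}(R_g) &= \int_0^1 dE \; P(E)\, \mathrm{Var}(R_g \mid E)
  \;+\; \mathrm{Var}_{E}\!\left[ \langle R_g \rangle(E) \right].
\end{align}
```

The last line is the law of total variance. Its second term is small whenever the conditional mean varies only mildly with efficiency, which is one way to see why broadening a symmetric efficiency distribution changes the resulting radius-of-gyration distribution so little.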
Inference of conformational dimensions solely from FRET efficiency can entail significant ambiguity. To ascertain more generally the degree to which the values consistent with different FRET efficiencies overlap, we extended the comparison in Fig. 4c for two values by computing the corresponding overlapping coefficients (Eq. (3)) for all possible pairs of FRET efficiencies, and :
$$\mathrm{OVL}(E_1, E_2) = \int dR_g \, \min\left[ P(R_g | E_1),\, P(R_g | E_2) \right] \qquad (4)$$
The heat map in Fig. 5 indicates substantial overlaps for a majority of . Among all possible combinations, more than 30% have OVL , and close to 60% have OVL (Fig. S2a), meaning that their ’s are quite similar. Notably, OVL increases significantly as increase above . We also computed averages of over the overlapping regime of the pairs of distributions. These averages represent conformational dimensions that are consistent with both and . In a majority of the situations, the root-mean-square for the overlapping regime stays within a relatively narrow range of – Å for our model of unfolded Protein L, even for and that are quite far apart (Fig. S2b). Therefore, taken together with Figs. 1–4, the overview in Fig. 5 indicates that when an explicit-chain physical model is used to interpret/rationalize smFRET data Songetal2015 ; claudiu2016 , as is the case here, the a priori expectation is that even substantial changes in do not necessarily imply large changes in average . In this light, previous smFRET-based stipulations of large denaturant-dependent changes in the of Protein L haran2006 ; eaton2007 are demonstrably inconclusive in the absence of additional relevant experimental information, because they were based on conventional inference approaches that are not entirely physical Songetal2015 . Moreover, as is evident from the examples in Fig. 6, the trend of a mild – variation that we saw previously Songetal2015 and in Figs. 1–5 here, which is derived directly from explicit-chain polymer models, is expected to hold generally for other FRET systems of disordered proteins with different chain lengths and Förster radii as well.
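A heat map of pairwise overlaps and the fraction of pairs exceeding a given threshold can be assembled as in the Python sketch below. The conditional distributions here are hypothetical Gaussians with means between 20 Å and 27 Å and an assumed width of 4 Å; they only mimic the qualitative situation described above and are not the simulated distributions. The threshold of 0.5 is likewise an arbitrary illustrative choice.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def overlap_coefficient(p1, p2, dx):
    """Integral of min(p1, p2) on a uniform grid of spacing dx."""
    return sum(min(a, b) for a, b in zip(p1, p2)) * dx

def pairwise_ovl(dists, dx):
    """Symmetric matrix of overlapping coefficients for every pair of distributions."""
    return [[overlap_coefficient(p, q, dx) for q in dists] for p in dists]

def fraction_of_pairs_above(matrix, threshold):
    """Fraction of distinct pairs (i < j) whose overlap exceeds the threshold."""
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(matrix[i][j] > threshold for i, j in pairs) / len(pairs)

dx = 0.05
grid = [i * dx for i in range(1201)]               # 0 to 60 Å
means = [20.0, 22.0, 24.0, 26.0, 27.0]             # hypothetical conditional means, Å
dists = [[normal_pdf(x, m, 4.0) for x in grid] for m in means]

matrix = pairwise_ovl(dists, dx)
frac = fraction_of_pairs_above(matrix, 0.5)
print(f"fraction of pairs with OVL > 0.5: {frac:.2f}")
```

With real data, each row of `dists` would be a normalized histogram of the radius of gyration collected from one narrow efficiency bin, and `matrix` would be the quantity displayed as a heat map.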
**Discussion**
Subensemble-derived conditional distributions of are basic to smFRET inference. To recapitulate, here we have further developed the subensemble SAW approach to smFRET inference of conformational dimensions Songetal2015 , which is based on the obvious principle that only physically realizable conformational ensembles should be invoked to interpret smFRET data. We focused previously on the most probable radius of gyration , which is derived from distributions of conditioned upon a narrow range of . Here we have considered the complementary quantity, , which is the root-mean-square value of conditioned upon a given . These quantities are not identical, but their variations with or are similar (Figs. 3 and 6). Relative to conventional approaches to smFRET inference, both and exhibit a milder dependence on smFRET efficiency, covering a range of values consistent with polymer physics Songetal2015 . By construction, is appropriate if it is known or presumed that the disordered conformations populate a narrow range of ’s or distribute symmetrically around an average Songetal2015 , whereas is suitable when such knowledge or assumption is absent. Therefore, it is our contention that, given a single in the absence of additional experimental data, the quantity should serve well as the physically valid Bayesian inference. However, if the ’s are known experimentally to be confined to a narrow range, which may be the case for certain IDPs, would be the valid inference when no further information besides and the confinement is available. The data provided in Fig. 6 and the Supporting Information of ref. Songetal2015 as well as those in the present Figs. 3 and 6 are useful for this purpose.
Physically valid interpretation of smFRET data requires explicit-chain modeling. Conventional approaches to smFRET inference neglect possible sequence-dependent conformational heterogeneity of unfolded ensembles. They always enforce a full conformational ensemble that expands or contracts homogeneously haran2006 ; eaton2007 . Because conventional inference lacks an explicit-chain representation, this elementary unphysicality was often overlooked. Consequently, when $\langle E\rangle$ is small, these procedures force the entire ensemble to expand, leading to unrealistically high inferred $\langle R_g\rangle$ values Songetal2015 . Although conformations with large $R_{EE}$ (and hence small $E$ or $\langle E\rangle$) and large $R_g$ are part of our subensemble analysis (e.g. Fig. 2f), these rare conformations in our simulations did not arise from physically unrealistic long Kuhn lengths or unrealistic intrachain repulsion as in conventional approaches Songetal2015 . This is the fundamental reason why conventionally inferred values differ from those simulated using physical, explicit-chain models claudiu2016 ; Songetal2015 ; dt2009 ; Reddy2016 ; zhirong2016b , and why such simulations, for Sic1 Songetal2015 and Protein L Reddy2016 for example, produced smaller variations in consistent with the limits prescribed by our subensemble SAW analysis Songetal2015 (Fig. S1).
In this perspective, recent computational investigations using explicit-chain simulations to rationalize smFRET data represent significant advances. These efforts include a study on Protein L using a denaturant-dependent construct based on a native-centric Gō-like sidechain potential Reddy2016 and an all-atom, explicit-water molecular dynamics study on ACTR and an R17 variant best2016 ; schuler2016 . In these studies, the conformational heterogeneity of unfolded/disordered ensembles encoded by amino acid sequences is taken into account either by a structure-specific Gō-like potential Reddy2016 or a transferable atomic force field best2016 ; schuler2016 . However, it should be emphasized that commonly used force fields may not capture the high degrees of folding cooperativity observed for real proteins chanetal2011 . In particular, in comparison with experiment, the disordered conformational ensembles predicted by several atomic force fields are too compact tobin_forcefields ; DavidShaw2 ; sarah15 ; zhuqing2017 . Efforts to address this shortcoming are underway sarah17 ; best2017 ; shea2017 . For the case of Protein L, an earlier study TaoPCCP using a denaturant-dependent coarse-grained sidechain model similar to the one used in the recent study by Maity and Reddy Reddy2016 suggests that, even with an essentially native-centric potential, the model is insufficiently cooperative vis-à-vis experiment. Specifically, the predicted chevron plot for Protein L has a folding-arm rollover TaoPCCP , which is absent in experiment Plaxco1999 . This behavior is related to denaturant-dependent shifts in the positions of transition and unfolded states in the model TaoPCCP , which would likely lead to a reduction in with decreasing [GuHCl]. We view these known limitations of current potentials for protein folding simulation as part of the very puzzle underscored by the smFRET-SAXS discrepancy.
The crux of the matter is, if the degrees of folding cooperativity for some—albeit not all—proteins, such as Protein L, are indeed as high as envisioned by SAXS measurements Plaxco1999 , why can’t common force fields capture the phenomenon TaoPCCP ?
In lieu of attempting to provide an accurate model of sequence-specific interactions, our subensemble SAW approach to smFRET inference does not presume any particular model of sequence-dependent conformational heterogeneity. By itself, our approach merely establishes a perimeter for physically realizable conformational variation Songetal2015 . The rationale is to let experiment take precedence in uncovering the actual conformational heterogeneity. In other words, the Bayesian distribution of $R_g$ given $\langle E\rangle$ is a baseline distribution upon which any re-weighting of conformational population by sequence-specific effects is to be considered without prejudgement. Under this conceptual framework, we make no generalization as to whether conformational dimensions of disordered proteins would or would not increase with increasing denaturant concentration. Such a verdict has to be made on a case-by-case basis, depending on the nature of available experimental information in addition to the limited structural constraint provided by smFRET. For example, our previous study indicates that the dimensions of the IDP Sic1 increase when [GuHCl] is increased from 1 M to 5 M Songetal2015 . A more recent in-depth study using smFRET, SAXS, other experimental probes, and computation has demonstrated convincingly that conformational dimensions of the IDP ACTR and a destabilized mutant of the globular protein R17 increase upon increasing [GuHCl] or [urea] best2016 ; schuler2016 . It is of relevance, however, that unlike Protein L Plaxco1999 , R17 is not a two-state folder, as its chevron plot has a nonlinear unfolding arm internalfriction2012 .
A hypothetical scenario for the case of Protein L. To make conceptual progress toward understanding the Protein L unfolded state, we first put aside potential experimental artifacts that might be caused, for example, by the sensitivity of $R_g$ to the fitting range of the Guinier analysis and the difficulty in obtaining low-denaturant SAXS data schuler2016 . For the following consideration, we assume that the SAXS finding of an essentially denaturant-independent Å (ref. Plaxco1999 ) and the smFRET data of a decreasing $\langle E\rangle$ with increasing denaturant haran2006 ; eaton2007 are both valid. We then seek to rationalize the experimental data by constructing denaturant-dependent heterogeneous conformational ensembles consistent with both sets of data. In so doing, we are merely following an investigative logic commonly practised in the construction of putative unfolded and IDP ensembles schuler2016 ; julie2001 ; marsh2012 ; antonov2016 . As explained below, a solution to the smFRET-SAXS puzzle is possible if, with decreasing denaturant, sequence-specific effects become increasingly biased to re-distribute conformational population to high $R_g$ values such that a nearly constant Å is maintained despite the shift of the baseline Bayesian distribution to lower $R_g$ values because of increasing $\langle E\rangle$ with decreasing denaturant (Fig. 4).
How biased does such a denaturant-dependent conformational heterogeneity need to be? Using the example in Fig. 4 for unfolded Protein L at [GuHCl] = 1 M and 7 M, an estimate of the necessary denaturant-dependent bias needed to resolve the smFRET-SAXS puzzle can be made. Consider the Bayesian distributions (Fig. 4c) and (Fig. 4d). These are baseline distributions that do not account for any sequence-specific effect. They show that and , respectively, of the and populations have Å ( Å2). This means that different subsets of these two conformational distributions can have the SAXS-observed Å. Indeed, possible sequence-specific re-weighted distributions for Protein L that are consistent with both smFRET and SAXS may take the forms of the shaded symmetric regions in Fig. 7 (grey, and pink plus grey areas). These distributions are consistent with both smFRET and SAXS because they both have Å (thus consistent with SAXS) yet ( at [GuHCl] = 1 M) for the grey distribution and ( at [GuHCl] = 7 M) for the pink plus grey distribution.
That this holds true is easy to see if the distributions in question are for two sharply defined ’s. In that case, we use the two ’s in Fig. 4c to define two restricted (unnormalized) distributions such that for Å2 and for Å2. Because of the mirror symmetry of these distributions with respect to Å, the values of their are both Å even though for all conformations in the distribution and for all conformations in the distribution. This result is generalizable to the two broad distributions in Fig. 4b. Consider . By definition this integral gives exactly the Å2 parts (in darker shades) of the grey, and pink plus grey areas in Fig. 7 because for Å2 and . The integral yields close approximations to the Å2 lighter shaded areas in Fig. 7 because varies mildly in the range (Fig.3b) that covers most of the distributions (Fig. 4b). This procedure ensures that the conformational populations represented by the grey plus pink and grey areas in Fig. 7 preserve their respective values because preserves the average at every . Therefore, the shaded distributions in Fig. 7 represent conformations with different and but possess the same Å. This hypothetical scenario indicates that consistency between SAXS and smFRET is possible if sequence-induced heterogeneity entails a mild restriction to of the conformational possibilities allowed by the at [GuHCl] = 7 M but imposes a more severe restriction to of the conformational possibilities allowed by the at [GuHCl] = 1 M (Fig. 7). It should be emphasized, however, that this is only one among many possible scenarios of denaturant-dependent conformational re-weighting that can satisfy both smFRET and SAXS data. Further information about the re-weighting may be offered by additional experimental data such as pair distributions from SAXS, but that is beyond the scope of this work.
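The mirror-symmetry argument can be checked numerically with a toy sketch. It assumes hypothetical Gaussian-shaped baseline distributions over a generic dimension coordinate `x` and an arbitrary target value `center`; none of these numbers come from Fig. 4 or Fig. 7, and taking the pointwise minimum of a distribution and its mirror image is only one way to build a mirror-symmetric sub-distribution.

```python
import numpy as np

def symmetric_restriction(x, p, center):
    """Largest sub-distribution of p (pointwise) that is mirror-symmetric
    about `center`: q(x) = min[p(x), p(2*center - x)]."""
    # p evaluated at the mirror point; zero where the mirror point is off-grid
    p_mirror = np.interp(2.0 * center - x, x, p, left=0.0, right=0.0)
    return np.minimum(p, p_mirror)

def mean_and_fraction(x, p, q):
    """Mean of x under q, and the fraction of p's population kept in q
    (uniform grid assumed, so the spacing cancels)."""
    return float(np.sum(x * q) / np.sum(q)), float(np.sum(q) / np.sum(p))

x = np.linspace(0.0, 60.0, 601)   # toy dimension coordinate
center = 26.0                     # hypothetical target mean

# Two hypothetical baselines: one peaked near the target, one farther below it
p_near = np.exp(-0.5 * ((x - 24.0) / 4.0) ** 2)
p_far = np.exp(-0.5 * ((x - 22.0) / 4.0) ** 2)

mean_near, frac_near = mean_and_fraction(
    x, p_near, symmetric_restriction(x, p_near, center))
mean_far, frac_far = mean_and_fraction(
    x, p_far, symmetric_restriction(x, p_far, center))

print(f"near baseline: mean = {mean_near:.3f}, fraction kept = {frac_near:.2f}")
print(f"far baseline:  mean = {mean_far:.3f}, fraction kept = {frac_far:.2f}")
```

In this toy setting both restricted distributions share the same mean by construction, while the baseline that sits farther from the target retains a smaller fraction of its population. This mirrors the qualitative point that a more severe restriction is needed at low denaturant than at high denaturant.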
The denaturant-dependent biases represented by the above estimates are intuitively plausible because the required biases of for [GuHCl] = 7 M 1M are not excessive. These fractional restrictions are only rough estimates, but they serve to illustrate a key concept. It is conceivable that the required restrictions can be less. For instance, when the atomic size and shapes of amino acid sidechains are taken into account, the actual intraprotein excluded volume effect can be stronger than that embodied by the Å repulsion distance in the Cα model. If Å is used instead Songetal2015 , the distribution would shift upward by – Å (Fig. S3). In that case, the fractions of with Å would increase, enabling significantly less severe denaturant-dependent biases of (for [GuHCl] = 7 M 1M) to resolve the smFRET-SAXS discrepancy (Fig. S4).
Concluding remarks. We deem this scenario for Protein L viable pending further experiment because natural proteins are heteropolymers, not homopolymers. Their amino acid sequences encode for heterogeneous intrachain interactions, especially under strongly folding (low or zero denaturant) conditions, which logically can only lead to heterogeneous conformational ensembles even when the chains are disordered. Unfolded conformations are not Gaussian chains topomer2005 . The question is not whether heterogeneity exists but the degree of heterogeneity and its impact. Such heterogeneity is observable by NMR baldwin1995 , in some cases even in high urea concentrations shortle2001 ; DanRohit2013 , not only for proteins such as BBL that do not fold cooperatively munoz2006 , but also for two-state folders (as defined by equality of van’t Hoff and calorimetric enthalpies of unfolding, and chevron plots with linear folding and unfolding arms chanetal2011 ; chanetal2004 ) such as cytochrome c bai1995 . The biophysics of protein folding processes that are macroscopically cooperative yet microscopically heterogeneous is readily understood theoretically shimizu2002 ; kaya2005 ; knott06 . From a mathematical standpoint, it is definitely possible, as we envisioned above, for heterogeneous conformational ensembles that are distinct from random coils or SAWs to have overall random-coil or SAW dimensions nonetheless Songetal2015 , as has been demonstrated by a recent study of the IDP Ash1 martin2016 and by hypothetical explicit-chain ensembles constructed to embody such properties rose2000 ; rose2004 . The scenario we suggested for resolving the smFRET-SAXS discrepancy for Protein L posits an increased population of transient loop-like disordered conformations with the two chain termini close to each other under native conditions. Is this feasible? 
Of relevance to this question is the experimental finding that conformations with enhanced populations of nonlocal contacts are involved in the folding kinetics of adenylate kinase haas2009 ; haas2014 ; haas2016 . Conformations with similar properties have also been suggested by theory to be favored along folding transition paths zhang12 . Recently, a disordered conformational state with such properties was identified for the protein drkN SH3 as well, though in this case it is induced by high rather than by low denaturant claudiu2016 . All in all, it is clear from the above considerations that denaturant-dependent heterogeneity in disordered protein conformational ensembles is expected in general. To what degree and in what manner it may account for the smFRET-SAXS discrepancy will have to be ascertained by further experiment.
Recently, Fuertes et al. [94] made an observation similar to ours, among other results of theirs, that the smFRET-SAXS puzzle may be resolved by recognizing that a given $\langle E\rangle$ can be consistent with a variety of $R_g$ values. For the record, it is noted that one of the authors of that work lemke2017 kindly sent their manuscript (submitted but unpublished at the time) to us after we shared our paper with him on May 15, 2017, before we submitted the original version of the present paper to this journal and made it publicly available on arXiv.org (arXiv:1705.06010).
**Supporting Material
**Supporting Information comprising four supporting figures is available at the Biophysical Journal website.
**Author Contributions
**J.S. and H.S.C. designed the research. J.S., G.-N.G. and H.S.C. performed the research. J.S., G.-N.G., C.C.G. and H.S.C. analyzed the data. T.S. contributed computational tools. J.S. and H.S.C. wrote the paper.
**Acknowledgments
**H.S.C. thanks Osman Bilsel, Kingshuk Ghosh, Elisha Haas, Rohit Pappu, and Tobin Sosnick for helpful discussions during Protein Folding Consortium workshops sponsored by the National Science Foundation (US), and Eitan Lerner for comments on an earlier version of this paper. J.S. gratefully acknowledges support from the National Natural Science Foundation of China (Grant No. 21674055) and the Open Research Fund of State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences (Grant No. 201613). G.-N.G. was supported by an Ontario Graduate Scholarship. Support for this work was also provided by Natural Science and Engineering Research Council of Canada Discovery Grant RGPIN 342295-12 to C.C.G., Canadian Institutes of Health Research Operating Grant No. MOP-84281 to H.S.C., and generous allotments of computational resources from SciNet of Compute/Calcul Canada.
**References
- (1) Haran, G. 2012. How, when and why proteins collapse: The relation to folding. Curr. Opin. Struct. Biol. 22:14–20.
- (2) Schuler, B., and H. Hofmann. 2013. Single-molecule spectroscopy of protein folding dynamics—expanding scope and timescales. Curr. Opin. Struct. Biol. 23:36–47.
- (3) Gelman, H., and M. Gruebele. 2014. Fast protein folding kinetics. Q. Rev. Biophys. 47:95–142.
- (4) Juette, M. F., D. S. Terry, M. R. Wasserman, Z. Zhou, R. B. Altman, Q. Zheng, and S. C. Blanchard. 2014. The bright future of single-molecule fluorescence imaging. Curr. Opin. Struct. Biol. 20:103–111.
- (5) Elbaum-Garfinkle, S., G. Cobb, J. T. Compton, X.-H. Li, and E. Rhoades. 2014. Tau mutants bind tubulin heterodimers with enhanced affinity. Proc. Natl. Acad. Sci. USA 111:6311–6316.
- (6) Banerjee, P. R., and A. A. Deniz. 2014. Shedding light on protein folding landscapes by single-molecule fluorescence. Chem. Soc. Rev. 43:1172–1188.
- (7) König, K., A. Zarrine-Afsar, M. Aznauryan, A. Soranno, B. Wunderlich, F. Dingfelder, J. C. Stüber, A. Plückthun, D. Nettels, and B. Schuler. 2015. Single-molecule spectroscopy of protein conformational dynamics in live eukaryotic cells. Nature Methods 12:773–779.
- (8) Melo, A. M., J. Coraor, G. Alpha-Cobb, S. Elbaum-Garfinkle, A. Nath, and E. Rhoades. 2016. A functional role for intrinsic disorder in the tau-tubulin complex. Proc. Natl. Acad. Sci. USA 113:14336–14341.
