A simple method for mapping the location of cross-β-forming regions within protein domains of low sequence complexity
Jinge Gu, Xiaoming Zhou, Lillian Sutherland, Glen Liszczak, Steven L. McKnight

TL;DR
This paper introduces a method to identify regions in proteins that can self-associate and form structures important for cellular organization.
Contribution
A new method is introduced to pinpoint self-associating regions in low complexity protein domains.
Findings
Cross-β-prone regions suppress fluorescence when attached to the C-terminus of GFP.
A 20 amino acid sequence in TDP-43's LCD is essential for self-association and phase separation.
The method can identify self-associating regions in low complexity domains.
Abstract
This study describes a molecular biological method for analyzing protein domains of low sequence complexity in search of segments that mediate self-association and consequent phase separation both in vitro and in vivo. Small regions allowing for self-association correspond to sequences that specify the formation of labile cross-β structural order. When juxtaposed to the C-terminus of GFP, cross-β-prone regions suppress fluorescence. A tiled scan of overlapping fragments of the low complexity domain (LCD) of the TDP-43 RNA-binding protein pinpointed an evolutionarily conserved sequence of 20 amino acids essential for self-association, phase separation, and the formation of nuclear speckles. The screening method described herein should be useful for the analysis of any LCD believed to function via homotypic self-association. Protein domains of low sequence complexity are unable to fold…
Funding: HHS | NIH | National Institute of General Medical Sciences (NIGMS) 100000057
Taxonomy
Topics: RNA Research and Splicing · Protein Structure and Dynamics · RNA modifications and cancer
Most proteins encoded by the genes of all life forms function by adopting a unique conformation of structural order. Three-dimensional structural order is encoded by the linear sequence of amino acids of a protein. This biological truth was confirmed by the unfolding and folding experiments of Anfinsen and colleagues using ribonuclease (1).
Unusual protein sequences incapable of folding into stable structures began to be described three to four decades ago. Early examples of these strange proteins were reported for the activation domains of the Gal4 and VP16 transcription factors (2, 3). The amino acid sequences of the Gal4 and VP16 transcriptional activation domains were unusual in being composed of a limited number of the twenty residues utilized for the folding of normal proteins. Despite being able to perform a distinct biological function evolutionarily crafted to stimulate transcription, these unusual protein domains were incapable of folding into stable molecular structures.
In the ensuing decades it became clear that the proteomes of eukaryotic organisms contain thousands of these protein domains of low sequence complexity (4–8). Such regions have variously been referred to as low complexity domains (LCDs), intrinsically disordered regions (IDRs), or prion-like domains (PLDs). Extensive studies have confirmed that most of these LCDs are either incapable of folding into stable, three-dimensional structure, or do so conditionally only upon interaction with other, structurally ordered macromolecules (9). Upward of a quarter of all eukaryotic proteins, or extensive subsegments thereof, are now recognized to be of low sequence complexity.
How do these unusual proteins perform their biological function? Two experimental methods have given evidence of LCD self-association. As described between one and two decades ago, the phenylalanine-glycine rich LCDs of nucleoporin proteins, and the tyrosine-glycine rich LCD of the fused-in-sarcoma (FUS) RNA-binding protein, are capable of phase separating into a hydrogel-like state (10–14). This process entails the formation of labile, cross-β polymers. In a simplified test, LCD self-association can be measured by polymerization-dependent enhancement of fluorescence by thioflavin-T (ThT). LCD self-association can also be monitored by the formation of spherical protein droplets in an assay designated liquid-liquid phase separation (LLPS). The latter form of phase separation was first described for the LCDs of the same RNA-binding proteins that had initially been observed to form hydrogels (15–18).
The relationship between the hydrogel and LLPS forms of phase separation has been interrogated in studies of the FUS, hnRNPA1, hnRNPA2, and TDP-43 RNA-binding proteins, as well as the unstructured head domains of a variety of intermediate filament proteins (19–25). Reductionist dissection by unbiased mutagenesis, in vivo and in vitro protein footprinting, segmental isotope labeling followed by NMR spectroscopy, and systematic chemical modification of the polypeptide backbone have established that both forms of phase separation are enabled by labile cross-β interactions localized to relatively small regions of notable evolutionary conservation.
It has further been observed that both the hydrogel and LLPS states of phase separation are dissolved more readily by 1,6-hexanediol (1,6-HD) than 2,5-hexanediol (2,5-HD), a regioisomer sharing the same chemical formula (26). Two facts underscore the importance of the differential effects of aliphatic alcohols on LCD phase separation. First, over the past decade correlatively differential effects of the two chemical probes have been observed in studies of hundreds of forms of dynamic cell organization (see Discussion). Invariably, 1,6-HD melts cytoplasmic and nuclear condensates more readily than the 2,5-HD regioisomer. Second, aliphatic alcohols manifest their dissociative activity by binding to carbonyl oxygen atoms of the polypeptide backbone (27, 28). The energetics enabling cross-β interactions are primarily mediated by peptide NH groups of the backbone of one polypeptide strand hydrogen bonding to carbonyl oxygen atoms of the paired strand (29). As such, one can readily understand the chemical basis by which aliphatic alcohols reverse the phase separated state and melt intracellular condensates. These observations represent generalizable facts describing a form of labile macromolecular interaction used broadly throughout the nuclear and cytoplasmic compartments of eukaryotic cells.
The small regions specifying LCD self-association have, in several cases, been found to colocalize with mutations causative of human disease. Disease-causing mutations proximal to cross-β-forming regions within LCDs specify the synthesis of altered proteins that aberrantly strengthen the structurally ordered state (20, 22, 23). Such observations give evidence that the biological utility of this form of protein function involves a type of weak intermolecular self-association poised at the threshold of thermodynamic equilibrium. Various forms of posttranslational modification also map within the cross-β-forming regions of LCDs and serve to regulate the balance of self-association (21, 22, 30, 31). That 75% of all forms of posttranslational modification map to the subfraction of the proteome composed of LCDs (32, 33) points to the expectation that many forms of cell regulation will be mediated by chemical modifications of LCDs that influence the balance of self-association.
Perhaps the clearest description of how LCD self-association abets the dynamics of cell morphology has come from studies of the intrinsically disordered head domains of intermediate filament (IF) proteins. IF head domains self-associate in the form of weakly annealed cross-β structures (22, 23, 34). A beautiful cryo-EM tomography study of assembled vimentin IFs has revealed the periodically phased coalescence of 40 head domains into a luminal cavity (35). It is within this cavity that IF head domains self-adhere to facilitate filament assembly. The orthogonal experimental findings leading to this understanding of IF architecture offer a refreshingly simple description of how head domain phosphorylation triggers filament disassembly and the rounding up of cells during mitosis (36).
A comprehensive understanding of how our thousands of LCDs participate in the assembly and regulation of malleable cellular structures will require machine learning-assisted identification of the localized regions within LCDs that mediate transient annealing. Instead of being limited to a handful of tediously mapped LCD sequences, experimentalists need to provide computational scientists with hundreds of accurately defined cross-β-forming regions within LCDs.
To this end we have turned to methods described by Hecht and colleagues who have shown that the amyloidogenic Aβ polypeptide suppresses fluorescence upon fusion to the C-terminus of green fluorescent protein (37). Expanding from the teachings of Hecht, we describe a technically simple method for mapping the location of the cross-β-forming region that mediates self-association, phase separation, and biological function of the LCD associated with the TAR DNA-binding protein 43 (TDP-43). Aside from assisting in the development of computational methods required to evolve studies of LCDs from lab bench to computer, the methods described herein will also be of use for the functional analysis of intrinsically disordered protein domains.
Results
The LCD of the TDP-43 RNA-binding protein extends from proline residue 262 to the carboxyl terminus of the protein, ending with methionine residue 414. An evolutionarily conserved region of roughly 20 amino acids located between residues 321 and 340 has been found to mediate interstrand self-association via the formation of a labile cross-β molecular structure (23, 27, 30). This form of protein self-association is enabled by the zippering of a contiguous network of interstrand hydrogen bonds between peptide backbone amino groups and carbonyl oxygen atoms (29).
The experimental path leading to these conclusions has been slow and tedious. In efforts to develop more facile methods for mapping the small, functionally critical regions of LCDs required for self-association and biological function, we used the GFP-suppression method of Hecht and colleagues (37) to interrogate the TDP-43 LCD. We systematically scanned the TDP-43 LCD by fusing individual, 30 residue fragments to the C-terminus of GFP (Fig. 1A). Plasmids encoding twelve GFP:TDP-43 fusion proteins were introduced into bacterial cells and subjected to conditions allowing for induced protein expression (Materials and Methods).
Fusion of the cross-β core of the TDP-43 low complexity domain (LCD) onto the carboxyl terminus of GFP impedes fluorescence. (A) Schematic diagram of the LCD of TDP-43 spanning residues 262 through 414. Location of the cross-β core responsible for self-association and phase separation is shown as a blue box. Overlapping segments of 30 residues were individually appended onto the carboxyl terminus of GFP and expressed in bacterial cells (Materials and Methods). (B) Amino acid sequence of the TDP-43 LCD with cross-β core highlighted in yellow. (C) Spectrophotometric measurements of GFP fluorescence of soluble lysates prepared from bacteria induced to express GFP fusion proteins. (D) Coomassie-stained SDS polyacrylamide gel used to size and visualize bacterially expressed GFP fusion proteins.
Cell lysates were evaluated by SDS polyacrylamide gel electrophoresis (SDS-PAGE) to monitor protein expression, and spectrophotometry to measure GFP fluorescence (Fig. 1 C and D). As shown in Fig. 1D, all twelve of the GFP fusion proteins were expressed at equivalent levels as determined by SDS-PAGE analysis of bacterial lysates. Roughly equivalent levels of GFP fluorescence were observed for ten of the fusion proteins (Fig. 1C). By contrast, the protein linking residues 311 to 342 (fragment 6, F6) of the TDP-43 LCD to the C-terminus of GFP suffered a substantial reduction in fluorescence, and the fusion protein linking GFP to residues 321 to 351 (fragment 7, F7) exhibited a modest reduction in GFP fluorescence.
The 20-residue overlap of the latter two segments corresponds to residues 321 to 342 of the TDP-43 LCD. As reported in earlier studies, this span of twenty amino acids is the most evolutionarily conserved region of the TDP-43 LCD (30). Likewise, this GFP-inhibiting region corresponds with the protein sequences essential for formation of both phase-separated, liquid-like droplets and labile cross-β polymers (23, 27, 30).
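The overlap between two tiled fragments can be computed as a simple interval intersection. The sketch below is illustrative only: the function name `interval_overlap` is hypothetical, and the residue ranges are entered exactly as quoted in the text for fragments 6 and 7.

```python
def interval_overlap(a, b):
    """Return the inclusive overlap of two inclusive residue ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Residue ranges as quoted in the text for fragments 6 and 7.
F6 = (311, 342)
F7 = (321, 351)

shared = interval_overlap(F6, F7)
print(shared)
```

The same helper could be applied to any pair of adjacent fragments in a tiled scan to locate the residues they have in common.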
A Randomized Screen to Identify GFP-Inhibiting Fragments within the TDP-43 LCD.
Observations shown in Fig. 1 give evidence that fusion of either of two, 30-residue segments of the TDP-43 LCD to the carboxyl terminus of GFP interferes with fluorescence. Reasoning that this might form the basis for a functional screen in bacteria, a plasmid library was assembled wherein synthetic oligonucleotides corresponding to 30-residue segments of the TDP-43 LCD were shotgun cloned downstream of the coding sequence of GFP (Fig. 2A). Competent BL21(DE3) bacterial cells were transformed with the library and spread on agarose plates supplemented with IPTG. Visual inspection of the culture plates under UV light revealed a mixture of bright green and dim green colonies (Fig. 2B). 370 colonies, scored as being either bright or dim green in color, were picked and sequenced.
Analysis of bacterial colonies expressing GFP fused to 30 residue fragments of the TDP-43 LCD. (A) Schematic diagram of the LCD of TDP-43 spanning residues 262 through 414. Location of the cross-β-forming region responsible for self-association and phase separation is shown as a blue box. (B) Twelve, double-stranded synthetic oligonucleotides encoding 30 residue fragments of the TDP-43 LCD were mixed and shotgun cloned immediately downstream from the C-terminus of GFP. The plasmid library was transformed into bacterial cells that were grown on agarose plates in the presence of 40 μM IPTG at 37 °C to induce GFP expression. Colonies scored as either bright green or dim green were picked and sequenced from plates illuminated by UV light. (C) Quantification of bright green (above) or dim green (below) colonies containing designated fragments of the TDP-43 LCD.
The histograms of Fig. 2C show that no bright green colonies were found to contain sequences encoding the evolutionarily conserved, cross-β core of the TDP-43 LCD (as defined by oligonucleotides encoding residues 311 to 342 or 321 to 351). All colonies containing these two sequences, hereafter designated as fragments 6 and 7 of the LCD, had been scored as being dim green. Of the remaining, 30-residue segments of the TDP-43 LCD, all were found by unbiased screening and sequencing to have been derived either exclusively from bright green colonies, or from a mixture of both bright and dim green colonies.
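The scoring logic of such a screen can be sketched as a tally of sequenced colonies by fragment and phenotype, followed by selection of fragments recovered only from dim colonies. This is a minimal illustration, not the authors' analysis pipeline: the function name and the example records are hypothetical and do not reproduce data from the study.

```python
from collections import defaultdict


def fragments_exclusively_dim(colonies):
    """Tally (fragment_id, phenotype) records; phenotype is 'bright' or 'dim'.

    Returns the sorted fragment ids seen only in dim colonies, plus all counts.
    """
    counts = defaultdict(lambda: {"bright": 0, "dim": 0})
    for fragment, phenotype in colonies:
        counts[fragment][phenotype] += 1
    hits = sorted(
        f for f, c in counts.items() if c["dim"] > 0 and c["bright"] == 0
    )
    return hits, dict(counts)


# Hypothetical sequencing results, invented for illustration only.
example = [
    (1, "bright"), (1, "bright"),
    (3, "bright"), (3, "dim"),
    (6, "dim"), (6, "dim"),
    (7, "dim"),
]

hits, counts = fragments_exclusively_dim(example)
print(hits)
```

Fragments that appear in both bright and dim colonies, such as fragment 3 in the invented example, are deliberately excluded from the hit list.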
Mutational Analysis of GFP-Inhibiting Fragments of the TDP-43 LCD.
As a test of the possible involvement of β-strand structural conformation in the inhibition of GFP fluorescence, bacterial expression vectors linking either fragment 6 or fragment 7 of the TDP-43 LCD to the C-terminus of GFP were altered by site-directed mutagenesis. Starting with phenylalanine residue 313 for fragment 6, or alanine residue 321 for fragment 7, pairs of residues separated by a single amino acid were changed to proline to systematically mutagenize each fragment. Proline substitutions were chosen because proline is the sole amino acid that does not contribute an NH group to the polypeptide backbone. As such, it disfavors the formation of annealed β-sheets.
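The substitution scheme, in which two residues separated by a single amino acid are changed to proline, can be sketched as follows. Several details are assumptions made for illustration: the function name, the `stride` between successive variants (not stated in the text), and the placeholder sequence, which is not the TDP-43 sequence.

```python
def double_proline_variants(seq, offset, start, stride, count):
    """Build double proline variants of a fragment.

    seq    : fragment sequence (one-letter code)
    offset : residue number of seq[0]
    start  : residue number of the first substituted position
    stride : spacing between successive variants (assumed, hypothetical)
    count  : maximum number of variants to build
    Each variant replaces residues p and p + 2 with proline.
    """
    variants = []
    for k in range(count):
        p = start + k * stride
        i, j = p - offset, p + 2 - offset
        if i < 0 or j >= len(seq):
            break  # the pair would fall outside the fragment
        chars = list(seq)
        chars[i] = "P"
        chars[j] = "P"
        variants.append((k + 1, (p, p + 2), "".join(chars)))
    return variants


# Placeholder sequence and numbering, invented for illustration only.
variants = double_proline_variants("GASGASGASGAS", offset=100, start=101,
                                   stride=4, count=3)
for index, positions, mutated in variants:
    print(index, positions, mutated)
```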
The sequences of these double proline variants are shown in Fig. 3A. Each variant plasmid was introduced into bacterial cells that were subsequently exposed to IPTG as a means of inducing expression of the encoded GFP fusion protein (Materials and Methods). SDS-PAGE analysis of bacterial lysates confirmed that all mutational variants were expressed at equivalent levels (SI Appendix, Fig. S1).
Effects of proline substitutions upon the suppressive activities of fragments 6 and 7 of the TDP-43 LCD on GFP fluorescence. (A) Sequences of double proline substitution variants of fragments 6 (Above) and 7 (Below) of the TDP-43 LCD. Sequences highlighted in yellow correspond to the cross-β-forming region. (B) Measured levels of GFP fluorescence for each double proline variant of fragment 6 (F6, Left) and fragment 7 (F7, Right) of the TDP-43 LCD as assayed in living bacterial cells. Cultures of bacterial strains expressing each variant were evaluated as a linear function of turbidity (OD600nm) to measure cell density, and intensity of GFP fluorescence (SI Appendix, Fig. S2). Quantitative comparisons of slopes of increasing OD600nm versus increasing GFP intensity, as measured for four independent cultures of each variant, yielded histogram patterns for double proline variants of F6 (Left) and F7 (Right). Mean ± SD; n = 4.
Four separate samples of IPTG-induced cells were recovered for each variant and measured for GFP fluorescence (Fig. 3B and SI Appendix, Fig. S2). The histograms of Fig. 3B (Left) reveal the levels of GFP fluorescence observed for each double proline variant introduced into fragment 6 of the TDP-43 LCD. The first two variants, designated F6-P1 and F6-P2, exhibited levels of GFP fluorescence no higher than bacterial cells expressing GFP fused to the native sequence of fragment 6. By contrast, all of the remaining double proline variants of fragment 6 yielded significantly higher levels of GFP fluorescence. We conclude that the latter variants prevent fragment 6 of the TDP-43 LCD from suppressing GFP fluorescence.
The histograms of Fig. 3B (Right) reveal the distributions of GFP fluorescence observed in bacterial cells expressing each of the double proline variants introduced into fragment 7 of the TDP-43 LCD. The first five variants introduced into fragment 7, upon IPTG-induced expression in bacterial cells, exhibited significant GFP fluorescence. Variants bearing double proline mutations close to the C-terminal side of fragment 7 produced lower levels of GFP fluorescence. The variants designated F7-P6, F7-P7, and F7-P8 exhibited no higher level of fluorescence than bacterial cells expressing GFP fused to the native sequence of fragment 7. We conclude that variants F7-P1 through F7-P5 interfered with the ability of fragment 7 of the TDP-43 LCD to suppress GFP fluorescence.
The combined data of Fig. 3 reveal a contiguous segment of 20 amino acids, spanning alanine residue 321 to leucine residue 340, wherein introduction of double proline variations interfered with the ability of either fragment 6 or fragment 7 of the TDP-43 LCD to suppress GFP fluorescence in bacterial cells. These 20 amino acids represent the most evolutionarily conserved region of the TDP-43 LCD (30). They further correspond to the cross-β-forming region essential for formation of phase separated liquid-like droplets and labile cross-β polymers (23, 27, 30).
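The quantification described in the legend of Fig. 3B, in which GFP intensity is compared with OD600nm for four independent cultures and reported as mean ± SD, can be sketched as a least-squares slope per culture. The function names and the numerical values below are hypothetical; the fitting procedure actually used by the authors is not specified in the text.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares slope of y versus x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def summarize(cultures):
    """Return (mean, SD) of the per-culture slopes of GFP versus OD600."""
    slopes = [slope(od, gfp) for od, gfp in cultures]
    return mean(slopes), stdev(slopes)


# Invented readings for four cultures of one variant (illustration only).
od = [0.1, 0.2, 0.3, 0.4]
cultures = [
    (od, [10, 20, 30, 40]),
    (od, [11, 22, 33, 44]),
    (od, [9, 18, 27, 36]),
    (od, [10, 20, 30, 40]),
]

m, sd = summarize(cultures)
print(round(m, 2), round(sd, 2))
```

Normalizing fluorescence to cell density in this way allows variants to be compared even when cultures grow to different turbidities.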
Analysis of Proline Mutations within the TDP-43 LCD upon Self-Association and Phase Separation.
Having observed double proline variants that either do or do not affect the ability of the TDP-43 LCD to suppress GFP fluorescence in bacterial cells, we next sought to test for possible correlative effects of these mutational variants on two forms of phase separation. Incubation of the isolated LCD of TDP-43 under conditions of slightly acidic pH and physiologically normal monovalent salts leads to the formation of liquid-like droplets (23, 27, 30). Protein samples of either the native LCD of TDP-43, or variants bearing two proline substitutions, were purified and assayed under conditions allowing formation of liquid-like droplets. As shown in Fig. 4, six of the double proline variants were compromised in the formation of liquid-like droplets when tested under assay conditions of 150 mM NaCl and pH 6.5. Less severe impediments to droplet formation were observed upon assay at pH 6.8. Under such conditions, droplet formation by the F6-P2, F6-P6, and F7-P6 variants was reduced by only 2-fold relative to the native LCD of TDP-43. The double proline variants observed to impede droplet formation coincided with those that overcame GFP suppression as described in Fig. 3. Likewise, double proline variants that did not overcome GFP suppression in live bacterial cells, including F6-P1 and F7-P7, also failed to impede the formation of liquid-like droplets.
Effects of proline substitutions upon formation of phase separated liquid-like droplets formed by the TDP-43 LCD. (A) Purified samples of the intact LCD of TDP-43, or proline-substituted variants thereof, were tested for phase separation under assay conditions of 150 mM NaCl and pH levels of either 6.8 or 6.5. Locations of proline substitutions are shown on the sequence of the TDP-43 LCD between phenylalanine residue 313 and serine residue 347. Sequence highlighted in yellow corresponds to the cross-β-forming region. (B) Microscopic images of protein samples incubated under conditions permissive of phase separation revealed the presence, attenuation, or absence of liquid-like droplets. (Scale bar, 25 μm.) (C) Optical density measurements are shown for each protein sample as measured at pH 6.8 (Left) or pH 6.5 (Right). Mean ± SD; n = 5.
The same protein samples evaluated for formation of phase separated liquid-like droplets were also tested in biochemical assays of the formation of labile cross-β polymers (Fig. 5). Samples prepared at a protein concentration of 20 μM were incubated in the presence of thioflavin-T (Materials and Methods). Acquisition of enhanced ThT fluorescence was monitored over a period of 30 h. Time-dependent enhancement of ThT fluorescence was readily observed for the native protein, the F6-P1 variant, and the F7-P7 variant. Either delayed or intermediate levels of ThT fluorescence were observed for the F6-P2 and F7-P6 variants. Little or no evidence of time-dependent increase in ThT fluorescence was observed for the F6-P3, F6-P4, F6-P5, F6-P6, or F6-P7 variants.
Effects of proline substitutions upon formation of labile cross-β polymers formed by the TDP-43 LCD. (A) Purified samples of the intact LCD of TDP-43, or proline-substituted variants thereof, were tested for the formation of cross-β polymers as revealed by the enhanced fluorescence of thioflavin-T (ThT). Locations of proline substitutions are shown on the sequence of the TDP-43 LCD between phenylalanine residue 313 and serine residue 347. Sequence highlighted in yellow corresponds to the cross-β-forming region. (B) Levels of ThT fluorescence were measured by spectroscopy as a function of time for protein samples corresponding to the native LCD of TDP-43 (WT) or each of nine double proline substitution variants. Mean ± SEM; n > 4.
Obviously correlative effects of the double proline variants of the TDP-43 LCD were observed for their ability to overcome inhibition of GFP fluorescence by fragments 6 and 7 as assayed in bacterial cells (Fig. 3), for their inhibitory effects upon the formation of liquid-like droplets as observed in test tube assays using purified proteins (Fig. 4), and for their inhibitory effects upon assembly of the purified TDP-43 LCD into ThT-positive, cross-β polymers (Fig. 5). These data further correlate with past studies showing that both liquid-like droplets and ThT-positive polymers of the TDP-43 LCD are dissolved more effectively by 1,6-hexanediol than its 2,5-hexanediol regioisomer (27).
Analysis of Contiguously Localized Proline Mutations within the TDP-43 LCD on the Formation of Nuclear Speckles.
The TDP-43 RNA-binding protein resides primarily within the nuclear compartment of mammalian cells where it forms nuclear speckles (38). In order to determine whether self-association of the TDP-43 LCD might be required for its assembly into nuclear speckles, each double proline variant evaluated for phase separation (Fig. 4), and formation of labile cross-β polymers (Fig. 5), was introduced into lentivirus as a fusion protein linking GFP to the N terminus of the full-length TDP-43 protein. Viral infection allowed the GFP:TDP-43 fusion proteins to be conditionally expressed in HCT116 cells after doxycycline-mediated induction (Materials and Methods). Western blotting assays revealed that the nine variants were expressed at the same level as the fusion protein linking GFP to the native TDP-43 protein (SI Appendix, Fig. S3). The closely matched levels of expression of the various GFP:TDP-43 fusion proteins were approximately equivalent to that of endogenous TDP-43 within HCT116 cells.
Confocal microscopy was used to evaluate the morphological distribution of TDP-43 nuclear speckles. Cells expressing a GFP fusion protein linked to the native TDP-43 protein revealed roughly 30 GFP-positive speckles per nucleus (Fig. 6). Cells expressing three of the double proline variants, designated F6-P3, F6-P4, and F6-P5, revealed a statistically significant reduction of roughly 75% in the number of GFP-positive nuclear speckles. Cells expressing the F6-P6 and F6-P7 variants revealed reductions of 40 to 50% in the number of GFP-positive nuclear speckles. Finally, cells expressing the F6-P1, F6-P2, F7-P6, and F7-P7 variants showed no statistically significant reduction in GFP-positive nuclear speckles relative to the number formed by the native TDP-43 protein.
Effects of proline substitutions upon condensation of TDP-43 into nuclear speckles in cultured human cells. (A) GFP was fused onto the N terminus of the full-length TDP-43 protein containing either the native sequence of its low complexity domain (WT) or double proline variants thereof. Sequence highlighted in yellow corresponds to the cross-β-forming region. (B) Coding segments for each construct were moved from bacterial plasmids to lentivirus and used to infect HCT116 cells. Cell lines expressing each construct were selected for lentivirus-specified resistance to hygromycin (Materials and Methods), grown on cover slips and analyzed for nuclear GFP fluorescence by confocal microscopy. (Scale bar, 2 μm.) (C) 40 individual nuclei were analyzed for each cell line to quantify the number of GFP-positive nuclear speckles per cell. Mean ± SD; **P < 0.001; ns, no significance; one-way ANOVA.
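The one-way ANOVA named in the legend compares variation between cell lines with variation within each cell line. The sketch below computes only the F statistic and its degrees of freedom from first principles; it does not compute a P value, the function name is hypothetical, and the example counts are invented rather than taken from the study. The software used by the authors is not stated.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented speckle counts for three hypothetical cell lines.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

A large F statistic indicates that differences among cell lines exceed the nucleus-to-nucleus variation within a line; converting F to a P value requires the F distribution with the returned degrees of freedom.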
In summary, correlatively similar patterns of effect of double proline mutations were observed for: i) relief of fragment 6- and fragment 7-mediated suppression of GFP fluorescence (Fig. 3); ii) formation of phase separated liquid-like droplets (Fig. 4); iii) formation of labile cross-β polymers (Fig. 5); and iv) assembly of TDP-43 into GFP-positive nuclear speckles in cultured HCT116 cells (Fig. 6). In combination, these data confirm that the TDP-43 LCD achieves self-association via the formation of a labile cross-β structure localized to an evolutionarily conserved region of roughly twenty amino acid residues.
Two Related Aliphatic Alcohols Differentially Melt Nuclear Speckles Formed by TDP-43.
The GFP-suppressing segment of the TDP-43 LCD colocalizes with a region where differences in protein binding by two related aliphatic alcohols have been mapped. Solution NMR studies have shown that 1,6-hexanediol (1,6-HD) causes significantly more pronounced chemical shifts to backbone carbonyl oxygen atoms than 2,5-hexanediol (2,5-HD) in the region of the TDP-43 LCD bracketed by alanine residue 321 and leucine residue 340 (27).
Differences in the capacity for sequestration of backbone hydrogen bond acceptors explain the enhanced potency of 1,6-HD, relative to 2,5-HD, for melting phase separated liquid-like droplets and labile cross-β polymers formed by the TDP-43 LCD (27). This form of protein:protein interaction requires the zippering of a contiguous set of interstrand hydrogen bonds between peptide amino groups and carbonyl oxygens (29). When the terminal alcohol groups of 1,6-HD become hydrogen bonded to backbone carbonyl oxygen atoms within this evolutionarily conserved region of the protein, self-association is chemically inhibited.
Knowing that aliphatic alcohols bind to carbonyl oxygen atoms of the polypeptide backbone of LCDs (27, 28), our mechanistic interpretation of TDP-43 self-association predicts that these pharmacological agents should melt nuclear speckles formed by TDP-43. This expectation is based upon the assumption that the partitioning of TDP-43 into intranuclear condensates is dependent upon the ability of its LCD to self-associate.
Indeed, this concept further predicts that nuclear speckles containing the TDP-43 protein should be dissolved more readily by 1,6-HD than its 2,5-HD regioisomer. Differences in the dissociative activities of the two aliphatic alcohols have been observed in living cells for a variety of nuclear and cytoplasmic puncta not surrounded by investing membranes, as well as for the integrity of five different types of intermediate filaments (26). That 1,6-HD is more effective in melting self-associated LCDs than 2,5-HD has been attributed to geometrical differences in the positions of the two alcohol groups along the six-carbon chain of each chemical. The spacing of OH groups in 1,6-HD better matches the distance separating contiguous carbonyl oxygens along the polypeptide backbone than 2,5-HD (27, 39). As such, 1,6-HD has been hypothesized to be better suited, relative to 2,5-HD, for forming two concomitant hydrogen bonds to adjacent carbonyl oxygen atoms along the polypeptide backbone.
To test the effects of aliphatic alcohols on nuclear speckles formed by TDP-43, cultured HCT116 cells expressing the full-length TDP-43 protein fused at its N terminus to GFP were exposed to 8% levels of either 1,6-HD or 2,5-HD for a period of five minutes (Materials and Methods). Despite an apparent increase in nuclear fluorescence caused by exposure to 1,6-HD, quantitative measurements of total nuclear fluorescence revealed no difference between cells exposed to vehicle alone or either of the two aliphatic alcohols (SI Appendix, Fig. S4). By contrast, as shown in Fig. 7, the number of nuclear speckles containing the GFP:TDP-43 fusion protein was reduced by roughly 75% following five minutes of exposure to an 8% concentration of 1,6-HD. Exposure of the same cells to an equivalent concentration of 2,5-HD reduced speckle number by only 30% relative to vehicle-treated cells. These same differences in the melting of TDP-43 nuclear speckles by the same two aliphatic alcohols have been reported by Fawzi, Shorter, and colleagues (38). As such, two independent studies offer concordant observations pertinent to the chemical underpinnings of TDP-43 self-association within the nuclei of cultured mammalian cells.
Effects of two regioisomeric alcohols on nuclear speckles composed of TDP-43. (A) Confocal microscopic images of HCT116 cells expressing a lentivirus-encoded GFP:TDP-43 fusion protein. Cells were grown on cover slips and exposed for 5 min to vehicle alone (Left), an 8% solution of 1,6-hexanediol (Middle), or an 8% solution of 2,5-hexanediol (Right). (Scale bar, 2 μm.) (B) Focal plane images of 30 individual nuclei were analyzed for each culture condition to quantify the number of GFP-positive nuclear speckles. Mean ± SD; one-way ANOVA. Appearance of GFP in the cytoplasm of cells exposed to 1,6-hexanediol (middle image of panel A) may reflect the known ability of 1,6-hexanediol to melt the permeability channel of nuclear pores.
Discussion
Here we describe a simple method for mapping the region of a protein domain of low sequence complexity required for its self-association, phase separation, and partitioning into nuclear puncta. The method we describe should facilitate analysis of any LCD that relies upon homotypic self-association to perform its biological function.
We begin our discussion by asking why several short fragments of the TDP-43 LCD interfere with GFP fluorescence. The slow-to-fold GFP protein is structurally organized as a barrel-like ensemble of 11 β strands (40, 41). The final β strand required to complete folding of the protein is located at the very C-terminus of GFP. Insertion of this final β strand represents a slow, cotranslational event that is rate-limiting to GFP folding (42). We speculate that C-terminal juxtaposition of an additional, β strand-prone segment somehow impedes the completion of GFP folding. Owing to the high level of bacterial expression of the GFP:TDP-43 fusion proteins studied herein, we recognize that observed impediments to GFP fluorescence may involve self-association of certain test protein fragments. This interpretation may explain why proline substitutions that interfere with the formation of cross-β structural assemblies, as shown in Figs. 4 and 5, also overcome the suppression of GFP fluorescence as shown in Fig. 3.
Turning from methodology to investigational focus, our findings are interpretable according to a simple concept of TDP-43 LCD self-association. As reported previously, the TDP-43 LCD self-associates via the formation of a labile, cross-β structure localized to a small region of high evolutionary conservation (23, 27, 30). The GFP suppression screens described in Figs. 1 and 2 map inhibition to the same region of the TDP-43 LCD known to mediate self-association and phase separation. Moreover, the observed GFP inhibitory activity was overcome by proline substitutions localized to the cross-β-forming region of the LCD. Concordance was observed between proline variants that overcame GFP inhibition (Fig. 3) with those that inhibited condensation of liquid-like droplets (Fig. 4), inhibited assembly of cross-β polymers (Fig. 5), and attenuated formation of TDP-43 positive nuclear speckles (Fig. 6).
This structure-based interpretation of LCD self-association contrasts with a distinctly different concept having emerged from computer-assisted simulations of LCD phase separation (43, 44). Sophisticated methods of Monte Carlo simulation of unstructured polypeptides have led to a concept of polymer interaction not unlike theoretical concepts having emerged from the plastics industry. These computer simulations have generated a “stickers–and–spacers” concept that is incongruous in three ways with the labile cross-β model for LCD self-association, phase separation, and biological function: i) it posits that LCD self-association is achieved in the absence of molecular structure; ii) it assigns no role whatsoever to the polypeptide backbone in the annealing of polypeptide strands; and iii) it assumes that the interactions required for self-association cannot be sublocalized by either experimentation or computation. We are currently unable to reconcile the data reported herein with computer-simulated interpretations of LCD self-association.
Why is it useful to know where the small, cross-β-forming regions mediating self-association and phase separation map within larger LCDs? First, this understanding helps explain how single-residue, disease-causing mutations interfere with LCD function. As an example, by localizing the cross-β-forming region that allows the head domain of the neurofilament light (NFL) protein to weakly self-associate in a manner abetting filament assembly, we can understand why virtually any mutational variation of either proline residue 8 or proline residue 22 causes Charcot-Marie-Tooth disease (23). These proline residues insulate either side of the localized, cross-β-forming region of the NFL head domain. Mutational change of either of these proline residues to any of the other 19 amino acids introduces an additional peptide NH group to the polypeptide backbone. As such, alteration of proline residues proximal to cross-β-forming regions can be understood to enhance the avidity of self-association by addition of an interstrand hydrogen bond (23). As a consequence, mutational change of proline residues directly causes an evolutionarily balanced form of protein:protein interaction to be forced out of tune.
A second example of the value of knowing where cross-β interactions map within LCDs also derives from studies of the disordered head domains of intermediate filament (IF) proteins. IF head domains are phosphorylated during mitosis, thus triggering filament disassembly and allowing cultured cells to round up during mitosis (45, 46). The sites of protein kinase A-mediated phosphorylation of the desmin IF protein map to the cross-β-forming region of its head domain. Moreover, phosphorylation of the desmin head domain reverses self-association and phase separation by weakening labile, cross-β interactions (22). As such, the biological phenomenon of IF disassembly and consequent change in cell shape during mitosis can now be understood at a mechanistic level.
A third justification favoring the reductionist mapping of cross-β-forming regions within LCDs relates to the aspired evolution of this science from test tubes to computers. Development of AlphaFold capabilities of protein structure prediction required a foundation consisting of thousands of structures deduced by X-ray crystallography, NMR spectroscopy, and cryoelectron microscopy (47). The methods described herein will simplify the mapping of cross-β-forming regions within larger LCDs, thereby accelerating the conversion of this science from lab bench to laptop.
We close with a thread of logic connecting chemistry to biology. Thousands of studies published over the past decade have reported evidence confirming that LCD self-association is of widespread importance to the dynamics of cell morphology (8, 48, 49). A cursory evaluation of the Google Scholar database tagged by the key words of phase separation, 1,6-hexanediol and LCD reveals more than 400 of these studies as having demonstrated the melting of nuclear or cytoplasmic structures in response to 1,6-hexanediol (SI Appendix, Fig. S5). We propose that if an intracellular structure is dissolved more readily by 1,6-hexanediol than 2,5-hexanediol, as demonstrated herein for nuclear speckles composed of TDP-43 (Fig. 7), the structure of interest will be reliant upon labile cross-β interactions. This logic, coupled with the experimental methods of LCD dissection described herein, offers a conceptual pathway for demystifying many forms of dynamic cell organization.
Materials and Methods
Plasmid Construction.
DNA sequences encoding the TDP-43 LCD (residues 262 to 414) were subcloned into a pHis-parallel vector with an N-terminal His tag. DNA fragments encoding 30 amino acid residues (fragments 1 through 12) from the TDP-43 LCD were subcloned into pHis-parallel-GFP-GDEVD with BamHI and XhoI restriction endonuclease sites. Proline variants of His-TDP-43 LCD, GFP-TDP-43 fragment 6 (F6), and GFP-TDP-43 fragment 7 (F7) were generated by site-directed mutagenesis.
Protein Expression and Purification.
His-TDP-43 LCD WT and mutated variants were expressed in Escherichia coli BL21(DE3) cells at 37 °C, induced with 1 mM IPTG for 3 h. The harvested cell pellets were lysed in a buffer containing 50 mM Tris (pH 7.5), 6 M guanidine-HCl, and 20 mM imidazole. Lysis was performed via sonication (3 min total, alternating 10 s on and 30 s off). Cell debris was removed by centrifugation at 25,000 RPM for 30 min. The clarified supernatant was loaded onto a gravity-fed column packed with Ni-NTA resin (Gold Bio). The resin was washed with the lysis buffer to remove nonspecifically bound proteins. Target proteins were eluted using a buffer containing 50 mM Tris (pH 7.5), 6 M guanidine-HCl, and 300 mM imidazole. Eluted proteins were concentrated to 1 mM and subjected to buffer exchange prior to use. Buffer exchange was performed by dialysis against a storage buffer containing 10 mM Tris (pH 7.5), 6 M guanidine-HCl, and 1 mM EDTA using Slide-A-Lyzer MINI dialysis units (Thermo Scientific, 69552) at 4 °C. Protein concentrations were determined using both a Nanodrop spectrophotometer and a BCA assay. For droplet formation assays and Thioflavin T (ThT) fluorescence kinetic assays, protein concentrations were adjusted to 600 μM using storage buffer.
Droplet Formation and OD600nm Measurements.
His-TDP-43 LCD WT and its variants (600 μM in 6 M guanidine-HCl) were diluted 30-fold into droplet formation buffer as indicated using a multichannel pipette. A 40 μL aliquot of the resulting solution was transferred into a 384-well plate (Greiner Bio-One, 781091) using the same pipette. After a 5-min incubation, absorbance at 600 nm was measured using a microplate reader. For droplet formation imaging, samples were incubated for 1 h, and images were acquired using the Bio-Rad ZOE Fluorescent Cell Imager.
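The dilution arithmetic implied by this step can be checked with a short sketch. It assumes that the droplet formation buffer itself contains no protein and no guanidine-HCl, so that both are simply reduced 30-fold; the function name `dilute` is hypothetical and used only for illustration.

```python
def dilute(stock_concentration, fold):
    """Concentration after a simple fold-dilution (same units as the input)."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return stock_concentration / fold


# Values taken from the text: 600 uM protein in 6 M guanidine-HCl, diluted 30-fold.
# Assumption: the diluent contributes neither protein nor guanidine-HCl.
protein_final_uM = dilute(600.0, 30)   # protein concentration in the assay well
guanidine_final_M = dilute(6.0, 30)    # denaturant carried over from the stock

print(f"protein: {protein_final_uM:.1f} uM, guanidine-HCl: {guanidine_final_M:.2f} M")
```

Under these assumptions the assay wells contain roughly 20 μM protein and 0.2 M residual guanidine-HCl, the same final protein concentration stated for the ThT assays.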
Thioflavin T (ThT) Fluorescence Kinetic Assays.
His-TDP-43 LCD WT and its variants (600 μM in 6 M guanidine-HCl) were diluted into ThT buffer containing 50 mM MES (pH 6.8), 150 mM NaCl, 1 mM EDTA, 40 μM ThT, and 0.05% NaN₃. The final protein concentration was adjusted to 20 μM. All ThT samples were dispensed into a 384-well microplate (Greiner Bio-One, 781091) with a volume of 40 μL per well. The plate was sealed with a foil film to prevent evaporation. ThT fluorescence was monitored at room temperature with an excitation wavelength of 450 nm and an emission wavelength of 485 nm. The master plate was shaken for 10 s before each reading of fluorescence to ensure sample uniformity.
GFP Suppression Assays.
Plasmids encoding N-terminal GFP-tagged TDP-43 fragments and mutated variants thereof were transformed into BL21(DE3) competent cells via heat shock following standard protocols. Transformed cells were incubated at 37 °C in a shaker for 40 min. Cells were subsequently inoculated into 96 DeepWell plates (Fisher, 12566612) containing 0.5 mL of LB culture medium supplemented with 100 μg/mL ampicillin and 40 μM IPTG. Plates were sealed with Constar 6570 sealing tape and incubated overnight at 37 °C with shaking. Following incubation, the cells were harvested by centrifugation, washed once with PBS, and resuspended in 1 mL of PBS. GFP fluorescence intensity expressed in living E. coli cells was measured alongside turbidity, as both values exhibit a linear correlation under appropriate cell densities. E. coli suspensions were serially diluted followed by measurements of both turbidity (absorbance at 600 nm) and GFP fluorescence intensity (excitation/emission = 488 nm/513 nm) using transparent 96-well plates (NEST, 701011, 100 μL per well) and a microplate reader. The resulting data were fitted to linear regression curves using GraphPad. The slopes of the WT and double proline variant curves were recorded and normalized to the WT value for comparison.
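The slope comparison described above can be illustrated with a minimal sketch. The fits in this study were performed in GraphPad; the code below is only an equivalent ordinary least-squares calculation, and it assumes that fluorescence is regressed on OD600 with a free intercept. The function names and all numerical values are hypothetical and do not come from the paper.

```python
def fit_slope(od600, fluorescence):
    """Ordinary least-squares slope of fluorescence versus OD600 (free intercept)."""
    n = len(od600)
    if n != len(fluorescence) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(od600) / n
    mean_y = sum(fluorescence) / n
    sxx = sum((x - mean_x) ** 2 for x in od600)
    if sxx == 0:
        raise ValueError("OD600 values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(od600, fluorescence))
    return sxy / sxx


def normalized_slopes(series, reference="WT"):
    """Fit each dilution series and express its slope relative to the reference."""
    slopes = {name: fit_slope(x, y) for name, (x, y) in series.items()}
    return {name: slope / slopes[reference] for name, slope in slopes.items()}


# Hypothetical serial-dilution readings: (OD600 values, GFP fluorescence values).
example = {
    "WT": ([0.1, 0.2, 0.4, 0.8], [110.0, 210.0, 410.0, 810.0]),
    "double_proline": ([0.1, 0.2, 0.4, 0.8], [310.0, 610.0, 1210.0, 2410.0]),
}
relative = normalized_slopes(example)
print(relative)
```

A relative slope above 1 for a variant would indicate more GFP fluorescence per unit of cell density than the WT fragment, which is the readout used to score relief of GFP suppression.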
Cell Culture.
HCT116 cells were a gift from Dr. Deepak Nijhawan. HCT116 cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum and 2 mM L-glutamine. All cells were incubated at 37 °C in a humidified atmosphere of 95% air and 5% CO₂.
Lentivirus Production and Generation of Inducible Cell Lines.
Full-length GFP-TDP-43 WT and proline mutants thereof were cloned into a doxycycline-inducible pSPCTRE-Hygro vector using the AscI and BamHI restriction sites. LentiX293T cells were cotransfected with psPAX2, pMD2.G, and pSPCTRE-Hygro-GFP-TDP-43 plasmids. After 24 h, the culture medium was replaced. At 72 h posttransfection, the medium was collected and filtered through a 0.45 μm syringe filter (Millipore, SLHV033RB). The resulting lentiviruses were used to infect HCT116 cells, which were subsequently selected using 200 μg/mL hygromycin.
Protein Extraction and Western Blotting.
Inducible cell lines expressing GFP-TDP-43 full-length WT and double proline variants thereof were cultured in 6-well plates. After 24 h of incubation, doxycycline was added to the medium at a final concentration of 3 ng/mL. Following 48 h of induction, the cells were washed twice with PBS and lysed in RIPA buffer (30 mM Tris, pH 7.4, 150 mM NaCl, 1% IGEPAL CA-630, 1% sodium deoxycholate, 0.1% SDS) supplemented with a protease inhibitor cocktail (Roche) and Benzonase Nuclease (Novagen). Lysis was performed on ice for 30 min. Cell debris was removed by centrifugation at 21,000×g for 30 min at 4 °C. Protein concentration of the lysates was determined using a BCA Protein Assay Kit (Thermo Fisher Scientific). For each cell line, 10 μg of total protein was resolved by SDS-PAGE. Proteins were transferred to a membrane and immunoblotted with primary antibodies, including rabbit anti-N-terminal TDP-43 (Proteintech, 10782-2-AP) and mouse anti-GAPDH (EMD Millipore, C87727). Secondary antibodies used were HRP-conjugated goat anti-rabbit and HRP-conjugated goat anti-mouse (Bio-Rad). Chemiluminescence detection was performed using the ECL Western Blotting Substrate (Bio-Rad), and images were captured with a ChemiDoc Touch Imaging System (Bio-Rad).
Immunocytochemistry and Confocal Imaging.
Inducible cell lines expressing GFP-TDP-43 full-length WT and double proline mutants thereof were cultured in 4-well chambers (BD Falcon, 354104) and incubated with 3 ng/mL doxycycline for 48 h. After removing the culture medium, the cells were washed twice with warm PBS and fixed with 4% PFA for 15 min. Following fixation, cells were washed twice with PBS and mounted using Vectashield mounting medium supplemented with DAPI dye (Vector, H1200). Confocal images were acquired using a Zeiss LSM880 microscope equipped with a 63× oil immersion objective, capturing 0.5 μm Z-stacks at 3.5× zoom. Image dimensions were set to 1,024 × 1,024 pixels. For hexanediol-treated cells, GFP-TDP-43 WT-expressing cells were cultured in poly-D-lysine-coated chambers (Gibco, A3890401), incubated with 3 ng/mL doxycycline for 48 h, and subsequently treated with 8% 1,6-hexanediol (Sigma-Aldrich, 240117) or 8% 2,5-hexanediol (Sigma-Aldrich, H11904) for 5 min before washing and fixation. Confocal images were acquired using parameters described above.
Analysis of TDP-43 nuclear speckles was conducted using Fiji software. Confocal images of GFP-positive nuclei were processed with maximum intensity Z-projection and smoothed. Single GFP-positive nuclei were extracted, and their average fluorescence intensity was calculated by using Image → Adjust → Threshold → Method: Li → Auto → Analyze → Analyze Particles modules. Speckles were identified as structures with fluorescence intensity exceeding 1.6 times the average intensity. Adjacent speckles were separated using Process → Binary → Watershed, and speckle count was automatically determined with Analyze → Analyze Particles.
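The thresholding and counting logic can be sketched outside of Fiji as follows. This is not the Fiji pipeline itself: it assumes that the nuclear mask is already available, that the average intensity is the plain mean over masked pixels, and that touching bright pixels (8-connectivity) form one speckle. It omits smoothing, Z-projection, and the watershed step. The function name and the demo image are hypothetical.

```python
def count_speckles(image, mask, factor=1.6):
    """Count connected bright regions inside a nucleus.

    image: 2D list of pixel intensities.
    mask:  2D list of booleans, True for pixels that belong to the nucleus.
    A pixel is 'bright' if it exceeds factor x the mean nuclear intensity.
    """
    rows, cols = len(image), len(image[0])
    nuclear = [image[r][c] for r in range(rows) for c in range(cols) if mask[r][c]]
    if not nuclear:
        return 0
    cutoff = factor * sum(nuclear) / len(nuclear)
    bright = [[mask[r][c] and image[r][c] > cutoff for c in range(cols)]
              for r in range(rows)]

    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not bright[r][c] or seen[r][c]:
                continue
            count += 1                      # new speckle found; flood-fill it
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):       # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bright[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# Hypothetical 5 x 6 nucleus: uniform background with two separate bright spots.
demo_image = [
    [10, 10, 10, 10, 10, 10],
    [10, 100, 100, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 100, 10],
    [10, 10, 10, 10, 10, 10],
]
demo_mask = [[True] * 6 for _ in range(5)]
print(count_speckles(demo_image, demo_mask))
```

Because this sketch has no watershed step, two speckles that touch would be counted as one, which is the situation the Process → Binary → Watershed step in the described workflow is meant to resolve.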
Quantification and Statistical Analysis.
Statistical parameters, including the definitions and values of n, distributions, and deviations, are reported in the figures and figure legends. Statistical significance in this study was determined by one-way ANOVA. Statistical analysis was performed in GraphPad.
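For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the following sketch. The analysis in this study was run in GraphPad; the code is only the textbook calculation, the function name is hypothetical, and the example groups are invented numbers rather than speckle counts from the paper. The p-value is not computed here because it requires the F distribution.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and more values than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented example with three conditions (e.g., vehicle and two treatments).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

If SciPy is available, `scipy.stats.f_oneway` returns the same statistic together with its p-value.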
Supplementary Material
Appendix 01 (PDF)
References
1. C. B. Anfinsen, Principles that govern the folding of protein chains. Science 181, 223–230 (1973). PMID: 4124164. doi: 10.1126/science.181.4096.223
2. J. Ma, M. Ptashne, A new class of yeast transcriptional activators. Cell 51, 113–119 (1987). PMID: 3115591. doi: 10.1016/0092-8674(87)90015-8
3. S. J. Triezenberg, R. C. Kingsbury, S. L. McKnight, Functional dissection of VP16, the trans-activator of herpes simplex virus immediate early gene expression. Genes Dev. 2, 718–729 (1988). PMID: 2843425. doi: 10.1101/gad.2.6.718
4. J. C. Wootton, Non-globular domains in protein sequences: Automated segmentation using complexity measures. Comput. Chem. 18, 269–285 (1994). PMID: 7952898. doi: 10.1016/0097-8485(94)85023-2
5. M. E. Oates et al., D(2)P(2): Database of disordered protein predictions. Nucleic Acids Res. 41, D508–516 (2013). PMID: 23203878. doi: 10.1093/nar/gks1226. PMCID: PMC3531159
6. R. van der Lee et al., Classification of intrinsically disordered regions and proteins. Chem. Rev. 114, 6589–6631 (2014). PMID: 24773235. doi: 10.1021/cr400525m. PMCID: PMC4095912
7. K. You et al., PhaSepDB: A database of liquid-liquid phase separation related proteins. Nucleic Acids Res. 48, D354–D359 (2020). PMID: 31584089. doi: 10.1093/nar/gkz847. PMCID: PMC6943039
8. M. J. do Amaral, Y. Cordeiro, Intrinsic disorder and phase transitions: Pieces in the puzzling role of the prion protein in health and disease. Prog. Mol. Biol. Transl. Sci. 183, 1–43 (2021). PMID: 34656326. doi: 10.1016/bs.pmbts.2021.06.001
