Exploring Disordered Regions of Human Spliceosome Proteins
Bruno de Paula Oliveira Santos, Krishnendu Bera, Luca Grisanti, Isabella Caterina Felli, Roberta Pierattelli, Alessandra Magistrato

TL;DR
This paper explores disordered regions in human spliceosome proteins, showing they are common, evolutionarily conserved, and linked to cancer.
Contribution
The study provides a comprehensive analysis of disordered regions in the human spliceosome and their roles in regulation and disease.
Findings
Many spliceosome proteins contain over 40% disordered residues.
Disorder is driven by compositional bias and is evolutionarily conserved.
IDRs are hotspots for cancer mutations and post-translational modifications like phosphorylation.
Abstract
Introns are removed from mRNAs by the spliceosome, a type of protein–RNA machinery enriched with intrinsically disordered regions (IDRs). Lacking stable 3D structures, IDRs can adopt diverse conformations interlacing protein and RNA components of the spliceosome and regulating splicing. In this work, we performed a comprehensive bioinformatics analysis of the human spliceosome proteome, revealing that many proteins contain more than 40% disordered residues. Spliceosome IDRs are mainly driven by compositional bias due to an excess of charged and RS-like sequences, with the nature and extent of this disorder being broadly conserved evolutionarily. Additionally, these IDRs are frequent targets of post-translational modifications, especially phosphorylation, and are hot spots for cancer-associated mutations, which have been implicated in different types of cancer. Our results collectively…
Taxonomy
Topics: RNA Research and Splicing · RNA and protein synthesis mechanisms · RNA modifications and cancer
Intrinsically disordered proteins (IDPs) or proteins with intrinsically disordered regions (IDRs) lack regular three-dimensional structure and are characterized by a large set of dynamic and interconverting conformations. IDRs are widespread in the proteome and are present across all domains of life. They play critical roles in cellular functions such as transcriptional regulation, signal transduction, and subcellular organization. The plasticity and structural heterogeneity of IDRs expand their repertoire of macromolecular interactions and allow them to be finely modulated by their structural and chemical environment. IDRs are also abundant in ribonucleoprotein complexes involved in gene expression, regulation, and synthesis. Among them, the spliceosome machinery, which promotes precursor mRNA (pre-mRNA) splicing via dynamic protein/RNA binding and dissociation events, contains a large fraction of IDRs in its components.
In eukaryotic cells, the spliceosome removes the noncoding regions (introns) from a pre-mRNA transcript and connects the coding sequences (exons). The most prevalent spliceosome form, the major spliceosome, consists of five small nuclear RNA (snRNA) strands (U1, U2, and U4–U6) and 100–200 associated proteins, which assemble into small nuclear ribonucleoprotein particles (snRNPs). The ordered and regulated assembly of these snRNPs and auxiliary proteins forms the spliceosome complex, which undergoes extensive rearrangements during the splicing process. The major spliceosome, which splices most introns (U2-dependent introns), uses the U1 and U2 snRNPs to scan pre-mRNA and identify specific intron sequences (splice sites). Namely, U1 and U2 snRNPs bind sequentially to the pre-mRNA, forming first the E complex and then the A complex, where the initial recognition and assembly steps occur. In the subsequent steps of the cycle, the spliceosome remodels to perform catalysis, assembling into the B, B^act^, and B* complexes (the latter being where the branching reaction occurs). This is followed by the formation of the C and C* complexes (where the exon-ligation step occurs), the P complex, where the products are formed, and, finally, the intron lariat spliceosome (ILS) complex, where the intron and the exon are released and the spliceosome is dismantled to undergo a new splicing cycle. Besides those contained in snRNPs, additional proteins are involved in splicing. They can function individually or assemble into multiprotein auxiliary complexes, such as the nineteen complex (NTC), the exon-junction complex (EJC), the cap-binding complex (CBC), the retention-and-splicing complex (RES), and the transcription–export complex (TREX).
The second form of the spliceosome, the minor complex, splices a small class (<1%) of introns (U12-dependent introns), which have stronger splice site consensus sequences compared to U2-dependent introns. This spliceosome variant consists of four unique snRNAs (U11, U12, U4atac, and U6atac) and 14 unique protein factors. They work together with the core protein components shared by both spliceosomes, likely through conserved mechanisms.
Irrespective of the spliceosome type, mutual recognition of spliceosome subunits, correct assembly, and component remodeling are paramount for splicing fidelity. In this context, IDRs play a key role in spliceosome function by forming weak interactions that interlace with many protein/RNA components and modulate binding dynamics. Moreover, IDRs can be finely regulated by post-translational modifications (PTMs), which readily alter their set of interactions.
Spliceosome IDRs can be divided into four categories: (i) regions containing predicted secondary structure (SS) elements (termed SS-IDRs); (ii) long (≥25 residues) compositionally biased IDRs (termed CB-IDRs), which include RS-like regions (rich in Arg and Ser), poly-P/Q regions (with repeats of proline or glutamine), G-rich regions (with Gly repeats, RGG, [RSY]GG, and R[AGT][AGTFIVR] motifs), and regions with charged or noncharged disorder; (iii) IDRs that are targets of PTMs (PTM-IDRs); and (iv) IDRs that host cancer-associated mutations (CA-IDRs).
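The G-rich motif classes in this classification are written in a notation that maps directly onto regular expressions, so a scan for them can be sketched in a few lines. This is only an illustrative sketch, not the authors' annotation pipeline: the function name `find_g_rich_motifs`, the pattern labels, and the toy sequence in the usage note are assumptions introduced here.

```python
import re

# Motif patterns as written in the text; the dictionary keys are
# hypothetical labels chosen for this sketch.
G_RICH_PATTERNS = {
    "RGG": r"RGG",
    "[RSY]GG": r"[RSY]GG",
    "R[AGT][AGTFIVR]": r"R[AGT][AGTFIVR]",
}


def find_g_rich_motifs(sequence):
    """Return {label: [0-based start positions]} for each motif pattern.

    Each pattern is wrapped in a lookahead so that overlapping
    occurrences are reported as well.
    """
    hits = {}
    for label, pattern in G_RICH_PATTERNS.items():
        lookahead = re.compile(f"(?=({pattern}))")
        hits[label] = [m.start() for m in lookahead.finditer(sequence)]
    return hits
```

For a toy sequence such as `"MSRGGYGGRGA"`, `find_g_rich_motifs` reports one RGG hit, two [RSY]GG hits, and two R[AGT][AGTFIVR] hits. The ≥25-residue length criterion for CB-IDRs would have to be applied separately.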
To explore their abundance, types, and functional and regulatory roles, we performed an integrative sequence-based analysis of spliceosome proteins, focusing on their IDR content, complemented by statistical correlation analysis of PTM and cancer-associated sites, and phylogenetic inferences. Overall, this study deepens our understanding of IDRs in splicing, highlighting their implications in splicing regulation and disease.
To characterize the prevalence of intrinsic disorder in spliceosome proteins, we performed a bioinformatic analysis across different splicing steps, protein classes, and sequence contexts (see the figures and sections 1.1–1.4 of the Supporting Information for details). To this end, we retrieved the spliceosome sequences from the UniProt database (www.uniprot.org), predicted their disorder content, and calculated the disorder percentage in different groups.
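The last step, computing a disorder percentage per group, can be sketched as follows. The sketch assumes that a disorder predictor has already produced a per-residue binary call for each protein, and that a group's percentage is obtained by pooling residues over all proteins in the group. Both the input format and the pooling choice are assumptions made here (the text does not say whether values were pooled or averaged per protein), and `disorder_percentage_by_group` is a hypothetical name.

```python
from collections import defaultdict


def disorder_percentage_by_group(proteins):
    """Compute the pooled disorder percentage for each group.

    proteins: iterable of (group, mask) pairs, where mask is a string with
    one character per residue: '1' for disordered, '0' for ordered.
    Returns {group: percentage of disordered residues in that group}.
    """
    disordered = defaultdict(int)
    total = defaultdict(int)
    for group, mask in proteins:
        disordered[group] += mask.count("1")
        total[group] += len(mask)
    # Groups with no residues are skipped to avoid division by zero.
    return {g: 100.0 * disordered[g] / total[g] for g in total if total[g] > 0}
```

A group label can be a spliceosome complex (E, A, B, ...) or a protein class or family; a protein that belongs to several complexes would simply appear once per complex in the input.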
We first evaluated the degree of disorder in the spliceosome complexes. We observed that in the major spliceosome the initial E complex exhibits the highest disorder content. The percentage decreases in the A complex until the pre-B complex is reached, slightly increasing in the B^act^ and B* complexes, where the first splicing reaction occurs, and remaining roughly constant until the P complex forms. The disorder content then decreases drastically at the ILS complex (Figure, panel a). These data imply that in the central and final steps of the splicing cycle, where the spliceosome complex is fully assembled and engaged in catalysis and in dismantling its components, the plasticity of the IDRs is less crucial.
Although data on the minor spliceosome are more limited, it overall appears to contain a lower proportion of disordered residues, with the highest content being in the minor spliceosome A (AT-AC) complex (31%). Conversely, the disorder content of the remaining minor-spliceosome-specific complexes remains constant (27%) and increases during 3′-splice site cleavage (45.3%) (Figure, panel b).
We next analyzed the disorder content by protein class or family. Only three families have less than 20% disorder content: “like Sm” (LSM, 10.8%), Gemin proteins (GEM, 15.2%), and TREX (19.4%). Conversely, many families exhibited a disorder content of at least 40%: proteins recruited at the B^act^ complex (40.0%); nonspecified proteins, i.e., those with no labeled or annotated class or family (40.8%); U2 snRNP-associated proteins (present in the A, B, B^act^, and B* complexes, 41.5%); proteins recruited at the B complex (44.8%); SR proteins (47.6%) and U1 snRNP proteins (53.3%), both present in the E, A, and B complexes; and the RES complex (present in the A, B, and B^act^ complexes, 70.8%) (Figure, panel a). These results confirm that the early spliceosome complexes contain proteins with the largest IDRs. However, the proportion of disordered and ordered sites differs from complex to complex (Figure, panel a).
We also annotated the disorder types, classifying the IDRs as CB-IDRs and SS-IDRs. CB-IDRs are characterized by the presence of residues or motifs with a higher frequency than normally expected in the proteins of vertebrates (Figure, panel b). This analysis revealed that the CB-IDR content varied between 0% and 75%, with the SR and SR-related proteins exhibiting the highest content.
The most common types of CB-IDR were associated with the presence of charged residues (18.57%), followed by RS-like motifs (12.82%) and poly-P/Q sequences (6.0%). However, the relative abundances of these CB-IDR segments differed among protein classes (Table S1). Except in the RES, SR, and SR-like proteins, in which the RS-like content was the largest, the CB-IDRs in the other groups were due to the presence of charged residues.
Conversely, the SS-IDR content (regions predicted to be disordered yet to contain secondary structure elements, i.e., α-helices or β-sheets) varied from 2% to 43%, with LSM (43.7%) being the class with the largest SS-IDR content, followed by the Prp19 complex (23.4%), proteins recruited at the C complex (20.6%), and U5 snRNP proteins (19.5%). Within the SS-IDRs, the α-helical type of secondary structure was predominant (Figure, panel b, and Figures S1 and S2).
Interestingly, the overall disorder content in the LSM, Prp19 complex, and U5 snRNP classes of proteins is low. The presence of SS-IDRs in these classes suggests that these regions are transitional. In a few cases, IDRs were characterized by comparable amounts of CB-IDR and SS-IDR content (U4/U6.U5 snRNP, Step II factor, recruited at C complex, and Prp19 complex), suggesting that these classes may exhibit promiscuous behavior during splicing.
Next, we assessed the presence and type of PTMs occurring within the IDRs of spliceosome proteins (section 1.6 of the Supporting Information). First, we calculated the relative abundance of PTMs in ordered or disordered regions of the spliceosome proteins. An enrichment test, comparing the proportion of residues with PTMs in disordered regions versus ordered regions, showed that PTMs are more abundant in IDRs (Fisher test p value = 1.101 × 10^–115^). Namely, we identified 1012 sites hosting PTMs versus 32 068 sites without PTMs in IDRs, compared with 831 sites hosting PTMs versus 78 059 sites without PTMs in ordered regions.
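The 2×2 table behind this enrichment test can be examined with a short, dependency-free sketch. It uses the four counts quoted above, computes the sample odds ratio, and evaluates a one-sided Fisher exact p value as an upper hypergeometric tail in log space. The choice of a one-sided test and the function names are assumptions made for illustration; the reported p value may come from a two-sided test in a statistics package, so the number produced here is not expected to match it exactly.

```python
import math

# Counts quoted in the text.
PTM_IN_IDR, NO_PTM_IN_IDR = 1012, 32068
PTM_IN_ORDERED, NO_PTM_IN_ORDERED = 831, 78059


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_upper_tail_log10(a, b, c, d):
    """log10 of the one-sided Fisher exact p value for enrichment in row 1.

    Sums hypergeometric probabilities of observing at least `a` counts in
    the top-left cell with all margins fixed. Working in log space keeps
    very small probabilities from underflowing.
    """
    row1 = a + b            # residues in IDRs
    col1 = a + c            # residues carrying a PTM
    total = a + b + c + d
    log_terms = [
        log_comb(col1, k) + log_comb(total - col1, row1 - k) - log_comb(total, row1)
        for k in range(a, min(row1, col1) + 1)
    ]
    peak = max(log_terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_p / math.log(10)
```

Calling `odds_ratio(PTM_IN_IDR, NO_PTM_IN_IDR, PTM_IN_ORDERED, NO_PTM_IN_ORDERED)` gives a value close to 3, meaning that the odds of a residue carrying a PTM are roughly three times higher in IDRs than in ordered regions for these counts.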
Then, we calculated the density of PTMs per protein, revealing that PTMs per residue are significantly more abundant in IDRs (Wilcoxon test p value = 5.782 × 10^–15^). Additionally, we evaluated the PTM distribution per class/family (Figure, panel a), confirming the prevalence of PTMs in IDRs (>50%). Finally, we inspected the relative abundance of each PTM type, defined as the proportion of PTMs located in IDRs relative to the total number of PTMs across the entire protein sequence (including both ordered and disordered regions). Interestingly, citrulline (0.66), phosphoserine (0.64), N-acetylalanine (0.6), and N-acetylserine (0.54) were more abundant in IDRs than in ordered regions (ratio of >0.5) (Figure, panel b). Notably, these PTMs are commonly associated with diverse biological functions ranging from rapid signaling (phosphoserine) to epigenetic regulation (citrulline) and fundamental protein life-cycle management (N-acetylation), suggesting that they play similar regulatory roles in splicing.
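The per-type ratio defined in this paragraph (PTMs of a given type located in IDRs divided by all PTMs of that type) can be sketched as below. The input format, a flat list of `(ptm_type, in_idr)` records, and the function name are assumptions made for illustration, not the authors' data model.

```python
from collections import Counter


def idr_fraction_by_ptm_type(ptm_sites):
    """Fraction of each PTM type that falls inside IDRs.

    ptm_sites: iterable of (ptm_type, in_idr) pairs, where in_idr is True
    when the modified residue lies in a predicted disordered region.
    Returns {ptm_type: sites in IDRs / all sites of that type}.
    """
    in_idr = Counter()
    total = Counter()
    for ptm_type, is_disordered in ptm_sites:
        total[ptm_type] += 1
        if is_disordered:
            in_idr[ptm_type] += 1
    return {ptm_type: in_idr[ptm_type] / total[ptm_type] for ptm_type in total}
```

A ratio above 0.5 then corresponds to a PTM type found more often in IDRs than in ordered regions, which is the criterion used in the text.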
Focusing only on IDRs, we then inspected the abundance of PTM types in different protein classes/families (Table S2), with phosphoserine (65.7%) and phosphothreonine (11.2%) being the predominant ones (the PTM percentage was calculated as the number of PTMs of a specific type divided by the total number of PTMs of all types) (Figure, panel c). These PTMs may be involved in modulating protein–protein interactions and subcellular compartment regulation. Other PTM types, such as N6-acetyllysine (6.1%), ω-N-methylarginine (4.4%), N-acetylalanine (3.9%), and asymmetric dimethylarginine (2.4%), instead showed class/family-specific patterns.
Due to the centrality of the spliceosome for gene expression and regulation and its implication in cancer, we further investigated the frequency of cancer-causing mutations in IDRs (section 1.6 of the Supporting Information). This analysis revealed that, among spliceosome protein families, the SR and hnRNP proteins are the most frequently mutated (Figure, panel a). Overall, 43 proteins contained IDRs that host cancer-associated variants. Among them are FUS, RBM10, SF3B1, SRSF2, U2AF1, and THOC2 (Table S3). The number of mutations in the IDRs of these proteins varied from 1 to 5, and the mutations were associated with different types of tumors, with carcinomas being the most abundant category (Figure, panel b). Their implication in diverse cancer types reflects the broad impact of splicing alterations in tumorigenesis. Complete data on mutations in ordered and disordered regions are provided in Tables S3 and S4.
To check for an association between PTMs and cancer-associated variants, we also examined whether cancer-causing mutations flanking a PTM site (i.e., located within five residues of it) occurred frequently.
This enrichment analysis identified 17 proteins hosting cancer-causing mutations near the PTM sites. In nine of these proteins (FUS, THOC2, SRSF2, WBP11, U2AF2, SF3B1, RBM10, DDX41, and RBM8A), these cancer-associated variants are related to skin abnormality (human phenotype ontology identifier HP:0000951), while in five of them (SRSF2, SF3B1, PRCC, DDX41, and RBM8A), they are related to neoplasms (HP:0002664). In both cases, the mutations are predominantly associated with phosphoserines and phosphothreonines (Table S5).
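The flanking criterion used in this analysis (a mutation located within five residues of a PTM site) amounts to a simple positional filter, sketched below. The function name and the choice to count a mutation at the PTM residue itself (distance 0) as flanking are assumptions made for illustration.

```python
def mutations_near_ptms(mutation_positions, ptm_positions, window=5):
    """Return the sorted mutation positions within `window` residues of a PTM.

    Positions are residue indices in the same protein sequence. A mutation
    at the PTM residue itself (distance 0) is counted as flanking here,
    which is an assumption of this sketch.
    """
    ptm_sites = set(ptm_positions)
    return sorted(
        pos
        for pos in set(mutation_positions)
        if any(abs(pos - site) <= window for site in ptm_sites)
    )
```

Applied per protein, a non-empty result marks that protein as hosting at least one cancer-associated variant near a PTM site.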
Next, we ranked the proteins by their disorder fraction and considered their cellular abundance. Among the proteins containing the largest IDRs, only a subset was highly abundant (Table). These proteins (SRRM1, FUS, YBX1 [UniProt entry name YBOX1], and SNRNP70 [UniProt entry name RU17]) have disorder contents ranging from 50% to 90% and are predominantly characterized by a CB type of disorder, even if the type of residues causing the CB varies (Results 1 of the Supporting Information). These differences in the sequences causing the CB type of disorder may underlie the multivalent interactions that are critical for the formation and remodeling of dynamic ribonucleoprotein complexes. Their high expression levels and disorder content suggest that these proteins may play critical roles in cellular homeostasis, as detailed in Results 1 of the Supporting Information. The regulatory role of FUS, SRRM1, and YBX1 is further supported by their hosting of multiple PTMs (Table S2).
Additionally, we investigated the evolutionary dynamics of spliceosome proteins exhibiting the highest IDR content (>70%) (section 1.5 and Results 2 of the Supporting Information and Figures S4–S25). Specifically, we inspected whether structural disorder played a role in the evolutionary history of these proteins and if their orthologs retained similar disorder content. To this end, we constructed phylogenies for 19 proteins. Multiple-sequence alignments revealed that, in most cases, the high disorder content was maintained across orthologs (Results 2 of the Supporting Information). In a few cases, specific clades deviated markedly, displaying loss of disorder, thus reflecting different evolutionary trajectories (Results 2 of the Supporting Information).
Commonly, disordered sites of proteins display rapid evolutionary dynamics, while ordered sites tend to be more conserved. This metric is evaluated through disorder-to-order transitions (DOTs), which refer to the change between ordered and disordered states. Interestingly, when analyzing this property in spliceosome IDRs, we observed that SS-IDRs, located adjacent to ordered regions in the same protein or at positions corresponding to ordered regions in ortholog proteins, exhibited significantly lower DOT rates compared to those of the other disordered regions (Mann–Whitney U = 2.39 × 10^6^; p < 10^–9^). Due to their lower DOT values and preservation of amino acid sequence, SS-IDRs appear to be more evolutionarily conserved than the other random disordered sites, suggesting that their secondary structure imposes evolutionary constraints across orthologs.
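The Mann–Whitney comparison of DOT rates between SS-IDRs and the other disordered regions rests on the rank-based U statistic, which can be computed as in the sketch below. This illustrates the statistic only: the function name is hypothetical, no p value is computed, and the text does not state which of the two U values it reports.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties.

    U_x counts the pairs (xi, yj) with xi > yj, plus half of the tied
    pairs; U_x + U_y always equals len(x) * len(y).
    """
    combined = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        # Find the block of tied values starting at index i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        average_rank = (i + 1 + j) / 2.0
        n_from_x = sum(1 for k in range(i, j) if combined[k][1] == 0)
        rank_sum_x += average_rank * n_from_x
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    return u_x, n_x * n_y - u_x
```

Here `x` would hold the DOT rates of SS-IDR sites and `y` those of the remaining disordered sites; a small `U_x` relative to `len(x) * len(y)` indicates that the SS-IDR rates tend to be lower.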
Notably, upon analysis of the increase or decrease in the disorder content across evolution (net gain of disorder), clades exhibited mixed trends, indicating protein specialization and different functions for the IDR content in different organisms (Figures S24 and S25). Collectively, this analysis revealed that the predominant evolutionary trend in IDRs was based on the persistence of SS-IDRs. These regions are thus likely conserved to play a functional role in splicing regulation.
Finally, we estimated the liquid–liquid phase separation (LLPS) propensity of spliceosomal proteins using catGRANULE 2.0, a machine learning-based predictor. A large fraction of proteins (92%) displayed LLPS propensity scores of more than 0.5, with 67% being strongly LLPS-prone (>0.8), indicating enrichment in sequence features associated with phase separation (Table S6).
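The two percentages above are threshold summaries of per-protein propensity scores. The sketch below shows that summary step only: it assumes the scores have already been obtained from the predictor and are available as plain numbers, and it does not call or model catGRANULE 2.0 itself. The function name and the default thresholds (taken from the text) are the only inputs.

```python
def llps_summary(scores, prone_threshold=0.5, strong_threshold=0.8):
    """Summarize LLPS propensity scores with two strict thresholds.

    scores: iterable of per-protein propensity scores.
    Returns the percentage of proteins scoring above each threshold.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    return {
        "prone_pct": 100.0 * sum(s > prone_threshold for s in scores) / n,
        "strong_pct": 100.0 * sum(s > strong_threshold for s in scores) / n,
    }
```

Strict inequalities are used because the text describes scores "of more than 0.5" and ">0.8"; the strongly prone proteins are a subset of the prone ones.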
Previous computational studies indicated that intrinsic disorder is a defining feature of spliceosomal proteins. An initial study focused on serine/arginine-rich (SR) splicing factors, showing that SR proteins are enriched with disorder-promoting residues and lack stable folded structures. Subsequent bioinformatics analysis of the human and Saccharomyces cerevisiae spliceosomal proteomes revealed a marked enrichment of IDPs/IDRs relative to their respective background proteomes. The spliceosome was found to have an ordered catalytic core supported by peripheral, evolutionarily younger, IDR-rich proteins involved in early assembly, regulation, and dynamic remodeling. Computational identification of disorder-based binding sites with tools such as α-MoRFs and ANCHOR further supported the idea that intrinsic disorder facilitates spliceosome assembly, reversibility, and adaptability. As such, these studies established intrinsic disorder as a conserved, functionally essential property of the spliceosome and highlighted the central role of computational approaches in uncovering its structural and evolutionary principles.
Here, we build and expand on previous findings, predicting that about half of the spliceosome proteins (48.5%) contain extensive IDRs, which are unevenly distributed in amount and type. Notably, a large proportion (35.7%) of these proteins have >40% of their sequence classified as disordered. The disorder content is strongly correlated with RS-like and charged compositional bias, and approximately 15% of disordered regions alternate with secondary structure formation. The highest percentage of disorder is observed at the E complex, confirming that IDRs are key for early spliceosome assembly.
Notably, spliceosome IDRs are targets for PTMs, with phosphorylation of serine and threonine residues being the most abundant type. The addition of a bulky and negatively charged phosphoryl group is expected to regulate splicing by modulating interaction affinity or complex assembly, particularly in SR proteins and other splicing regulators. This regulatory principle is exemplified in several core spliceosome components. For instance, phosphorylation of SAP155 (U2 snRNP) is tightly coordinated with catalytic steps, phosphorylation of PRP28 by SRPK2 is essential for stable tri-snRNP integration, and multisite phosphorylation of the SF3B1 N-terminal region by CDK11 modulates RNA interaction within the B^act^ complex.
Interestingly, only a few of the most disordered proteins (e.g., SRRM1, FUS, YBOX1, and SNRNP70) are abundant in the human proteome. They may function as interaction hubs within the spliceosome and in dynamic ribonucleoprotein aggregates. The remaining highly disordered proteins, characterized by lower cellular abundance, must instead be more specifically implicated in splicing regulation.
Across evolution, spliceosome proteins retain the SS-IDR content (Figures S24 and S25), which is conserved in different clades. In contrast to other protein families whose IDRs are rapidly evolving, spliceosome IDRs display low rates of disorder–order transitions throughout evolution, as reflected by their small number of changes per node (ranging from 0.01 to 0.08). Notably, more changes per node are concentrated in SS-IDR segments, indicating that the limited DOT is focused on specific residue positions across lineages. This pattern suggests constrained flexibility, which is consistent with the function of spliceosome proteins as an essential type of cellular machinery evolving under purifying selection, and may be associated with the presence of phosphorylation sites.
Interestingly, we identified multiple cancer-associated mutations in IDRs that cluster in splicing-related families, such as in SR proteins, in hnRNP, and mostly in proteins recruited at A, B, B^act^, and C complexes. Since spliceosome IDRs flexibly interlace proteins and RNA, even subtle perturbation of these interaction networks may alter their function, ultimately triggering splicing defects. The tumor types associated with these variants were diverse, confirming the systemic impact of splicing dysregulation in cancer.
Lastly, we predicted that spliceosome proteins have high LLPS propensity, being strongly enriched with IDRs and containing RNA-binding domains. Consistent with our prediction, several spliceosomal regulators and associated proteins have repeatedly been shown to undergo liquid–liquid phase separation or to participate in phase-separated nuclear condensates. Among these, RBFOX1 and AKAP95 form dynamic assemblies mediated by low-complexity regions or IDRs that contribute to splicing regulation. hnRNPs and serine/arginine-rich splicing factors, such as SRSF9, were observed to form condensate-like droplets with functional consequences for splice site usage and splicing regulation, while nuclear speckles (membraneless organelles enriched with spliceosomal components) exemplify how LLPS may organize the splicing machinery in vivo. Several spliceosome-associated proteins and RNA-binding proteins (RBPs), including core structural components of nuclear speckles like SON and SRRM2, contain extensive IDRs supporting multivalent interactions characteristic of phase-separated condensates. Spliceosome-associated factors, such as PLRG1, were shown to localize to nuclear speckles through LLPS-mediated interactions, facilitated by IDPs. In general, dynamic liquid-like nuclear condensates are enriched with RBPs and splicing factors and are thought to promote spliceosome assembly, spatial organization, and regulation of pre-mRNA splicing. Together, these observations support a model in which LLPS contributes to the functional organization and dynamic regulation of the spliceosomal machinery.
Overall, our study underscores the central role of IDRs in splicing regulation and disease, revealing that spliceosome IDRs are abundant, evolutionarily conserved, and functionally important regions that host regulatory and cancer-associated variants. These features align with the highly dynamic, transient protein–protein and protein–RNA interactions driving spliceosome function.
References
- 1. Trivedi R., Nagarajaram H. A. Intrinsically disordered proteins: An overview. Int. J. Mol. Sci. 2022, 23 (22), 14050. doi:10.3390/ijms232214050. PMID 36430530. PMC9693201.
- 2. Holehouse A. S., Kragelund B. B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 2024, 25 (3), 187–211. doi:10.1038/s41580-023-00673-0. PMID 37957331. PMC11459374.
- 3. Hentze M. W., Castello A., Schwarzl T., Preiss T. A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol. 2018, 19 (5), 327–341. doi:10.1038/nrm.2017.130. PMID 29339797.
- 4. Castello A., Fischer B., Eichelbaum K. Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 2012, 149 (6), 1393–1406. doi:10.1016/j.cell.2012.04.031. PMID 22658674.
- 5. Korneta I., Bujnicki J. M. Intrinsic disorder in the human spliceosomal proteome. PLoS Comput. Biol. 2012, 8 (8), e1002641. doi:10.1371/journal.pcbi.1002641. PMID 22912569. PMC3415423.
- 6. Pokorná P., Aupič J., Fica S. M., Magistrato A. Decoding spliceosome dynamics through computation and experiment. Chem. Rev. 2025, 125 (20), 9807–9833. doi:10.1021/acs.chemrev.5c00374. PMID 41071962.
- 7. Wilkinson M. E., Charenton C., Nagai K. RNA splicing by the spliceosome. Annu. Rev. Biochem. 2020, 89, 359–388. doi:10.1146/annurev-biochem-091719-064225. PMID 31794245.
- 8. Wahl M. C., Will C. L., Lührmann R. The spliceosome: Design principles of a dynamic RNP machine. Cell 2009, 136 (4), 701–718. doi:10.1016/j.cell.2009.02.009. PMID 19239890.
