U2AF2 controls alternative splicing in speckle-proximal regions in an RS domain-dependent manner
Serhii Pankivskyi, Asaki Kobayashi, Jean de Matha Salone, Kimberley Gargoly, David Pastré, Alexandre Maucuer

TL;DR
The study reveals how the splicing factor U2AF2 regulates alternative splicing near nuclear speckles, with its RS domain playing a key role in this process.
Contribution
The novel finding is that U2AF2's RS domain mediates its localization and function in splicing, particularly in speckle-proximal regions.
Findings
U2AF2's RS domain is essential for its self-association and localization to nuclear speckles.
Splicing is most affected in genes near speckles when U2AF2 is reduced or its RS domain is removed.
Phosphorylation sites in the RS domain are required for normal splicing regulation.
Abstract
Splicing factor U2AF2 is known to play a pivotal role in 3′ splice site recognition at an early step of spliceosome assembly. Here, using proximity labeling and biochemical confirmations, we extend the repertoire of putative functional partners of U2AF2, mainly for splicing, chromatin modification, transcription, 3′ end processing, and RNA methylation. Removal of the U2AF2 RS domain alters numerous interactions, including self-association, reduces its localization to nuclear speckles, and impacts splicing genome-wide in a manner that depends both on splicing signals and on intron length. Indeed, cassette exons flanked by short introns in genes or transcripts located close to speckles are the most affected by U2AF2 knockdown or RS domain removal. Finally, we show that phosphorylation sites within the U2AF2 RS domain are required for normal splicing, suggesting that its RS domain mediates…
Funding
- Association Nationale de la Recherche et de la Technologie (10.13039/501100003032)
- Université d'Évry Val-d'Essonne (10.13039/100020987)
- Institut National de la Santé et de la Recherche Médicale (10.13039/501100001677)
- Genopole (10.13039/501100007149)
- INSERM (10.13039/501100001677)
Taxonomy
Topics: RNA Research and Splicing · Genomics and Chromatin Dynamics · RNA and protein synthesis mechanisms
Introduction
RNA splicing is an essential step of gene expression in eukaryotes. The highly precise definition of splice sites and the correct joining of exon ends are critical to generate mature messenger RNAs that can be translated into functional proteins. This splicing chemistry is performed by the spliceosome, a multimegadalton ribonucleoprotein complex that assembles de novo and stepwise on introns for each splicing event [1–5].
Splicing factor U2AF (U2 snRNP Auxiliary Factor) is among the first players in the assembly of the spliceosome. It is a heterodimer comprising a large 65 kDa subunit, U2AF2 (alias U2AF^65^), and a small 35 kDa subunit, U2AF1 (alias U2AF^35^) [6]. U2AF2 interacts with a pyrimidine tract located between the branch point and the 3′ splice site [6]. U2AF1 binds the AG dinucleotide just preceding the 3′ splice site [7–9]. Additionally, U2AF2 contacts splicing factor SF1, which settles on the branch point sequence comprising the branch point adenosine of the future lariat intron [10–13]. The synergistic interactions between SF1, U2AF2, and U2AF1 result in the formation of an early spliceosomal complex (E complex) [14]. Next, in an ATP-dependent manner, SF1 is replaced by U2 snRNP at the branch point site. This targeting of U2 snRNP to the branch site requires U2AF [6, 15]. U2AF2 interacts directly with the U2 snRNP largest subunit SF3B1 through its C-terminal UHM domain [16–19], and its N-terminal arginine-serine-rich (RS) domain is proposed to facilitate annealing of U2 small nuclear RNA (U2 snRNA) to the branch site sequence [20].
These first steps exemplify how spliceosome assembly relies on numerous contacts engaged by its protein and RNA components. Besides specific protein–protein and protein–RNA interactions mediated by folded domains such as RRM, KH, or UHM domains, the role of dynamic multivalent interactions relying on low complexity regions of splicing factors is experimentally less documented. Recent investigations have highlighted self, multivalent, and dynamic interactions of such low complexity domains. These dynamic interactions have the capacity to partition the corresponding proteins in vitro into two phases consisting of droplets of high concentration and a surrounding phase of lower concentration, a phenomenon known as liquid–liquid phase separation (LLPS) [21, 22]. Because of this organization in vitro, LLPS is considered a driving force for the assembly of membraneless organelles in cells. More generally, multivalent homotypic and heterotypic interactions among low complexity regions are also thought to contribute to the formation of dynamic higher assemblies of proteins and RNA in a variety of cellular processes, including gene transcription, DNA repair, or signal transduction [23–25]. Interestingly, low complexity domains (LCDs) are most significantly enriched in spliceosomal proteins, suggesting that multivalent low affinity interactions are particularly relevant for splicing [26]. Among these LCDs, RS domains are characterized by low complexity sequences with specific local enrichments of arginine and serine dipeptides. They are frequently present in proteins that are functionally related to RNA splicing and concentrated in nuclear speckles, which are subnuclear membraneless compartments enriched in RNA and RNA-binding proteins mainly related to splicing [27–30].
RS domains are thought to function in bridging proteins bound to exonic splicing enhancers with spliceosomal components recruited to splice sites thus helping exon definition [31], while contacts of RS domains of SR proteins with both U1 snRNP and splicing factors at the 3′ end of introns suggest that these domains are involved in bridging the 5′ and 3′ splice site for splicing to occur [27]. However, in most cases, the molecular functions of RS domains in messenger RNA (mRNA) processing are not documented [28].
As mentioned above, U2AF2 is an RS domain-containing scaffold protein presenting multiple binding domains (Fig. 1A). A central tandem of RRM domains is responsible for pyrimidine tract recognition [32]. RRM2 is also involved in FUBP1 interaction [33]. A UHM domain at the C-terminus interacts with ULM motifs in SF1, SF3B1, and SUGP1 [12, 14, 16, 34, 35]. A ULM motif with an essential tryptophan residue (W92) is N-terminal to the tandem RRMs and binds tightly to the small subunit, U2AF1 [36]. Finally, the RS domain located at the N-terminus, with a concentration of nine RS or SR dipeptides between aa 27 and 62 in the human protein, can bind the pre-mRNA [20].
Figure 1. U2AF2 RS domain removal from human cells. (A) Domain structure and prediction of disordered regions in U2AF2. (B) Schematic representation of the U2AF2 mutants with partial (mutant dRS1: aa 24–52) or complete (mutant dRS2: aa 24–65) deletion of the RS domain expressed in the corresponding CRISPR-Cas9 modified HEK293 cells. (C) Analysis of the expression of U2AF2 deletion mutants in CRISPR-Cas9 modified HEK293 cells. U2AF2 was detected on immunoblots using anti-U2AF2 antibodies (MC3). (D) Sedimentation assay of protein extracts of wild-type (WT) and modified HEK293 cells with the partial or the complete RS domain deletion. The two successive pellets (P1 and P2) and the soluble fraction (S) were analyzed and quantified by Western blot. The percentage of insoluble U2AF is indicated. (E) Effect of U2AF2 RS domain truncations on its subcellular localization. Wild-type HEK293 cells or clones with the indicated truncations were co-labeled with anti-U2AF2 antibodies and with anti-SF3B1 antibodies, anti-SC-35 antibodies, or a polyT probe. Colocalization with U2AF2 was inferred from correlation coefficients of fluorescence intensities along a line crossing spots for each of these markers as previously described [41]. Quantification for 40 cells in each condition is presented on boxplots. (F) Effect of the partial and complete removal of the RS domain of U2AF2 on cell proliferation. The doubling time of control and mutated subclones with RS deletions was analyzed using the MTT (3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyl tetrazolium bromide) assay. Six mutants with the small deletion and two mutants with the large deletion were grouped together to attain statistical significance. Statistics: mean ± SD, *P < 0.01, two-sample t-test.
Biochemical analyses have revealed the importance of this RS domain for splicing in vitro. U2AF2 depletion from HeLa cell nuclear extracts abolishes splicing of a model pre-mRNA [37]. Adding back purified HeLa U2AF2 or in vitro translated human U2AF2 restores splicing of this model pre-mRNA [32, 37]. Deletion of the RS domain (aa 25–63) from in vitro translated U2AF2 severely compromises the ability of U2AF2 to support pre-mRNA splicing [37]. U2AF2 produced in bacteria is also able to restore splicing activity of a U2AF-depleted HeLa nuclear extract and deletion of the N-terminal region (aa 1–94) abolishes this activity [20]. Then, replacement of the RS domain by synthetic RS domains consisting of seven repeats of dipeptide (RS)7, or (RA)7 or (KS)7 is sufficient to keep splicing activity while replacement with (RD)7 is not, suggesting that a net positive charge is required. It was therefore proposed that the essential role of the RS domain is to shield the negative charges of the RNA backbone to help annealing of U2 snRNA to the branch point sequence.
The small subunit of U2AF, U2AF1, binds the AG dinucleotide just preceding the 3′ splice site during spliceosome assembly. It also has an RS domain and interacts with SR proteins that themselves interact with the U1 snRNP U1-70K protein, providing a potential bridge between 5′ and 3′ splice sites. The RS domain of U2AF1 has been implicated in binding to these SR proteins [38, 39].
Interestingly, it was later shown that, in Drosophila, these RS domains of U2AF have an essential and redundant function, as the presence of one but not of both is required for viability [40]. Therefore, it was suggested that the RS domain of U2AF2 could probably ensure essential interactions of U2AF1 with SR proteins.
We previously showed that the N-terminal RS domain of U2AF2 (aa 27–62) can drive LLPS in vitro, indicating that it can also mediate multivalent low affinity homotypic self-interactions [41]. We have proposed that these interactions reinforce the binding of U2AF2 to SF3B1, which is unique among ULM-containing proteins in presenting a series of five ULM motifs in its N-terminal region [17, 18]. The U2AF2-similar and cancer-related splicing factor CAPERalpha (also named RBM39) also harbors an N-terminal RS domain and a C-terminal UHM domain that binds SF3B1 ULMs [42]. We showed that U2AF2, but not a form lacking its N-terminal RS domain, can enhance the recruitment of RBM39 on the multi-ULM domain of SF3B1 [41]. Therefore, we proposed a model where homotypic and heterotypic contacts between RS domains of U2AF2 and RBM39 favor or stabilize assemblies of these factors at the surface of the N-terminal multi-ULM region of SF3B1, thereby facilitating U2 snRNP recruitment to the branch point site.
In addition to SF3B1, a dependence on the RS domain of U2AF2 for interactions with the C-terminal domain (CTD) of the RPB1 subunit of RNA polymerase II (Pol II), with the polyA polymerase and with the cleavage factor CFIm59 (CPSF7) has also been reported [43–45]. Altogether, besides its proposed role in contacting RNA at the branch site, the RS domain of U2AF2 might have a general function in mediating or stabilizing interactions with its protein partners.
Here, we address this function of the RS domain of U2AF2 using global proteomic and transcriptomic approaches. Proximity labeling and biochemical analyses reveal that deletion of its RS domain globally impacts known and novel putative interactions of U2AF2 with protein partners. We explore whether the binding properties of the RS domain of U2AF2 are linked to its mixed charge nature. We use unphosphorylatable and phosphomimetic mutants of the RS domain of U2AF2 to infer the regulatory actions of its phosphorylation on U2AF2 interactions and function in splicing. RNA-seq analyses then reveal that the removal of its RS domain impacts alternative splicing genome-wide with a stronger impact on alternative splicing that occurs in the vicinity of speckles.
Materials and methods
Antibodies
The following antibodies were used:
Bethyl: anti-SF3B155 rabbit polyclonal Ab, A300-997A; anti-U2AF35 rabbit polyclonal Ab, A302-079A; anti-RNA Polymerase II rabbit polyclonal Ab, A300-653A; anti-Phospho RNA Polymerase II (S2) rabbit polyclonal Ab, A300-654A.
Sigma: anti-U2AF2 mouse monoclonal Ab, clone MC3; anti-myc mouse monoclonal Ab, clone 9E10; anti-SC-35 mouse monoclonal Ab, S4045 (used for Fig. 1).
Santa Cruz: anti-GFP rabbit polyclonal Ab, sc-8334.
Roche: anti-GFP mouse monoclonal Ab, clones 7.1 and 13.1.
Cusabio: anti-RBM10 rabbit polyclonal Ab, CSB-PA019400LA01HU.
GenScript: anti-FLAG mouse monoclonal Ab, A00187.
Bio-Techne: anti-SRRM2 rabbit polyclonal Ab, NBP2-55697PEP.
Invitrogen: goat anti-rabbit secondary Ab, Alexa Fluor™ 594, A-11012.
In-house rabbit polyclonal antibody against an N-terminal peptide of SF1 has been described previously [46].
Plasmid constructs
Plasmid for expression of GST-fused human SF1 (residues 1–255) (SF1f) was based on the pGEX6P-1 vector [47].
Plasmids for expression of GST-SF3B1 residues 1–493 or 190–344 and GST-RPB1-CTD (residues 1586–1970 of human RPB1) were described previously [48].
Plasmids for expression of GST fusions of human U2AF2 mutants (RS-domain deletion and serine mutations S-to-A, or S-to-D) were obtained by restriction free cloning using GST-U2AF2-WT as a template and were based on the pGEX6P-1 vector.
For the BioID2-fused recombinant constructs, complementary DNA (cDNA) encoding full-length U2AF2 and U2AF2 lacking RS domain (amino acids 25–63) were cloned into mycBioID2-pBABE-puro backbone [49] (Addgene plasmid #80900).
cDNAs encoding SUGP1, CHERP, SRSF10, CCAR1, U2SURP, BCLAF1, THRAP3, and DDX42 in donor vectors pDONR223 (for SUGP1) and pDONR221 (for other proteins) were obtained from the plasmid repository DNASU [50] (clones HsCD00081287, HsCD00878865, HsCD00718989, HsCD00832975, HsCD00829034, HsCD00829532, HsCD00079943, and HsCD00877133, respectively). GFP/3 × Flag-fused expression constructs for SUGP1, CHERP, SRSF10, and CCAR1 were generated through LR recombination (Invitrogen™) using the corresponding donor vectors and destination vector pDEST-3xFlag-GFP [51] (Addgene plasmid #122845). 2 × Flag-fused expression constructs for U2SURP, BCLAF1, THRAP3, and DDX42 were prepared by LR recombination between donor plasmids and destination vector pDEST-2xFlag [52] (Addgene plasmid #118372).
pCDNA3-based expression plasmids for myc-tagged U2AF2 with RS-domain deletions and R-to-A, R-to-K, S-to-A, or S-to-D mutations were prepared using restriction-free cloning [53] and a template construct mentioned before [41].
RG6 splicing reporter construct was a gift from Thomas Cooper [54] (Addgene plasmid #80167).
pEGFPC1-based expression plasmids for GFP-fused U2AF2 mutants (RS-domain deletions and R-to-A, R-to-K, S-to-A, or S-to-D mutations) were prepared using restriction-free cloning and a template construct GFP-U2AF2-WT that was obtained by cloning the corresponding human cDNA sequence into the pEGFPC1 vector.
All the plasmids and mutations were validated by DNA sequencing (Eurofins Genomics).
Cell culture, transfection, and lentiviral transduction
HEK293 cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM, Life Technologies) supplemented with 10% fetal bovine serum at 37°C and 5% CO_2_.
Transient transfections were performed using the corresponding plasmid DNA constructs and Lipofectamine 2000™ transfection reagent (Invitrogen) according to the manufacturer’s protocol. Transfected cells were processed 24 h following transfection.
For U2AF2 knockdown, HEK293 cells were transduced with lentiviral particles for expression of shRNAs as previously described [41]. The day following infection, 1 µg/ml of puromycin was added to the cell medium; selection was maintained for 48 h and cells were harvested for RNA purification. At that time point, all cells in the control nontransduced well had died. shRNAs from the TRC library used to knock down U2AF2 were TRCN0000001162 (CGACGAGGAGTATGAGGAGAT) and TRCN0000001164 (CGCCTTCTGTGAGTACGTGGA).
Stable expression of BioID2-fused recombinant proteins in HEK293 cells was obtained using lentiviral transduction. Briefly, 300 000 cells were plated in 3.5 cm-diameter wells of a six-well plate. The next day, the cells were infected with 2 ml of the corresponding viral supernatant (pBABE-BioID2, pBABE-BioID2-U2AF2-WT, or pBABE-BioID2-U2AF2-dRS) with 8 µg/ml of polybrene. The day following the infection, the cell medium was replaced with DMEM containing 1 µg/ml puromycin. After four days of selection, cells were split twice into 10 cm dishes and maintained to reach confluence before lysis. Overall, two 10 cm dishes were used per replicate, and each BioID2 sample was prepared in triplicate.
Establishment of U2AF2-dRS cell lines
To delete regions of the RS domain of U2AF2 in HEK293 cells, guide RNA sequences were designed and cloned into the PX330 vector [55]. For the short dRS1 deletion, the guideRNA sequences used were Guide #18 GGGACAAGGAGAACCGGCATCGG and Guide #3 TCTTTAGCGCCTCTGGTCAAAGG. For the long dRS2 deletion, the guideRNA sequences used were Guide #14 TGTGGCTGCGCTTCCGATGCCGG and Guide #19 TGTCCCGGGAGGCGCTCCGCTGG.
HEK293 cells were transfected with the combinations of guideRNA expressing PX330 constructs. Individual clones were obtained by limiting dilution in 96-well plates. Clones were tested for the deletions by polymerase chain reaction (PCR) on genomic DNA and then for expression of the truncated proteins by Western blot.
RNA extraction, library preparation, and sequencing
For RNA-seq analyses, RNA was purified from HEK293 cells using the NucleoSpin RNA kit (Macherey-Nagel). Concentration and integrity of RNA were analyzed using a NanoDrop spectrophotometer (Thermo Scientific). PolyA selection and library construction, followed by Illumina NovaSeq 2 × 150 bp sequencing, were performed by Genewiz. About 50M paired-end reads per sample were obtained and delivered as raw data in FASTQ format.
Differential alternative splicing analysis
The paired-end reads (FASTQ files) were aligned to the human reference genome GRCh38.p13 (GENCODE release 38) using the STAR aligner v2.5 with the 2-pass option. Parameters were set to -p to specify paired-end reads and meta-feature level (gene). Output BAM files were indexed using Samtools v1.14. Global analysis of alternative splicing events was performed using rMATS v4.1.1 (replicate multivariate analysis of transcript splicing) [56], which classifies AS events into skipped exon (SE), alternative 3′ splice sites (A3SS), alternative 5′ splice sites (A5SS), mutually exclusive exon (MXE), and intron retention (IR). Alternative splicing events with FDR < 0.05 were considered significant.
To identify cassette exons and quantify reads covering exon junctions, BAM files generated by STAR were used as input for the featureCounts program from the Subread package v2.0.3.
The junction file output by featureCounts was then filtered to select junctions with a mean of more than four reads in the five control samples. Exon coordinates were derived from the gencode.v38.primary_assembly.annotation.gtf file. Internal exons were selected by joining the exon file with the filtered junctions file. Exons presenting alternative 5′ or 3′ splice sites were then removed. Exon expression was calculated based on the number of reads covering the junctions. Then, from the resulting internal exons file, cassette exons were selected when representing between 1% and 99% of the expressed isoforms. Exons included in > 99.5% of all isoforms were considered constitutive exons.
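The exon classification thresholds above can be illustrated with a minimal Python sketch. The text does not give the exact formula used to compute exon inclusion from junction reads, so the function `inclusion_fraction` (mean of the two inclusion junctions divided by inclusion plus skipping reads) is an assumption, as are all function names and the reading of the junction filter as a mean over the control samples.

```python
from statistics import mean


def junction_passes(control_counts, min_mean=4):
    """Keep a junction if its mean read count over the controls exceeds min_mean.

    Assumption: the filter is applied to the mean across control samples.
    """
    return mean(control_counts) > min_mean


def inclusion_fraction(upstream_reads, downstream_reads, skipping_reads):
    """Hypothetical inclusion estimate from junction read counts.

    Inclusion is taken as the mean of the two junctions flanking the exon;
    skipping is the junction joining the neighbouring exons directly.
    """
    inclusion = (upstream_reads + downstream_reads) / 2
    total = inclusion + skipping_reads
    return inclusion / total if total > 0 else None


def classify_exon(fraction):
    """Apply the thresholds stated in the text (1%-99% and > 99.5%)."""
    if fraction is None:
        return "unclassified"
    if 0.01 <= fraction <= 0.99:
        return "cassette"
    if fraction > 0.995:
        return "constitutive"
    return "unclassified"


print(classify_exon(inclusion_fraction(50, 50, 50)))
print(classify_exon(inclusion_fraction(1000, 1000, 0)))
```

With these thresholds, exons included in more than 99% but at most 99.5% of isoforms, and exons included in less than 1%, fall into neither class.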
Sequences at the 5′ss (three bases before and six bases after the 5′ss) and sequences at the 3′ss (100 bases before the 3′ss, or the full intron when its size was under 100 bases, and three bases after the 3′ss) were recovered using getfasta from bedtools.
Using these sequence files, further analyses were performed using R language:
To score 3′ss, a matrix of base frequencies was calculated using the set of constitutive exons, considering positions −4 to +3 relative to the 3′ss. Using this matrix, the 3′ss score for each intron was calculated by summing, for each base of that intron between positions −4 and +3, the frequency of the corresponding base in constitutive exons. This scoring method evaluates the strength of a splice site as the information content of each sequence [57, 58].
To score 5′ss sequences, a similar frequency matrix method was used for positions −3 to +6 relative to the 5′ss.
The polypyrimidine tract score (ppt score) was calculated similarly for a 10-base window between positions −12 and −3 of the intron relative to the 3′ss.
Branch point sequences were searched for in a window between position −(Agez + 29) and −18 relative to the 3′ss, using the branchpointer package in R that also determined the strength of the identified branch point sequence as a bps score.
Splicing changes calculations were based on the differences of splicing index between the treated and control samples.
Hierarchical clustering of the different samples was achieved using hclust function with the Ward’s minimum variance method.
Pearson correlations of the different features scores and speckle proximity scores with splicing changes were calculated using cor.test function in R and GraphPad Prism software v.6.01.
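The frequency-matrix scoring described above for the 3′ss, 5′ss, and polypyrimidine tract can be sketched as follows. The original analysis was done in R; this Python version is only an illustration, and the function names and the four toy "constitutive" sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def frequency_matrix(sequences):
    """Per-position base frequencies from equal-length reference sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        matrix.append({base: counts.get(base, 0) / len(sequences) for base in BASES})
    return matrix


def site_score(sequence, matrix):
    """Sum, over positions, of the reference frequency of the observed base."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix")
    # Bases absent from the matrix (e.g. N) contribute 0 to the score.
    return sum(matrix[i].get(base, 0.0) for i, base in enumerate(sequence))


# Toy 7-base windows standing in for positions -4 to +3 around the 3'ss.
constitutive = ["TCAGGTA", "CCAGGTA", "TTAGATG", "TCAGGTT"]
matrix = frequency_matrix(constitutive)
print(site_score("TCAGGTA", matrix))  # 5.75
print(site_score("TTAGATG", matrix))  # 4.5
```

The same two functions apply unchanged to the 5′ss window and to the 10-base polypyrimidine window, since only the window length and the reference set differ.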
Speckle proximity scores for genes and transcripts were obtained from five independent studies: Bhat et al. (2024), Barutcu et al. (2022), Zhang et al. (2021), Wu et al. (2024), and Khyzha et al. (2025). The overlap between these data and exons detected in RNA sequencing was obtained by merging the datasets based on their gene ID using custom Python scripts.
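A minimal sketch of the merge-and-correlate step follows, assuming exon-level splicing changes keyed by gene ID and one speckle proximity score per gene. The custom scripts themselves are not described in the text, so the data layout, function names, and gene IDs below are hypothetical, and the Pearson coefficient is computed by hand rather than with cor.test.

```python
import math


def merge_by_gene_id(exon_changes, speckle_scores):
    """Inner join: keep exons whose gene has a speckle proximity score."""
    return [
        (gene_id, change, speckle_scores[gene_id])
        for gene_id, change in exon_changes
        if gene_id in speckle_scores
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: (gene ID, splicing change) per exon, and a score per gene.
exon_changes = [("G1", -0.30), ("G1", -0.10), ("G2", 0.05), ("G3", 0.20)]
speckle_scores = {"G1": 0.9, "G2": 0.4}

merged = merge_by_gene_id(exon_changes, speckle_scores)
changes = [row[1] for row in merged]
scores = [row[2] for row in merged]
print(len(merged), pearson_r(scores, changes))
```

Exons from genes absent from a given speckle dataset (here G3) drop out of the merge, so each dataset yields its own overlap with the RNA-seq exons.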
Recombinant protein purification
Variants of the human SF3B155 domain (residues 190–344), a fragment of human SF1 containing U2AF2 interacting and the BPS interacting domains (residues 1–255), human U2AF2, the RS domain, and a shortened variant lacking the RS domain were expressed as glutathione-S-transferase (GST) fusion proteins from pGEX-6p vectors in Escherichia coli BL21 using standard procedures as described previously [41, 48].
Recombinant protein and cell extracts sedimentation assays
Protein sedimentation assays using recombinant purified proteins or total cell extracts from HEK293 cells were performed as previously described [41].
Detection of phosphorylated proteins
To detect phosphorylated forms of U2AF2, proteins were resolved on sodium dodecyl sulphate (SDS)-gels copolymerized with Phos-tag^™^ acrylamide (FujiFilm) following the manufacturer’s instructions.
Detection of the protein phosphorylation in gels was performed using Pro-Q^®^ Diamond Phosphoprotein gel stain (Invitrogen™) according to the manufacturer’s instructions. Briefly, Polyacrylamide Gel Electrophoresis (PAGE) gel was fixed in 50% methanol and 10% acetic acid, washed with water, and incubated with Pro-Q^®^ Diamond stain. Following destaining using destain solution (20% acetonitrile, 50 mM sodium acetate, pH 4) and water, phosphoproteins were visualized using Amersham™ Typhoon™ Biomolecular Imager (GE Healthcare).
GST-pull-down and immunoprecipitation
For the preparation of cell extracts, HEK293 cells were washed twice with phosphate-buffered saline (PBS) and lysed using lysis buffer containing 50 mM Tris, pH 7.4, 150 mM NaCl, 1% NP-40, 1 × cOmplete™ Protease Inhibitor Cocktail (Roche), 0.1 μM phenylmethylsulfonyl fluoride (PMSF) and 10 μg/ml RNAse A (Thermo Scientific) for 20 min on ice. Cell extracts were clarified by centrifugation at 12 000 × g for 10 min. For pull-down experiments, 40 pmol of GST or GST-fused protein were immobilized on glutathione beads (20 μl; GE Healthcare) in binding buffer (50 mM Tris, pH 7.4, 150 mM NaCl, 1% NP-40) for 1 h at 4°C. The beads were washed twice with washing buffer (50 mM Tris, pH 7.4, 150 mM NaCl, 0.1% NP-40) and incubated with HEK293 cell extracts for 1 h at 4°C. Following three washes with washing buffer, retained proteins were separated by sodium dodecyl sulphate–polyacrylamide gel electrophoresis (SDS–PAGE) and detected by immunoblot with 700 or 800 nm IRDye-conjugated antibodies (LI-COR Biotech) using an Amersham™ Typhoon™ Biomolecular Imager (GE Healthcare).
For immunoprecipitation, HEK293 cell extracts were incubated with the corresponding antibodies for 1 h at 4°C and then with Protein G Sepharose™ 4 Fast Flow (30 μl; Cytiva) for 1 h at 4°C. Following four washes with washing buffer, retained proteins were separated by SDS–PAGE and detected by immunoblot as described above.
BioID2 proximity labeling and pull-down
HEK293 cells were transduced with lentiviral particles containing pBABE-BioID2 constructs and selected with 1 μg/ml puromycin as described above. Following treatment with 100 μM biotin (Sigma–Aldrich) for 20 h, cells were washed with PBS and lysed in lysis buffer containing 50 mM Tris, pH 7.4, 150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate, 1% NP-40, 1 mM ethylenediaminetetraacetic acid (EDTA), 1 mM dithiothreitol (DTT), 1 × cOmplete™ Protease Inhibitor Cocktail (Roche), 0.1 μM PMSF, and 20 μg/ml RNAse for 20 min on ice. After vortexing three times for 30 s each, cell lysates were centrifuged at 16 000 × g for 20 min and the cleared supernatants were transferred to new tubes. After measuring and adjusting protein concentration in all samples (2.6 mg/ml), the cell extracts were incubated with 100 μl streptavidin agarose beads with rotation overnight at 4°C. Samples were washed with lysis buffer, twice with wash buffer 1 (50 mM Tris, pH 7.4, 150 mM NaCl, 1% SDS, 0.5% sodium deoxycholate, 1% NP-40, 1 mM EDTA, 1 mM DTT), twice with wash buffer 2 (50 mM Tris, pH 7.4, 500 mM NaCl, 0.5% SDS, 0.5% sodium deoxycholate, 1% NP-40, 1 mM EDTA, 1 mM DTT), and twice with wash buffer 3 (50 mM Tris, pH 7.4, 150 mM NaCl, 0.5% SDS, 0.5% sodium deoxycholate, 1% NP-40, 1 mM EDTA, 1 mM DTT). Finally, beads were heated with 30 μl of 1 × SDS–PAGE sample buffer supplemented with 50 mM biotin at 98°C for 10 min. Following short migration in a 10% polyacrylamide gel and Coomassie staining, gel slices with total protein bands were cut from the gel and kept at −80°C before mass spectrometry analysis. In addition, 10% of the samples were analyzed by Western blot using IRDye^®^ 800CW Streptavidin (LI-COR Biotech).
Mass spectrometry analysis
Sample preparation
Short migration gel bands containing all the proteins corresponding to the different samples, each in triplicate, were excised and subjected to in-gel enzymatic digestion. Briefly, the bands were washed with acetonitrile and 100 mM ammonium bicarbonate. Following treatment with 10 mM DTT for 30 min at 56°C, cysteine carbamidomethylation was carried out for 30 min by adding 55 mM iodoacetamide. After removing the supernatant, the washing procedure was repeated and the gel bands were dried. Tryptic peptides were generated by covering the gel bands with 5 ng/µl of sequencing-grade modified trypsin (Promega) and incubating overnight at 37°C. Proteolytic peptides were first extracted by addition of one volume of 50% acetonitrile and 0.1% formic acid, followed by one volume of 100% acetonitrile. The extracted tryptic peptides were vacuum dried and resuspended in 2% acetonitrile and 0.05% trifluoroacetic acid prior to nanoLC-MS/MS analysis.
Mass spectrometry analysis
NanoLC-MS/MS analyses were performed using a nanoElute liquid chromatography system coupled to a timsTOF Pro mass spectrometer (Bruker, Billerica, MA, USA). Briefly, peptides were desalted online using a trap column (Waters, NanoE MZ Sym 18, 180 µm × 20 µm Trap column) before being loaded onto an Aurora analytical column (ION OPTIK, 25 cm × 75 µm, C18, 1.6 µm), and eluted with a gradient of 0%–35% solvent B over 100 min as follows: 0%–15% B in 60 min, 15%–23% B in 30 min, 23%–35% B in 10 min. Solvent A was 2% acetonitrile and 0.1% formic acid in water, and solvent B was 99.9% acetonitrile with 0.1% formic acid. MS and MS/MS spectra were recorded from m/z 100 to 1700 with a mobility scan range from 0.65 to 1.45 V·s/cm². MS/MS spectra were acquired using the PASEF (parallel accumulation–serial fragmentation) ion mobility-based acquisition mode, with the number of PASEF MS/MS scans set to 10.
Database searching
Tandem mass spectra were extracted using DataAnalysis software (Bruker, Billerica, MA, USA). All MS/MS samples were analyzed with Mascot (Matrix Science, London, UK; version 2.6.2). Mascot was set up to search the Uniprot human database (010 322, 20 380 entries) including U2AF and trypsin protein sequences, assuming strict trypsin digestion. Mascot was configured with a fragment ion mass tolerance of 0.050 Da and a parent ion tolerance of 15 parts per million (PPM). Carbamidomethylation of cysteine was specified as a fixed modification. Biotinylation of lysine and the N-terminus were specified as variable modifications.
Criteria for protein identification
Scaffold (version Scaffold_4.10.0, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at >71.0% probability, achieving a false discovery rate (FDR) of <1.0% according to the Scaffold Local FDR algorithm. Protein identifications were accepted if they could be established at >95.0% probability and contained at least two identified peptides. Protein probabilities were assigned using the Protein Prophet algorithm [Nesvizhskii, Al et al., Anal. Chem. 2003; 75(17):4646–58]. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters.
Protein proximity data analysis
Only peptides with FDR < 1% and proteins with a minimum 95% probability and at least two unique peptides were used for further analysis. Spectral counts were exported from Scaffold v 5.1.2 and were filtered by removing known mass-spec contaminants in BioID assays. Only proteins with a spectral count of ≥ 3 in U2AF2-WT samples were kept in the list. For each protein, an interaction score was calculated as the spectral count difference between U2AF2-WT and control (BioID2 only) normalized to the molecular weight of the corresponding protein:
$$\text{Protein interaction score} = \frac{SC(\text{U2AF65wt}) - SC(\text{control})}{\text{Protein MW}}$$
The final list of putative interactors, containing 286 proteins, was ranked by the calculated interaction score and included the candidates with statistically significant differences (FDR ≤ 0.05) between U2AF2-WT and control triplicates.
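As an illustration of the interaction score and ranking, here is a minimal Python sketch. It assumes that spectral counts are averaged over the triplicates, that the ≥ 3 filter applies to the mean U2AF2-WT count, and that molecular weight is expressed in kDa; none of these details is specified in the text, and the protein names and counts are hypothetical.

```python
from statistics import mean


def interaction_score(sc_wt, sc_control, mw_kda):
    """(SC(WT) - SC(control)) / protein MW, with mean counts over replicates."""
    return (mean(sc_wt) - mean(sc_control)) / mw_kda


def rank_candidates(proteins, min_wt_count=3):
    """Filter on mean WT spectral count, then rank by decreasing score."""
    scored = [
        (name, interaction_score(p["wt"], p["control"], p["mw_kda"]))
        for name, p in proteins.items()
        if mean(p["wt"]) >= min_wt_count
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical spectral counts for three proteins (triplicates).
proteins = {
    "P_A": {"wt": [30, 28, 32], "control": [2, 1, 3], "mw_kda": 50.0},
    "P_B": {"wt": [10, 12, 8], "control": [9, 10, 8], "mw_kda": 100.0},
    "P_C": {"wt": [2, 1, 2], "control": [0, 0, 0], "mw_kda": 20.0},
}

for name, score in rank_candidates(proteins):
    print(name, round(score, 3))
```

Dividing by molecular weight compensates for the tendency of larger proteins to yield more peptides and therefore more spectra.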
To estimate whether the interaction between U2AF2 and an identified candidate depends on its RS domain, an RS dependence score was calculated for each protein as the spectral count difference between U2AF2-WT and U2AF2-dRS divided by the spectral count of U2AF2-WT:
$$\text{RS dependence score} = \frac{SC(\text{U2AF65wt}) - SC(\text{U2AF65dRS})}{SC(\text{U2AF65wt})}$$
In addition, the significance of the difference between U2AF2-WT and U2AF2-dRS triplicates was assessed with FDR (Benjamini–Hochberg correction).
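The RS dependence score can be read as the fraction of the U2AF2-WT signal that is lost when the RS domain is removed. The following minimal sketch illustrates this reading; the function name and the example counts are hypothetical, and the handling of a zero U2AF2-WT count is an assumption (such proteins would in any case be excluded by the spectral count filter described above).

```python
def rs_dependence_score(sc_wt, sc_drs):
    """Fraction of the U2AF2-WT spectral count lost in U2AF2-dRS samples."""
    if sc_wt <= 0:
        # Assumption: the score is undefined without U2AF2-WT signal
        raise ValueError("U2AF2-WT spectral count must be positive")
    return (sc_wt - sc_drs) / sc_wt


# Hypothetical spectral counts
strongly_dependent = rs_dependence_score(40, 10)  # most of the signal is lost without the RS domain
independent = rs_dependence_score(20, 20)         # signal unchanged by the deletion
enhanced = rs_dependence_score(10, 15)            # negative score: more signal with U2AF2-dRS
```

A score close to 1 therefore indicates a proximity that depends strongly on the RS domain, a score close to 0 indicates no dependence, and negative values correspond to proteins labeled more efficiently by U2AF2-dRS.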
Significance Analysis of INTeractome (SAINT) [59] was performed using SAINTexpress v3.6.3 with default parameters. Proteins with FDR ≤ 0.05 (total 147 proteins) were used to check the overlap with the list of 286 putative interactors.
The list of known U2AF2 partners obtained in low- and high-throughput studies was acquired from available protein interaction databases BioGRID v4.4 (https://thebiogrid.org/; accessed 5 September 2023), IntAct (https://www.ebi.ac.uk/intact/home; accessed 18 September 2023), IMEx (https://www.imexconsortium.org/; accessed 6 October 2023) and APID (http://apid.dep.usal.es; accessed 18 September 2023).
Gene Ontology (GO) enrichment analysis was performed using PANTHER v18.0 (released 17 October 2023) [60]. A selection of nonredundant and significantly enriched GO terms (FDR < 10^−5^) is depicted in Fig. 6C using a custom Python script.
The classification of splicing proteins was obtained from the Spliceosome Database [61], the list of speckle proteins was acquired from [62] and [63], and the list of RS proteins was obtained from [28]. The search for the known and putative ULM motifs was performed using a custom Python script. In brief, a 10 amino acid ULM-consensus motif [R/K]_7_W[D/N][E/Q] was searched in the sequences of the 286 putative interactors. Motifs with the targeted amino acids in at least six positions were considered as putative ULMs.
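One possible reading of this ULM search is a sliding 10-residue window scored position by position against the consensus. The sketch below follows that reading and is not the custom script itself: the function name, the 1-based reporting of positions, and the optional `require_trp` flag (which additionally demands the tryptophan at position 8) are assumptions introduced for illustration.

```python
# Allowed residues at each of the 10 positions of [R/K]7-W-[D/N]-[E/Q]
ULM_CONSENSUS = [set("RK")] * 7 + [set("W"), set("DN"), set("EQ")]


def find_putative_ulms(sequence, min_matches=6, require_trp=False):
    """Return (1-based start, window, matching positions) for each candidate window.

    require_trp is a hypothetical extra filter, not stated in the text.
    """
    width = len(ULM_CONSENSUS)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Count positions whose residue belongs to the consensus set
        matches = sum(aa in allowed for aa, allowed in zip(window, ULM_CONSENSUS))
        if matches >= min_matches and (not require_trp or window[7] == "W"):
            hits.append((start + 1, window, matches))
    return hits


# Toy sequence containing one perfect consensus match (not a real protein)
toy = "AAAAKRKRKRKWDEAAAA"
all_hits = find_putative_ulms(toy)
strict_hits = find_putative_ulms(toy, require_trp=True)
```

With the six-position threshold alone, windows overlapping a basic stretch can qualify without containing the tryptophan, which is why a stricter filter may be useful when inspecting candidates.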
Gene set enrichment analysis (GSEA) was performed with GSEA v4.3.2 [64] using pre-ranked lists and gene sets as inputs. Default parameters with setting “enrichment statistics” as “classic” were used to estimate enrichment significance.
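For orientation, the "classic" enrichment statistic corresponds to an unweighted running sum over the ranked list. The sketch below computes only this running-sum enrichment score; it is not the GSEA software, it omits the permutation-based significance and normalization steps, and it assumes that each gene appears once in the ranked list.

```python
def classic_enrichment_score(ranked_genes, gene_set):
    """Unweighted (classic) running-sum enrichment score for a pre-ranked list."""
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; both walks sum to zero overall
        running += 1.0 / n_hits if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy example: a set concentrated at the top or at the bottom of the ranking
top_enriched = classic_enrichment_score(["A", "B", "C", "D"], {"A", "B"})
bottom_enriched = classic_enrichment_score(["A", "B", "C", "D"], {"C", "D"})
```

A positive score indicates that the gene set is concentrated toward the top of the ranked list, and a negative score that it is concentrated toward the bottom.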
For the protein interaction network, physical protein interactions (confidence score > 0.4) were obtained from STRING v12.0 [65] and visualized using Cytoscape (v3.10.1) [66].
Filtering and analysis of the list of proteins identified in the proximity labeling mass-spectrometry were performed using custom Python scripts.
Splicing reporter assay
HEK293 cells were plated in a 96-well plate and transfected with RG6 splicing reporter construct in combination with plasmids expressing myc-tagged U2AF2 mutants (total 0.1 µg of DNA per well). After 24 h of growth, cells were washed twice with PBS and fixed with 4% paraformaldehyde (PFA) for 20 min at 37°C. Finally, cell nuclei were stained using 4′,6-diamidino-2-phenylindole (DAPI) and samples were kept in PBS. The cell images were obtained using the Opera Phenix^®^ Plus HCS System (Revvity) at 10× magnification in the confocal mode. Three channels were used for image acquisition: (i) DAPI (excitation 405 nm; emission 435–480 nm), (ii) EGFP (excitation 488 nm; emission 500–550 nm), and (iii) Red Fluorescent Protein (RFP) (excitation 561 nm; emission 570–630 nm). Images were analyzed using the Harmony v5.1 software (Revvity). Briefly, following the detection of the nuclei using the DAPI signal, the fluorescence intensities of EGFP and RFP of nuclear regions were measured. The relative efficiency of RG6 exon inclusion was calculated as the GFP/RFP ratio for each detected nucleus and was averaged for a single well in a 96-well plate. At least three wells in a plate were analyzed for each condition. The corresponding scatterplots were created using MATLAB R2023a software.
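The per-well quantification of exon inclusion can be summarized as follows. This is a minimal sketch of the averaging step only, assuming that per-nucleus intensities have been exported as (well, GFP, RFP) records; the record layout, the function name, and the exclusion of nuclei without RFP signal are assumptions, and the actual measurements were made in Harmony.

```python
from collections import defaultdict
from statistics import mean


def exon_inclusion_per_well(nuclei):
    """Average the per-nucleus GFP/RFP ratio within each well.

    nuclei: iterable of (well_id, gfp_intensity, rfp_intensity) records.
    """
    ratios = defaultdict(list)
    for well, gfp, rfp in nuclei:
        if rfp > 0:  # Assumption: nuclei without RFP signal are skipped
            ratios[well].append(gfp / rfp)
    return {well: mean(values) for well, values in ratios.items()}


# Hypothetical per-nucleus measurements
example = [
    ("A1", 200.0, 100.0),
    ("A1", 100.0, 100.0),
    ("B2", 50.0, 200.0),
]
per_well = exon_inclusion_per_well(example)
```

Computing the ratio for each nucleus before averaging gives every transfected cell the same weight, regardless of its overall expression level of the reporter.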
Immunofluorescence staining
Immunofluorescence staining of HEK293 cells and quantification of signal overlap in Fig. 1 were performed as previously described [41].
For Supplementary Fig. 4, HeLa cells were plated in a 96-well plate and transfected with the indicated plasmids encoding GFP-fused proteins as described above. Following 48 h of transfection, cells were washed twice with PBS and fixed with 2% PFA for 20 min at room temperature. The cells were washed with PBS and incubated with the blocking solution (PBS, 2% bovine serum albumin (BSA), 0.2% Triton X-100) for 1 h at room temperature. Next, cells were incubated with anti-SRRM2 antibodies (1:300, Bio-Techne) overnight at 4°C followed by incubation with Alexa Fluor 594-conjugated secondary anti-rabbit antibodies (1:1000, Invitrogen). Finally, cell nuclei were stained using DAPI and samples were kept in PBS. The cell images were obtained using the Opera Phenix^®^ Plus HCS System (Revvity) at 40× magnification in the confocal mode. Three channels were used for image acquisition: (i) DAPI (excitation 405 nm; emission 435–480 nm), (ii) EGFP (excitation 488 nm; emission 500–550 nm), and (iii) RFP (excitation 561 nm; emission 570–630 nm). Images were analyzed using the Harmony v5.1 software (Revvity). Briefly, following the detection of the nuclei using the DAPI signal, speckles were detected in the nuclear region based on the RFP signal (SRRM2). The fluorescence intensities of EGFP (U2AF2) in nuclear speckles and nuclear regions were measured. Enrichment of U2AF2 mutants in SRRM2-labeled speckles was calculated as the ratio of the mean GFP signal in the speckle region to the total GFP signal in the nucleus. The corresponding violin plots were created using MATLAB R2023a software.
Reverse Transcription-quantitative Polymerase Chain Reaction (RT-qPCR) splicing analyses
Total RNA was prepared using the Nucleospin RNA preparation kit (Macherey-Nagel) from HEK293 cells, CRISPR-deleted mutant clones, and mutants transfected for 24 h to express either wild-type or mutant U2AF2. Reverse transcription using GoScript reverse transcriptase (Promega) and random hexamer primers was followed by qPCR with GoTaq mix (Promega) on a C1000 touch-CFX384 real-time PCR system (Bio-Rad) with the following primers:
SLIT2_in_f: GCA GTG ATG AGG AAG AAG GTC
SLIT2_out_f: ATT TGT CTG CAG TGG TCA CC
SLIT2_r: GAC CTT TCC CAC GAC AGT CT
ARFIP2_in_f: GGA GGC AGC CTA AGG GAG
ARFIP2_out_f: GTC CCC AGA GCT TCA GGA G
ARFIP2_r: CTA GCA GCG TTT CCC CAT TC
Statistics
For proximity data analysis, the significance of the difference between the replicates of the BioID2 samples (control versus U2AF2-WT and U2AF2-WT versus U2AF2-dRS) was analyzed using the two-sample t-test and Benjamini–Hochberg correction with SciPy 1.0 in a Python script. Correlation analysis was performed using GraphPad Prism v.6.01 and GSEA was performed using GSEA v4.3.2.
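The Benjamini–Hochberg step can be illustrated with a short pure-Python function. This is a sketch of the standard procedure rather than the script used here; the per-protein p-values are assumed to come from a two-sample t-test (for example `scipy.stats.ttest_ind`, which is not called in this block), and the values shown are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical t-test p-values for four proteins
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

# Proteins passing the FDR <= 0.05 threshold used for the interactor list
significant = [q <= 0.05 for q in fdr]
```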
For splicing reporter assay, the two-sample t-test and one-way analysis of variance (ANOVA) with multiple comparison of means were performed using MATLAB functions ttest2, anova, and multcompare. Significance levels were indicated as ∗P <0.05, ∗∗P <0.01, ∗∗∗P <0.001, ∗∗∗∗P <0.0001, and not significant (ns).
Results
U2AF2 RS domain removal from human cells
We first revisited the analysis of the U2AF2 N-terminal sequence [32]. We observed, using the MetaDisorder software [67], that a long disordered N-terminal sequence between aa 1 and 91 just precedes the essential tryptophan 92 of the ULM domain that binds U2AF1 (Fig. 1A). Indeed, this N-terminal domain is an LCD that presents a global enrichment of arginine-serine dipeptides, as well as lysine, aspartic, and glutamic acid residues (Supplementary Fig. 1A). Besides the nine RS or SR dipeptides present between aa 25 and 63 that define the RS domain of U2AF2, five additional basic-acidic dipeptides are found in the larger LCD (aa 1–91).
As a result, this N-terminal domain is similar to other low complexity mixed-charge domains which are present in splicing-associated nuclear speckle proteins. Such domains often mediate in vitro the formation of condensates through homotypic interactions [68].
Although this RS domain has been shown to be important for splicing in vitro [20, 37], it was shown to be required for viability of Drosophila only when the U2AF1 RS domain was deleted, indicating an essential redundant function of the RS domains of U2AF1 and U2AF2 [40]. Therefore, to further analyze the interaction properties and function of the RS domain of U2AF2, we tried to delete it from endogenous human U2AF2 using CRISPR-Cas9 in HEK293 cells. Indeed, we could obtain viable homozygous cells with deletion of part of the RS domain (mutant dRS1: deletion 24–52), or of the complete RS domain (mutant dRS2: deletion 24–65) (Fig. 1B). Levels of the truncated forms of U2AF2 in several clones analyzed by Western blot were similar to those of the wild-type, suggesting no significant effect of the RS domain deletion on U2AF2 expression and stability (Fig. 1C and Supplementary Fig. 1). The solubility of U2AF2 increased as the extent of RS-domain deletions increased (Fig. 1D). In agreement with previous reports by Carmo-Fonseca and collaborators, we observed the localization of U2AF2 in the nucleus with enrichment in nuclear speckles labeled by the anti-SC-35 monoclonal antibody, as well as colocalization with SF3B1 and polyadenylated RNA (Fig. 1E) [69]. U2AF2 colocalization with SF3B1 and polyadenylated RNA was higher than with the SC-35 antigen, which is known to be restricted to inner regions of speckles [70]. The largest deletion of the RS domain reduced the localization of U2AF2 in speckles, similarly to what was observed for overexpressed U2AF2 in HeLa cells [41]. Growth analysis of multiple independent clones for the two deletions revealed a moderate growth reduction compared to control clones (Fig. 1F). This characterization of U2AF2 RS domain-deleted cells validated these clones as interesting tools for further biochemical and transcriptomic analyses.
U2AF2 RS domain involvement in protein interactions in vitro
Previously, GST pull-down experiments revealed that (i) the lack of the RS domain of human U2AF2 (deletion of aa 1–92) significantly affects its binding to the CTD of the RPB1 subunit of Pol II [43]; (ii) deletion of part of the RS domain (aa 30–47) reduces the interaction of U2AF2 with polyA polymerase [44]; and (iii) the N-terminal domain of U2AF2 (aa 1–85) interacts with the RS domain of CFIm59 (CPSF7) but not that of CFIm68 (CPSF6) [45]. In addition, deletion of its RS domain (deletion 25–63) compromises the interaction of overexpressed U2AF2 with immobilized GST-SF3B1 [41].
To get insight into the requirement of the RS domain of U2AF2 in a more physiological context, we performed immunoprecipitations of SF3B1, the Pol II subunit RPB1, and SF1 from wild-type or mutated HEK293 cells (dRS2: deletion 24–65), and tested for the coimmunoprecipitation of U2AF2 (Fig. 2A). All the immunoprecipitations were performed in the presence of RNase to avoid any RNA-mediated interactions. We confirmed the interactions of SF3B1, RPB1, and SF1 with wild-type U2AF2. The deletion of the RS domain compromised these three interactions.
Role of the RS domain of U2AF2 in interactions with key partners. (A) The lack of the RS domain of U2AF2 affects its binding to partners in immunoprecipitation assays. The immunoprecipitation of endogenous SF3B1, RNA Pol II largest subunit, and SF1 from extracts of normal (WT) or CRISPR-modified U2AF2 dRS HEK293 cells (dRS2) (cell clones G2 and 33; Fig. 1C) was performed using indicated antibodies. The detection of immunoprecipitated proteins (SF3B1, Pol II largest subunit RPB1, and SF1) confirms the efficient immunoprecipitation and even loading of precipitated material. Coimmunoprecipitated U2AF2 was detected by immunoblotting as well. (B) Pull-down assays using recombinant GST, GST-fused U2AF2 or GST-fused U2AF2-dRS and two concentrations of HEK293 cell extracts. Co-precipitated SF3B1 and SF1 were detected by immunoblotting using protein-specific antibodies. The bait proteins were detected by Ponceau staining. A similar experiment is presented in Supplementary Fig. 2. (C) Coimmunoprecipitation of SF1 and SF3B1 with U2AF2 or U2AF2-dRS. Immunoprecipitation of U2AF2 and U2AF2-dRS was performed from extracts of normal or CRISPR-modified HEK293 cells, respectively (cell clones G2 and 33) using anti-U2AF2 antibodies. The coimmunoprecipitated SF1 and SF3B1, as well as U2AF2, were detected by Western blot (representative results of two experiments each with duplicates). (D) Schematic representation of Myc-tagged U2AF2 RS domain deletion mutants for expression in HEK293 cells. (E) Pull-down assays using recombinant GST-fused SF1 (human SF1, aa 1–255) and extracts of cells transfected with indicated Myc-tagged U2AF2 RS deletion mutants. U2AF2 was detected via Western blot using anti-Myc antibodies (representative result of three experiments). (F) Same as panel (E) for GST-SF3B1 (human SF3B1 aa 190–344).
Conversely, we tested the coprecipitation of SF3B1 and SF1 from HEK293 cell extracts with GST-fused full-length U2AF2 or U2AF2-dRS (deletion 25–63). In agreement with our coimmunoprecipitation experiments, the removal of the RS domain reduced the recovery of SF3B1 and SF1 on immobilized recombinant GST-U2AF2 (Fig. 2B and Supplementary Fig. 2). The impact was strong for SF3B1 but limited for SF1, as previously observed [47]. Coimmunoprecipitation experiments using endogenous wild-type or mutant U2AF2 lacking residues 24–65 (U2AF2-dRS2, referred to as U2AF2-dRS hereafter) in HEK293 cells revealed a significant decrease in SF3B1 binding following RS domain deletion, whereas SF1 binding remained unaffected (Fig. 2C). Several isoforms of SF1 are expressed by alternative splicing [71, 72]. We noted that some isoforms of SF1 were much more efficiently coimmunoprecipitated with U2AF2 and that their coimmunoprecipitation with U2AF2 was not affected by the RS domain deletion.
Altogether, these results indicated that the interaction of U2AF2 with SF3B1 in vitro depends strongly on its RS domain, while the dependency on its RS domain for interaction with SF1 was only observed in specific paradigms.
Interactions of SF1 and SF3B1 with U2AF2 are thought to be mediated essentially by recognition of ULM motifs of SF1 and SF3B1 by the C-terminal UHM domain of U2AF2 [42]. Among other evidence, structures of these interactions have been revealed by X-ray crystallography for SF3B1 [19] and by both crystallography and Nuclear Magnetic Resonance (NMR) for SF1 [34, 73, 74]. In addition, it has been shown that these interactions are abolished by mutations of the essential tryptophan residues of the ULMs [41, 48, 73]. In contrast, two-hybrid experiments could not detect any interaction of the RS domain of U2AF2 with SF3B1 [16] and, in our pulldown experiments, SF3B1 shows almost no interaction with the RS domain of U2AF2 alone (Supplementary Fig. 2). Therefore, as the RS domain of U2AF2 is capable of homotypic interactions in vitro as revealed by LLPS [41], we hypothesize that RS-mediated homotypic interactions of U2AF2, when it is concentrated at the surface of beads, enhance the detection of UHM-ULM-mediated interactions in these assays, which would explain the dependency on the RS domain of U2AF2 for its interaction with SF1 and SF3B1. The observation that SF1 interaction with U2AF2 was mainly affected by deletion of the RS domain when SF1 was used as the bait (Fig. 2A) may result from a stronger impact of U2AF2 self-interaction when it is in the soluble form and gets concentrated at the surface of beads.
Altogether, these results suggest that the RS domain has a different impact on interactions of U2AF2 with SF1 and SF3B1.
Then, to identify molecular determinants of the RS domain responsible for the reinforcement of U2AF2 interaction with SF3B1 and SF1, we overexpressed mutated forms of U2AF2 with various deletions in the RS domain in HEK293 cells keeping only three or six RS repeats at the N-terminus or at the C-terminus (Fig. 2D) and tested their interaction with GST-SF3B1 and GST-SF1 in GST pulldown assays. We observed that each region of the RS domain that was tested contributed to the strength of U2AF2 binding on SF1 and SF3B1 (Fig. 2E and F). The strength of the binding increased monotonically with the length of the RS domain. This is reminiscent of the length-dependent capability of mixed charge domains to drive LLPS [68], further suggesting that RS domain homotypic multivalent interactions are contributing to the efficiency of U2AF2 immunoprecipitation or pulldown with SF3B1 and SF1. Still, the C-terminal part of the RS domain appeared to have a slightly stronger impact on U2AF2 interactions with SF1 and SF3B1.
The RS domain of U2AF2 is a putative target for phosphorylation-mediated regulation
RS domains of SR proteins undergo dynamic phosphorylation that affects their aggregation properties, interactions and splicing regulation [75, 76]. By analogy with other RS domains, we expected the RS domain of U2AF2 to be also phosphorylated. Indeed, phosphoproteomic data indicate that U2AF2 RS domain is phosphorylated on seven out of the nine serines in RS dipeptides in the RS domain (PhosphositePlus v6.7.4).
We previously showed that removal of the RS domain of recombinant U2AF2 dramatically improved its solubility [41]. Here, to determine the impact of phosphorylation on RS domain interactions, we first analyzed the solubility of U2AF2 that was phosphorylated by coexpression in bacteria with the kinase SRPK1. In a simple sedimentation assay, we observed that SRPK1 coexpression greatly improved the solubility of U2AF2, suggesting that phosphorylation reduces homotypic interaction of the RS domain and hence the formation of condensates of the purified protein (Fig. 3A). We then tested whether interactions between the unphosphorylated and phosphorylated U2AF2 could be unraveled by analyzing the sedimentation of mixes of the two recombinant proteins. After centrifugation, the pellet and soluble fractions were analyzed on Phos-tag gels to separate the unphosphorylated and phosphorylated forms of U2AF2. Phosphorylated U2AF2 apparently did not cosediment with unphosphorylated U2AF2, showing that self-interaction is compromised by phosphorylation of one partner (Fig. 3B).
Phosphorylation reduces U2AF2 interaction with key partners. (A) Sedimentation assay for purified U2AF2 and phosphorylated U2AF2 (pU2AF2) obtained by co-expression in bacteria with the kinase SRPK1. After centrifugation, pellet (P) and supernatant (S) fractions were analyzed by SDS–PAGE and stained using Coomassie dye. Representative results of three experiments, means (bars) and SD (error bars). (B) Sedimentation assay for mixes of different amounts of unphosphorylated and phosphorylated U2AF2. The pellet (P) and soluble (S) fractions were resolved on Phos-tag gel and visualized using Coomassie staining (representative results of two experiments). (C) Pull-down assays using recombinant GST alone and GST-fused SF1 (1–255), a phosphomimetic mutant of SF1 (pSF1_EE) or SF3B1 (190–344) and purified unphosphorylated and phosphorylated U2AF2. The fraction of precipitated U2AF2 was detected via Western blot using anti-U2AF2 antibodies. Ponceau staining was used to ascertain similar quantities of the baits (representative results of three experiments). (D) Pull-down assays as in panel (C) but using GST‐fused RPB1 CTD, phosphorylated CTD (pCTD) obtained by incubation of the protein with HEK293 cell extract and 2mM ATP as baits. Bait proteins and U2AF2 were visualized using Ponceau staining of the membrane and immunodetection, respectively.
We then tested the effect of U2AF2 phosphorylation on its association with different partners using recombinant proteins. In pulldown assays, unphosphorylated U2AF2 shows a better retention on GST-SF1, GST-SF3B1, or GST-RPB1-CTD immobilized on glutathione beads (Fig. 3C and D), suggesting in these conditions a general effect of phosphorylation on interactions that, given the sedimentation results, might be due to a higher self-interaction of unphosphorylated U2AF2 when recruited and concentrated on the surface of beads.
To evaluate the role of arginines and phosphorylated serines of the nine RS dipeptides in the RS domain, we expressed mutants of these nine arginine and serine residues in HEK293 cells. Arginine residues were mutated to alanine or lysine (U2AF2-AS and U2AF2-KS mutants) and serine residues were substituted with alanine (U2AF2-RA) or aspartic acid (U2AF2-RD) to mimic putative dephosphorylated and phosphorylated states of the RS domain, respectively (Fig. 4A). Both RS domain phosphorylation-site mutants, as well as a mutant lacking the RS domain (U2AF2-dRS), showed a reduced phosphorylation state compared to the wild-type protein when expressed in HEK293 cells, suggesting that serine residues in the RS domain of U2AF2 actually undergo phosphorylation in vivo (Fig. 4B). Sedimentation assays using extracts of cells overexpressing myc-U2AF2 mutants confirmed that U2AF2-dRS is more soluble than WT U2AF2 and showed that nonphosphorylatable U2AF2-RA has a tendency to be enriched in the insoluble fraction compared to wild-type U2AF2 or its phosphomimetic mutant (Supplementary Fig. 3A). In these assays, the small differences in solubility observed between the different RS domain mutants could be related to interactions with partners or other posttranslational modifications occurring in cells.
RS dipeptides mutants affect U2AF2 interactions with key partners. (A) Schematic representation of the Myc-tagged U2AF2 with mutated residues in the RS domain. Nine serine residues in the RS dipeptides of the U2AF2 RS domain were substituted with alanine (RA) or aspartic acid (RD). Nine arginine residues in the RS dipeptides were substituted with alanine (AS) or lysine (KS). (B) Analysis of the phosphorylation state of U2AF2 serine mutants. CRISPR-modified HEK293 cells expressing U2AF2-dRS (cell clone 33, Fig. 1B and C) were transfected with the indicated Myc-tagged U2AF2 mutants. Total extracts of transfected cells were resolved by SDS–PAGE. Protein content was visualized using Coomassie staining, and phosphoproteins were detected using ProQ Diamond phospho staining. Recombinant U2AF2 expression was checked by immunoblot using anti-Myc antibodies. The mean ± SD values of three experiments are presented on the histogram. *P <0.05, unpaired t-test versus U2AF2 WT. (C) Analysis of the phosphorylation state of GST-U2AF2 mutants (dRS and serine mutants RA and RD) obtained by co-expression in bacteria with the kinase SRPK1. Phosphoproteins were detected using ProQ Diamond staining and then the total proteins by Coomassie blue staining. (D) Sedimentation assay (200 mM NaCl condition) for purified GST-fused U2AF2 and its mutants co-expressed in bacteria with the kinase SRPK1. After centrifugation, pellet (P) and supernatant (S) fractions were analyzed by SDS–PAGE and stained using Coomassie dye. Representative results and quantification (mean ± SD) of two experiments are shown. *P <0.05, ns : nonsignificant, unpaired t-test versus U2AF2 WT. (E) Pull-down assays using recombinant GST alone, GST-SF3B1 (190–344) or recombinant SRPK1-phosphorylated GST-U2AF2 (full-length) and extracts of HEK293 cells transfected with indicated U2AF2 mutants. Co-precipitated U2AF2 was detected using immunoblotting (anti-myc antibodies), whereas GST-fused baits were visualized using Ponceau staining. The mean ± SD values (bars) and individual measures (dots) of four experiments are presented on the histogram. *P <0.05, ns : nonsignificant, unpaired t-test versus U2AF2 WT. (F) Coimmunoprecipitation of U2AF2 mutants with SF3B1. The immunoprecipitation of endogenous SF3B1 from extracts of cells transfected with indicated Myc-tagged U2AF2 mutants was performed using SF3B1-specific antibodies. Precipitated proteins were detected by Western blot with the corresponding antibodies. (G) Coimmunoprecipitation of U2AF2 mutants as in panel (F) but with RNA polymerase II. Polymerase was precipitated using Pol II-specific antibodies, and coimmunoprecipitation of U2AF2 mutants was checked by immunoblotting with anti-Myc antibodies. (H) The inverse coimmunoprecipitation to panels (F) and (G). Endogenous SF3B1 and RNA polymerase II, as well as SF1, were coimmunoprecipitated with indicated Myc-tagged U2AF2 mutants using anti-U2AF2 antibodies. Immunoprecipitation was performed from extracts of CRISPR-modified U2AF2-dRS HEK293 cells (cell clone 33) transfected with indicated U2AF2 mutants. Proteins of interest were detected using corresponding antibodies (representative results of two experiments). (I) Model explaining the role of the phosphorylation of U2AF2 RS domain in self-interaction and association with SF3B1. Strong self-interactions between unphosphorylated U2AF2 molecules (left part) increase the amount of coprecipitated U2AF2 upon SF3B1 precipitation. In contrast, U2AF2 phosphorylation (right part) weakens self-interactions, reducing the number of U2AF2 molecules coprecipitated with SF3B1. However, in both cases, the total number of coprecipitated SF3B1 molecules does not change upon U2AF2 precipitation (see also Supplementary Fig. 5).
To further explore the phosphorylation of the U2AF2 RS domain and assess its impact on self-interactions, we then purified and analyzed GST-fused U2AF2 mutants lacking the RS domain or with serine mutations to alanine or aspartic acid coexpressed in the bacteria with the kinase SRPK1. All three mutants displayed reduced phosphorylation compared to wild-type U2AF2 (Fig. 4C). Sedimentation assays demonstrated that nonphosphorylatable U2AF2-RA is significantly less soluble than phosphorylated wild-type U2AF2, its phosphomimetic RD mutant, or the dRS deletion mutant (Fig. 4D).
To monitor the impact of mutations on U2AF2 localization, we introduced them in a GFP-U2AF2 expression construct (Supplementary Fig. 4A). The different U2AF2 forms were similarly expressed (Supplementary Fig. 4B). All displayed enrichment in nuclear speckles (Supplementary Fig. 4C). RS deletion reduced speckle localization in a length-dependent manner, similarly to SF3B1 and SF1 interactions in pulldown assays (Fig. 2E and F). The AS mutant localization to speckles was also reduced. The phosphomimetic RD mutant localization was similar to WT. The localization of the nonphosphorylatable RA mutant to speckles was reduced, showing that reduced solubility does not necessarily lead to increased association with speckles, possibly because spurious interactions hinder the correct targeting of this isoform. Altogether, while other sequences already drive its localization to speckles, the RS domain, its arginine or similarly basic lysine residues, and a proper phosphorylation of the serine residues are needed for the most efficient targeting of U2AF2 to speckles.
In GST pull-down assays with GST-SF3B1 and coimmunoprecipitation with SF3B1, we observed a reduction of binding of U2AF2-AS but not KS mutants (Fig. 4E and F, and Supplementary Fig. 3B and C). These observations suggest that positive charges in RS dipeptides play an important role in the homotypic interactions of the RS domain. Indeed, U2AF2 recovery on GST-U2AF2 beads was strongly reduced by the arginine-to-alanine but not the arginine-to-lysine mutations (Fig. 4E). As expected, the R452D mutation, which affects the essential RXF motif of the UHM, reduced the binding to SF3B1. It did not affect the binding to GST-U2AF2, indicating that the pulldown of U2AF2 with itself is not due to bridging by SF3B1. The nonphosphorylatable U2AF2-RA mutant showed a significantly better binding to GST-SF3B1 and GST-U2AF2 immobilized on glutathione beads compared to wild-type and phosphomimetic U2AF2-RD mutant (Fig. 4E). Overall, analysis of phosphorylated U2AF2 (Fig. 3) together with a limited set of RS-domain mutations suggests a positive correlation between U2AF2 self-interaction or binding to SF3B1 in GST pull-down assays and U2AF2 sedimentation behavior in sedimentation assays (Fig. 4E and Supplementary Fig. 3D). Consistently, immunoprecipitation assays using antibodies against either SF3B1 or RNA polymerase II showed increased co-precipitation of U2AF2-RA with these partners (Fig. 4F and G). However, inverse immunoprecipitation of U2AF2 mutants revealed that U2AF2-RA does not strongly coprecipitate SF3B1 or phosphorylated Pol II (Fig. 4H) compared to wild-type U2AF2 and U2AF2-RD mutant, whereas in both cases, the U2AF2-dRS mutant demonstrated a clear reduction in the binding to SF3B1 and Pol II. These apparently contradictory results are in fact in agreement with our model, in which increased U2AF2 self-interaction when U2AF2 is unphosphorylated increases the stoichiometry of U2AF2 over SF3B1 in complexes. According to this model (Fig. 4I and Supplementary Fig. 5), strong self-interactions between nonphosphorylated U2AF2 molecules increase the amount of coprecipitated U2AF2 upon SF3B1 precipitation. In contrast, the phosphorylation state of U2AF2 does not affect the total number of SF3B1 molecules that are coprecipitated with U2AF2 (Fig. 4H and I). A second, nonexclusive hypothesis is that self-interaction of the hypophosphorylated mutant U2AF2-RA might lead to condensation on the surface of beads when it is attracted as a prey, in a manner even more acute than is the case for wild-type U2AF2. Altogether, our sedimentation, pulldown, and co-immunoprecipitation results strongly suggest that the increased binding of nonphosphorylated U2AF2 to SF3B1 is indirectly due to a stronger self-interaction, and that phosphorylation of the RS domain of U2AF2 regulates its interactions with its partner splicing factors, as well as its association with the Pol II CTD.
Comparison of the U2AF2 and U2AF2-dRS interactome using proximity labeling in vivo
Features of the U2AF2 interactome
To further explore the role of the RS domain of U2AF2 in shaping its interactome, we designed a proximity labeling experiment to capture constitutive and transient interactions of full-length U2AF2 and U2AF2 lacking RS domain in living cells. We used the BioID2 technique, which is based on the fusion of a protein of interest with a promiscuous bacterial biotin ligase (BirA) [77]. Expression of this recombinant protein in cells results in the stable biotinylation of primary amines on proximal proteins in a 10 nm radius generating a history of protein–protein associations that are detected by mass-spectrometry analysis of isolated biotin-labeled proteins [78] (Fig. 5A).
BioID2-based proximity labeling of U2AF2 and U2AF2-dRS interactomes. (A) Workflow of the BioID2 proximity labeling experiment. Stable HEK293 cell lines expressing BioID2 alone or BioID2-fused U2AF2 were selected using puromycin and incubated with biotin. The recombinant proteins catalyze the attachment of biotin to primary amines of proximal proteins in a 10 nm radius. Following cell lysis, biotinylated proteins were pulled-down using streptavidin beads, loaded on acrylamide gel, and analyzed using nano-LC-MS/MS. (B) Recombinant proteins used in the BioID2 proximity labeling experiment. Biotinylation enzyme BioID2 was fused to the N-terminus of the full-length U2AF2 (WT) or truncated U2AF2 lacking RS domain (dRS). BioID2 alone was used as a control. (C) Analysis of homogeneity of BioID2 samples before MS. Streptavidin pull-down of biotinylated proteins was performed in triplicate for all three conditions. One-tenth of protein samples eluted from streptavidin beads were resolved via SDS–PAGE and visualized using labeled streptavidin. (D) Volcano plot of U2AF2 interactors identified via BioID2 proximity MS. For filtered proteins obtained from MS, an interaction score and FDR were calculated and plotted (see the ‘Materials and methods’ section for details). In total, 286 proteins (FDR ≤ 0.05) were detected as putative U2AF2 interactors by comparing BioID2-U2AF2-WT and BioID2 alone. Venn diagram represents the overlap between U2AF2 interactors identified using SAINT analysis (SAINT FDR ≤ 0.05) and based on the custom FDR/interaction score described above. Green dots in the scatter plot indicate proteins identified in SAINT analysis. Red dots correspond to several representative known partners of U2AF2.
We expressed wild-type or RS-deleted U2AF2 fused to the biotin ligase in HEK293 cells and determined the content of biotinylated proteins by mass spectrometry; cells expressing only BioID2 were used as a control (Fig. 5B). Prior to mass-spectrometry analysis, the homogeneity of the triplicate samples was verified by Western blot using labeled streptavidin (Fig. 5C).
Following the nano-LC-MS/MS analysis of samples, we filtered the raw list of spectral counts by removing known mass-spec contaminants in BioID assays [79] and included only proteins with a spectral count >3 in U2AF2-WT samples. For each protein, we calculated an interaction score as the spectral count difference between U2AF2-WT and control, normalized to the molecular weight of the corresponding protein. This allowed us to rank 286 proteins showing a statistically significant increase (FDR ≤ 0.05) in U2AF2-WT samples, and therefore representing the putative interactome of U2AF2 (Fig. 5D and Supplementary Table 1). We also analyzed our data using the SAINT (Significance Analysis of INTeractome) software, which relies on probabilistic scoring of protein–protein interaction data generated using affinity purification coupled to mass spectrometry (AP-MS) [59]. We observed an overlap of >95% with the list of 286 interactors obtained with our method (Fig. 5D). Importantly, known direct interactors of U2AF2, including SF3B1, SUGP1, SF1, U2AF1, and FUBP1, were among the top 50 hits in our list (Fig. 6A). In addition, our identified putative partners overlap with U2AF2 interactors previously detected in U2AF2-targeted affinity mass spectrometry [80], as well as with the list of putative U2AF2 interactors found in the protein-interaction databases BioGRID, IntAct, IMEx, InnateDB, and APID (Fig. 6B). Altogether, the accurate detection of known U2AF2 partners and the overlap with large-scale studies of U2AF2 interactors validated the robustness of our list of putative in vivo interactors for U2AF2.
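The filtering and scoring step described in this paragraph can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names, the data layout, the use of replicate means for both the >3 filter and the score, and the toy protein records are all assumptions made for the example, and the FDR calculation is left out (see the ‘Materials and methods’ section for the actual procedure).

```python
from statistics import mean

def interaction_score(wt_counts, ctrl_counts, mol_weight_kda):
    """Spectral count difference (WT minus control) normalized to molecular weight.

    Assumption: replicate spectral counts are summarized by their mean.
    """
    return (mean(wt_counts) - mean(ctrl_counts)) / mol_weight_kda

def rank_candidates(proteins, min_wt_count=3):
    """Keep proteins whose mean WT spectral count exceeds the threshold,
    then sort them by decreasing interaction score."""
    kept = []
    for name, rec in proteins.items():
        if mean(rec["wt"]) > min_wt_count:
            score = interaction_score(rec["wt"], rec["ctrl"], rec["mw_kda"])
            kept.append((name, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical toy records (not data from the study).
toy = {
    "PROT_A": {"wt": [40, 44, 42], "ctrl": [2, 1, 3], "mw_kda": 100.0},
    "PROT_B": {"wt": [10, 12, 8], "ctrl": [9, 11, 10], "mw_kda": 50.0},
    "PROT_C": {"wt": [2, 1, 3], "ctrl": [0, 0, 0], "mw_kda": 20.0},
}
ranked = rank_candidates(toy)
print(ranked)
```

In this toy input, PROT_C is dropped by the spectral count filter, and PROT_A ranks above PROT_B because its count difference per unit of molecular weight is larger.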
Features of the U2AF2 BioID partners. (A) A list of the top U2AF2 BioID2 targets. Putative U2AF2 partners identified via MS (total N = 286) were sorted using the interaction score. Proteins in red correspond to the representative known partners of U2AF2 shown in Fig. 5D. (B) Overlap between known U2AF2 partners and interactors identified via BioID2 proximity MS. Venn diagrams show the overlap between the U2AF2 interactors identified in the current study and previous U2AF2-targeted affinity mass spectrometry analysis (upper panel) or putative U2AF2 partners found in the protein–protein interaction databases (lower panel). (C) Gene ontology analysis of identified U2AF2 partners. The list of 286 U2AF2 interactors was analyzed for GO enrichment using Panther. Selected nonredundant GO categories with significant FDR (< 10−5) are shown. Node colors represent fold enrichment (FE) and size corresponds to the protein number in the GO category. The full list of GO terms is shown in Supplementary Table 1. (D) Overlap between identified U2AF2 interactors and speckle proteins. Venn diagram shows the overlap between U2AF2 partners found in proximity labeling MS and speckle proteins previously detected in the SC35-targeted proximity labeling. (E) Enrichment of speckle proteins in the ranked list of U2AF2 interactors. Pre-ranked GSEA was performed using GSEA software. The list of identified U2AF2 BioID2 partners sorted by the interaction score and speckle proteins identified by Dopie et al. (2020) were used as inputs. Enrichment score (ES), normalized enrichment score (NES), and P-value are indicated. (F) Enrichment of splicing proteins in the ranked list of U2AF2 interactors. Sets of splicing proteins associated with specific spliceosome complexes were obtained from the Spliceosome database [61]. The enrichment of these splicing proteins in the pre-ranked score-based list of U2AF2 partners was performed using the GSEA software. 
Obtained normalized enrichment scores are shown as bars (upper part). Asterisks indicate the FDR < 0.05. A barcode plot (middle part) is used to visualize the location of the interactors in the list. The overall overlap (percent) between the U2AF2 partners and splicing proteins categories is shown as bars (lower part). (G) Overlap between identified U2AF2 interactors and RS domain-containing proteins. Venn diagram shows the overlap between U2AF2 partners from proximity labeling MS and known RS proteins. (H) Enrichment of RS proteins in the ranked list of U2AF2 interactors. The ranked list of identified U2AF2 partners sorted by the interaction score and a set of RS domain-containing proteins were used as inputs in the GSEA software. Enrichment score (ES), normalized enrichment score (NES), and P-value are indicated.
We next performed gene ontology analyses to characterize the localization, functions, and structural features of the identified putative U2AF2 partners. As expected, these proteins mainly localize in the nucleus and, as is the case for U2AF2, they are more likely to accumulate in nuclear speckles (Fig. 6C) [69, 81]. We further compared our list of putative interactors with two proximity labeling mass spectrometry studies of the speckles proteome. In the first study, the authors used tyramide signal amplification mass spectrometry to identify proteins close to SC35 (SRSF2) in U2OS cells, as this protein is highly enriched in speckles [63]. GSEA shows that speckles-associated proteins are enriched at the top of the ranked list of U2AF2 partners (Fig. 6D and E). In the second study, the authors identified speckles proteins by APEX2 proximity labeling with SRSF1, SRSF7, and RNPS1 in HEK293 cells [62]. Again, using GSEA, we observed that these speckles proteins are enriched in our list of U2AF2 putative partners (Supplementary Fig. 6A). Still, a number of proteins at the top of the speckles protein lists, such as TRA2B and SRSF9, were not detected with U2AF2 BioID labeling (Supplementary Fig. 6B). Altogether, these analyses reveal a significant enrichment of speckles proteins in the BioID list of putative partners of U2AF2, while indicating that our detection of putative U2AF2 binders is specific and not simply the result of a general labeling of speckles proteins.
Conversely, the list of putative U2AF2 partners was not restricted to speckles-enriched proteins; for example, it comprises a number of proteins known to concentrate in paraspeckles, such as NONO, PSF, PSPC1, and FUS (Supplementary Table 1).
GO analysis also shows that the identified U2AF2 partners are mainly RNA-binding proteins involved in several steps of RNA processing, most of them splicing factors and splicing regulators (Fig. 6C and Supplementary Table 1). Besides RNA splicing, an overrepresentation of proteins linked to transcription and RNA polymerase binding, chromatin remodeling, mRNA 3′-end processing, and RNA m6A methylation is detected. In addition, the global analysis of interaction networks using the STRING database shows that the putative U2AF2 partners involved in splicing, transcription, mRNA 3′ end processing, and RNA methylation are tightly connected (Supplementary Fig. 7). These observations further support the link between splicing factors, in particular U2AF2, and transcription [43], chromatin binding [82], and 3′ end mRNA processing [45].
We then further analyzed the U2AF2 BioID hits for their association with the different spliceosome sub-complexes. Using the classification of splicing proteins proposed by Cvitkovic and Jurica [61], and GSEA, we observe, as expected, a strong association of U2AF2 with proteins related to the early spliceosome assembly (Fig. 6F). Indeed, proteins associated with the U2 snRNP complex and spliceosomal complex A are enriched at the top of the ranked list of U2AF2 partners. Still, some splicing factors associated specifically with later steps of spliceosome assembly were also part of the detected proteins, including several tri-snRNP specific proteins such as PRPF3, PRPF4, and SNRNP200.
Next, we analyzed the structural features of the identified partners that could mediate the interaction with U2AF2. In particular, ULM proteins comprising SF1, SF3B1, SUGP1, and SAP30BP are located at the top of the ranked list of the U2AF2 partners, in line with their known direct binding to the UHM domains of splicing factors. To extend the list of known ULM proteins, we searched for putative ULMs in the identified U2AF2 partners using the consensus motif [R/K]_7_W[D/N][E/Q]. We detected fourteen novel putative ULMs in addition to the five known ULM-containing proteins present in BioID data (Supplementary Fig. 6C and D). The enrichment of putative ULM proteins at the top of the list suggests their direct interaction with U2AF2.
Besides UHM-ULM interactions, RS domains potentially contribute significantly to U2AF2 interactions or proximity with its partners. In agreement with this hypothesis, almost half of known RS-domain-containing proteins belong to the identified U2AF2 BioID targets (Fig. 6G) and show statistically significant enrichment in the top hits of U2AF2 partners (Fig. 6H) [28].
Identification and analysis of the RS-dependent interactions of U2AF2
Analysis of spectral counts for biotinylated proteins obtained with U2AF2-dRS revealed no significant additional interactions compared with wild-type U2AF2. We then compared spectral counts between samples expressing wild-type U2AF2 and U2AF2-dRS to identify putative RS-mediated interactions (Fig. 7A). As a control, U2AF1 was similarly biotinylated in cells expressing BioID2-fused U2AF2 or U2AF2-dRS, consistent with the U2AF2-U2AF1 interaction mainly relying on a strong binding between the ULM of U2AF2 and the UHM of U2AF1 [42]. In agreement with our co-precipitation experiments (Fig. 2B and C), the BioID2 analysis showed a different effect of the RS domain on U2AF2 interaction with SF3B1 and SF1: the biotinylation of SF3B1 was dependent on the RS domain of BioID-U2AF2, whereas the biotinylation of SF1 was not. However, the vast majority of U2AF2 BioID2 targets displayed lower spectral counts in the U2AF2-dRS sample (Fig. 7A). Among them, 126 partners significantly lost their interaction with U2AF2 upon removal of its RS domain (FDR ≤ 0.05).
Features and validation of the RS-dependent BioID partners of U2AF2. (A) Volcano plot of proteins identified by proximity labeling MS showing difference between BioID2-U2AF2-WT versus BioID2-U2AF2-dRS. The identified U2AF2 interactors (total N = 286) were sorted by RS dependence score (the spectral count difference between BioID2-U2AF2-WT and BioID2-U2AF2-dRS divided by the spectral count in U2AF2-WT). FDR (the two-sample t-test with Benjamini–Hochberg correction) represents the significance of the difference. Red dots correspond to several representative known partners of U2AF2. (B) Enrichment of known and putative ULM-containing proteins among RS-dependent and RS-independent interactors of U2AF2. A pre-ranked list of U2AF2 partners sorted by the RS dependence score and a set of proteins with known or predicted ULMs were used as inputs in the GSEA software. Barcode plot: red lines, known ULMs; blue lines, putative ULMs. Enrichment score (ES), normalized enrichment score (NES), and P-value are indicated. (C) Enrichment of RS proteins among RS-dependent and RS-independent interactors of U2AF2. A pre-ranked list of U2AF2 partners sorted by the RS dependence score and a set of RS domain-containing proteins were used as inputs in the GSEA software. Enrichment score (ES), normalized enrichment score (NES), and P-value are indicated. (D) Enrichment of speckle proteins in the list of RS-dependent and RS-independent interactors of U2AF2. Pre-ranked GSEA was performed using GSEA software. A list of identified U2AF2 partners sorted by the RS dependence score and a set of speckle proteins from Barutcu et al. (2022), were used as inputs. Enrichment score (ES), normalized enrichment score (NES), and P-value are indicated. (E) The same as panel (D), but with a set of speckle proteins obtained from Dopie et al. (2020). (F) Validation of the differential interactions for U2AF2 and U2AF2-dRS. 
Immunoprecipitation of U2AF2-WT or U2AF2-dRS from extracts of WT or CRISPR-modified HEK293 cells (cell clones G2 and 33, Fig. 1B and C) was performed using U2AF2-specific antibodies in the presence of RNase. The coimmunoprecipitated proteins were detected by Western blot with the indicated antibodies. Coimmunoprecipitation for each of the indicated proteins was performed separately. Only representative blots are shown for U2AF2.
We ranked all U2AF2 partners by their sensitivity to the removal of the RS domain to analyze the structural and functional features of RS-dependent U2AF2 partners. We observed that RS-dependent U2AF2 putative interactors are not enriched in ULM motif-containing proteins (Fig. 7B) but highly enriched in RS-domain-containing proteins (Fig. 7C). In contrast, using the two speckles proteomes described above, we observed a slight or no enrichment of speckles proteins among RS-dependent interactors of U2AF2 (Fig. 7D and E). This markedly higher enrichment of RS domain proteins in RS-dependent partners of U2AF2 compared with that of speckles proteins (Fig. 7C versus D or E, normalized enrichment score of 3.96 versus 1.70 or 1.43) suggests that perturbations of U2AF2 interactions upon RS domain removal are not solely attributable to the partial mislocalization of U2AF2-dRS from speckles. Altogether, the presence of an RS domain in a U2AF2 BioID partner appears to be a strong predictor of this interaction being dependent on the RS domain of U2AF2, suggesting that U2AF2 interacts with multiple RS domain proteins through its RS domain.
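The RS dependence score used for this ranking is defined in the legend of Fig. 7A as the spectral count difference between BioID2-U2AF2-WT and BioID2-U2AF2-dRS divided by the spectral count in U2AF2-WT, with significance assessed by a two-sample t-test and Benjamini–Hochberg correction. The sketch below illustrates the score and the correction step only. The function names and the toy p-values are assumptions for illustration, and the t-test itself is not reproduced here.

```python
def rs_dependence_score(wt_count, drs_count):
    """(WT - dRS) / WT: close to 1 when labeling is lost without the RS domain,
    close to 0 when labeling is unchanged."""
    if wt_count <= 0:
        raise ValueError("WT spectral count must be positive")
    return (wt_count - drs_count) / wt_count

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical toy values (not data from the study).
score_lost = rs_dependence_score(40, 10)
score_kept = rs_dependence_score(20, 20)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(score_lost, score_kept, fdr)
```

A partner would then count as RS-dependent when its adjusted p-value falls at or below the chosen threshold (FDR ≤ 0.05 in the text).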
GO analysis shows that RS-dependent and RS-independent partners of U2AF2 are present in different functional categories such as splicing, transcription, and 3′ end processing, with a slightly higher enrichment of RS-dependent partners among splicing and chromatin remodeling-related proteins (Supplementary Fig. 8A). Further analysis of the different spliceosomal complexes shows that RS-dependent interactors of U2AF2 are found in all these complexes (Supplementary Fig. 8B and C). As expected, hnRNP proteins are less sensitive to the removal of the RS domain of U2AF2 than SR proteins (Supplementary Fig. 8C).
Validation of RS-dependent U2AF2 partners
To further validate the BioID approach and confirm whether U2AF2 partners are indeed differentially associated with U2AF2 and its RS-deleted mutant, we immunoprecipitated U2AF2 from wild-type HEK293 cells and U2AF2-dRS from CRISPR-modified HEK293 cells (deletion 24–65) and checked the coimmunoprecipitation of a set of 14 BioID partners in the presence of RNase to avoid any RNA-mediated interactions (Fig. 7F).
Interactions of U2AF2 with U2AF1 or SUGP1, which are known to depend on ULM-UHM contacts, were detected and, as expected, we observed no reduction upon RS domain removal [35, 36, 83, 84]. We also confirmed the interaction of U2AF2 with the related cancer-associated splicing factor RBM39, in agreement with previous immunoprecipitation, FRET, and proximity labeling assays [41, 85].
Importantly, we also confirmed most of the novel putative partners of U2AF2 from our BioID list that we tested. Indeed, U2AF2-related proteins PUF60 and SPF45 were coimmunoprecipitated, as well as the U2 snRNP-associated proteins CHERP and U2SURP, other complex A-related splicing regulators RBM10, DDX42, SRSF10, and CCAR1, and DNA damage-associated splicing proteins THRAP3 and BCLAF1. For all these interactions, their dependency on the RS domain of U2AF2 was confirmed in the coimmunoprecipitation experiments. However, not all putative interactions identified in BioID proximity labeling were confirmed by immunoprecipitation, for example the interaction with ZC3H18 (Fig. 7F). Such proteins might be in the proximity of U2AF2 without interacting directly; alternatively, their interaction with U2AF2 could be less stable or require the presence of RNA, or their binding might be sterically inhibited by the anti-U2AF2 antibody used (monoclonal MC3 antibody) [69].
RS domain mutations of U2AF2 impact splicing of a minigene
The U2AF2 RS domain has been shown to be required for U2AF2 function in splicing in vitro [20, 32].
To analyze the functional consequences of RS domain phosphorylation in cells, we tested the effect of U2AF2 mutants on splicing and exon inclusion. For this purpose, we used a bichromatic splicing reporter assay based on the mutually exclusive expression of different fluorescent proteins (GFP or RFP) depending on frameshift caused by exon inclusion [54]. This reporter, named RG6, is based on the well-characterized alternative splicing of exon 5 of chicken cardiac troponin T (Fig. 8A).
Effects of RS domain mutants of U2AF2 on splicing of a minigene reporter. (A) Schematic representation of the RG6 splicing reporter minigene used for the analysis of alternative splicing. Exon inclusion induces the formation of the GFP-fused product, whereas exon skipping leads to the production of RFP-fused protein. Both protein products are localized in the nucleus due to an NLS (black bar). (B) Removal of the RS domain of U2AF2 affects exon inclusion. Normal and CRISPR-modified U2AF2-dRS HEK293 cells (cell clones G2 and 33; Fig. 1B and C) were grown in a 96-well plate and transfected with the minigene-encoding plasmid, fixed, and analyzed by HCS fluorescence microscopy. Left panel: Representative immunofluorescence images of cells expressing RG6 splicing reporter. Scale bar: 50 µm. Right panel: A plot representing the ratio of nuclear GFP to nuclear RFP in the indicated HEK293 cells expressing RG6 minigene. Calculated ratios are averaged for all cells in a single well of a 96-well plate; data from three wells per condition are shown. N = 73 000 cells. ***P <0.001, two-sample t-test. (C) Exon inclusion of RG6 minigene depends on the length and arginine residues in RS dipeptides of the RS domain of U2AF2. CRISPR-modified HEK293 cells expressing U2AF2-dRS (clone 33) were cotransfected with the RG6 minigene-encoding plasmid and one of the indicated Myc-tagged U2AF2 mutants. Cells were fixed and analyzed by HCS fluorescence microscopy. The plot shows the ratio of nuclear GFP to nuclear RFP in cells cotransfected with RG6 minigene and indicated U2AF2 mutants. Calculated ratios are averaged for all cells in a single well of a 96-well plate. Data from four wells per condition are shown. N = 330 000 cells. ns, nonsignificant, **P <0.01, ****P <0.0001, one-way ANOVA versus U2AF2 WT. 
(D) Representative immunofluorescence images of cells co-expressing RG6 splicing reporter and Myc-tagged U2AF2 with deletions in RS domain and R-to-A (AS) or R-to-K (KS) substitutions in RS dipeptides. Scale bar: 50 µm. (E) Mutant analysis supports an RS domain phosphorylation-dependent role of U2AF2 in exon inclusion. CRISPR-modified HEK293 cells expressing U2AF2-dRS (clone 33) were cotransfected with the RG6 minigene-encoding plasmid and one of the indicated Myc-tagged U2AF2 mutants. Cells were fixed and analyzed by HCS fluorescence microscopy. The plot shows the ratio of nuclear GFP to nuclear RFP in cells cotransfected with RG6 minigene and indicated U2AF2 mutants. Calculated ratios are averaged for all cells in a single well of a 96-well plate; data from three wells per condition are shown. N = 220 000 cells. ns, nonsignificant, **P <0.01, ***P <0.001, ****P <0.0001, one-way ANOVA versus U2AF2 WT. (F) Representative immunofluorescence images of cells cotransfected with plasmids for expression of the RG6 splicing reporter and Myc-tagged U2AF2 with S-to-A (RA) or S-to-D (RD) substitution in RS dipeptides and R452D substitution in the UHM domain. Scale bar: 50 µm.
First, we compared RG6 pre-mRNA splicing in wild-type HEK293 cells and CRISPR-modified HEK293 cells stably expressing U2AF2-dRS. Mutant cells demonstrated higher RFP expression and therefore increased skipping of the alternative exon (Fig. 8B).
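The readout of the RG6 assay, as described in the legend of Fig. 8, is the ratio of nuclear GFP to nuclear RFP, averaged over all cells in a well. A minimal sketch of that per-well summary is given below. The function name, the tuple layout, the skipping of cells without RFP signal, and the toy intensities are assumptions for illustration and do not reproduce the HCS analysis software used in the study.

```python
from collections import defaultdict
from statistics import mean

def well_mean_ratios(cells):
    """Average the per-cell nuclear GFP/RFP ratio within each well.

    cells: iterable of (well_id, nuclear_gfp, nuclear_rfp) tuples.
    Assumption: cells with no RFP signal are skipped to avoid division by zero.
    """
    per_well = defaultdict(list)
    for well, gfp, rfp in cells:
        if rfp > 0:
            per_well[well].append(gfp / rfp)
    return {well: mean(ratios) for well, ratios in per_well.items()}

# Hypothetical toy measurements (arbitrary intensity units).
toy_cells = [
    ("A1", 2.0, 1.0),
    ("A1", 4.0, 1.0),
    ("B1", 1.0, 2.0),
    ("B1", 1.0, 4.0),
]
ratios = well_mean_ratios(toy_cells)
print(ratios)
```

A lower GFP/RFP ratio in a well corresponds to relatively more RFP-fused product and therefore more exon skipping.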
Next, to check the effect of U2AF2 RS domain mutants on splicing, we cotransfected the RG6 construct with each of the U2AF2 mutants in U2AF2-dRS cells.
We observed that re-expression of wild-type U2AF2 increased the GFP/RFP ratio, indicating that it restores the inclusion of the reporter exon (Fig. 8C and D). In contrast, expression of the U2AF2-dRS mutant did not improve exon inclusion. Mutants with a partial deletion of the RS domain rescued exon inclusion to an extent that depended on the length of their RS domain. Mutants harboring mutations of arginine in the RS dipeptides slightly improved exon inclusion but did not rescue it to the level observed with wild-type U2AF2. Therefore, the presence of the RS domain, its length, and the arginines in the RS dipeptides were important for the splicing activity of U2AF2.
Re-expression of the phosphomimetic RD mutant restored exon inclusion as efficiently as wild-type U2AF2, while the hypophosphorylated RA mutant had a much lower rescuing effect (Fig. 8E and F).
We then further explored the differential effect of U2AF2 RS domain mutants on splicing of endogenous cassette exons in HEK293 cells (Fig. 9A). On the basis of our RNA-seq data, we chose two cassette exons whose inclusion was strongly dependent on the RS domain. The splicing index deduced from RNA-seq and qPCR was similar for these exons, and qPCR confirmed the increased skipping of these exons in RS domain mutant clones (Fig. 9B). We then re-expressed myc-tagged wild-type U2AF2 in an RS domain-deleted clone (Fig. 9C and D) and observed a partial rescue of exon inclusion, likely partly because not all cells were successfully transfected. RS domain deletion and UHM domain mutations compromised this rescuing effect. The arginine-to-alanine mutant AS was slightly less efficient than wild-type U2AF2, while the KS mutant was as effective as or more effective than wild-type U2AF2. The phosphomimetic RD mutant was indistinguishable from wild-type U2AF2 in restoring exon inclusion, while the hypophosphorylated RA mutant was unable to rescue ARFIP2 exon 6 inclusion but was more effective than wild-type U2AF2 at rescuing SLIT2 exon 9 inclusion. Altogether, the analyses of the reporter minigene and of endogenous exons indicate the importance of the RS domain and of the basic residues in its RS dipeptides, and suggest that U2AF2 phosphorylation on its RS domain can positively or negatively regulate different cassette exons (Fig. 9E).
Effects of RS domain mutants of U2AF2 on splicing of endogenous exons. (A) qPCR strategy used to design specific primers for amplification of “exon out” and “exon in” isoforms for endogenous genes. (B) The two isoforms corresponding to SLIT2 exon 9 and ARFIP2 exon 6 were amplified from two wild-type HEK293 clones and the two mutants with the large dRS2 deletion. Measurements of the splicing index (SI), based on the abundance of the “out” and “in” isoforms as SI = In/(In + Out), are presented. Quadruplicate measures are indicated by dots. (Bars: mean values, Error bars: standard deviations.) (C) Wild-type U2AF2 or the indicated mutants were transfected in a clone with deletion of the RS domain (cl33). After 24 h, total RNA was prepared and splice isoforms quantified by qPCR. Quadruplicate values for each of two biological replicates are indicated by dots. Bars: mean values, Error bars: standard deviations. P <0.05, ns – nonsignificant, unpaired t-test versus U2AF2 WT. (D) The expression of U2AF2 and the different mutants was checked in parallel by Western blotting of cell extracts with anti-myc antibody. (E) Model for the role of the RS domain of U2AF2 and its phosphorylation in the formation of the spliceosome. Homotypic interactions between RS domain proteins and self-interactions between RS domains of U2AF2 stabilize its association with spliceosome components, in particular, SF3B1. Removal of the RS domain partially disrupts the binding of U2AF2 to SF3B1 affecting splicing. Dephosphorylation of the RS domain of U2AF2 induces strong self-interactions impacting spliceosome dynamics and splicing outcomes.
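The splicing index given in this legend, SI = In/(In + Out), can be computed from isoform abundances as in the sketch below. The function names, the summary of replicates by mean and standard deviation, and the toy abundances are assumptions for illustration; how isoform abundances are derived from raw qPCR signal is not specified here and is not modeled.

```python
from statistics import mean, stdev

def splicing_index(in_abundance, out_abundance):
    """SI = In / (In + Out), the fraction of transcripts that include the exon."""
    total = in_abundance + out_abundance
    if total <= 0:
        raise ValueError("total isoform abundance must be positive")
    return in_abundance / total

def summarize_replicates(pairs):
    """Return (mean SI, standard deviation of SI) over replicate (In, Out) pairs."""
    values = [splicing_index(i, o) for i, o in pairs]
    return mean(values), stdev(values)

# Hypothetical quadruplicate measurements (arbitrary abundance units).
wild_type = [(3.0, 1.0), (3.2, 0.8), (2.8, 1.2), (3.0, 1.0)]
mutant = [(1.0, 3.0), (1.2, 2.8), (0.8, 3.2), (1.0, 3.0)]
wt_mean, wt_sd = summarize_replicates(wild_type)
mut_mean, mut_sd = summarize_replicates(mutant)
print(wt_mean, wt_sd, mut_mean, mut_sd)
```

An SI of 1 corresponds to full inclusion and an SI of 0 to full skipping, so a lower mean SI in the mutant reflects increased exon skipping.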
The RS domain of U2AF2 is required for splicing genome-wide
To gain genome-wide insight into the molecular function of the RS domain of U2AF2 in splicing in cells, we performed RNA-seq analyses on polyA RNA of three wild-type HEK293 clones, two mutants with the small dRS1 deletion, and two mutants with the long dRS2 deletion. In parallel, we sequenced polyA RNA from wild-type HEK293 cells treated for 72 h with two validated shRNAs against U2AF2 [41] or nonsilencing shRNAs. We obtained about 50 million 150-base pair paired-end reads for each sample and analyzed splicing using the rMATS software. RNA-seq data confirmed the efficient knockdown of U2AF2 and the deletions of the RS domain in the corresponding clones (Supplementary Fig. 9). The knockdown of U2AF2 mainly altered cassette exon inclusion (Fig. 10A). For this reason, we focused on this type of splicing event and, to strengthen the analyses of the correlation of splicing changes with sequence features, we selected cassette exons showing no significant alternative 5′ss or alternative 3′ss usage (see the ‘Materials and methods’ section). We then further analyzed the splicing of cassette exons on the basis of coverage of exon-exon junctions. Using the splicing index for each cassette exon, we first clustered the different samples. This showed a correct segregation of the different samples, with duplicated samples grouping most closely and the short and long deletion samples grouping together. The close clustering of duplicated samples supported the robustness of the RNA-seq data (Fig. 10B). To increase the statistical power of the following analyses, we compared the four deletion mutants with the control samples. In parallel, we analyzed the nonsilenced and U2AF2-silenced samples. Both the deletion of the RS domain and U2AF2 knockdown mainly reduced cassette exon inclusion, although the amplitude of this effect was smaller for the RS domain deletion (Fig. 10C). 
In fact, we observed a clear correlation between splicing changes upon RS domain removal and U2AF2 knockdown, suggesting that the RS domain of U2AF2 is generally involved in splicing of U2AF2 targets (Fig. 10D).
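The comparison described here rests on two quantities: the splicing change of each cassette exon (the difference of splicing index between mutant or knocked-down cells and control cells, as defined in the legend of Fig. 10C) and the correlation of these changes between the two perturbations. The sketch below illustrates both. The function names and toy splicing indices are assumptions, and Pearson correlation is used as one plausible choice because the correlation measure is not specified in this section.

```python
from math import sqrt

def delta_si(si_condition, si_control):
    """Per-exon splicing change: splicing index in the condition minus control."""
    return [c - k for c, k in zip(si_condition, si_control)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical splicing indices for four cassette exons (not data from the study).
control = [0.9, 0.8, 0.5, 0.7]
knockdown = [0.5, 0.6, 0.5, 0.3]
rs_deletion = [0.7, 0.7, 0.5, 0.5]

d_kd = delta_si(knockdown, control)
d_drs = delta_si(rs_deletion, control)
r = pearson_r(d_kd, d_drs)
print(d_kd, d_drs, r)
```

In this toy input, the RS deletion changes each exon in the same direction as the knockdown but with half the amplitude, which gives a correlation close to 1 together with a reduced effect size, the qualitative pattern reported in Fig. 10C and D.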
Removal of the RS domain or knockdown of U2AF2 impacts splicing genome-wide. (A) rMATS detection of alternative splicing events affected in CRISPR-modified U2AF2-dRS HEK293 cells and in U2AF2 knockdown cells. RNA-seq analysis of the indicated cells was performed as described in the ‘Materials and methods’ section. The total number of alternative events detected in the corresponding samples and the number of significantly affected events are listed. SE, skipped exon; A5'SS, alternative 5′ splice site; A3'SS, alternative 3′ splice site; MXE, mutually exclusive exons; RI, retained intron. (B) Hierarchical clustering of the RNA-seq samples based on the cassette exons splicing. Working names of CRISPR-modified cell subclones and U2AF2-specific shRNAs are indicated. (C) Distribution of splicing changes (difference of splicing index in mutant or knocked down cells with control cells) for cassette exons of cells expressing U2AF2 with RS domain deletions (left panel) and cells with U2AF2 knockdown (right panel). Increased skipping in blue and increased inclusion in red. (D) Scatter plot showing the correlation between splicing changes of cassette exons upon deletion of U2AF2 RS domain and U2AF2 knockdown. (E) Schematic representation of splice site features calculated for cassette exons detected in RNA-seq analysis. L. intron, intron length; L. exon, exon length; AGez, AG exclusion zone length; 5′ss, 5′ splice site strength; bps, branch point site strength; ppt, polypyrimidine tract strength; 3′ss, 3′ splice site strength; feature index “1”, upstream intron; feature index “2”, downstream intron. (F) Correlations between splice site features and splicing changes of cassette exons following U2AF2 knockdown or RS domain removal.
To refine the comparison of the effects of U2AF2 knockdown and RS domain deletion on splicing, we searched for correlations of the splicing changes with splice site strength, AG exclusion zone length, and other gene structure parameters relative to the intron either upstream or downstream of the cassette exon (Fig. 10E). As expected, this revealed a stronger correlation of splicing changes upon U2AF2 knockdown or mutation with splicing signals located at the 3′ end of the first intron. In agreement with previous observations, the length of the AG exclusion zone was positively correlated with the resistance of cassette exon inclusion to U2AF2 knockdown (Fig. 10F and Supplementary Fig. 10A) [41, 86]. In contrast, the removal of the RS domain had a stronger negative impact on the inclusion of cassette exons with a weak polypyrimidine tract just preceding the 3′ splice site of the first intron (positions −12 to −3 of the intron relative to the 3′ss) (Fig. 10F and Supplementary Fig. 10B). Therefore, cassette exons are differently sensitive to RS domain deletion or U2AF2 depletion depending on sequence features preceding the 3′ splice site. The distinct outcomes observed between U2AF2 knockdown and RS domain deletion suggest that the splicing alterations caused by RS domain removal are not solely due to reduced functionality or mislocalization and thus lower local concentrations of U2AF2-dRS, but also stem from specific disruptions in U2AF2’s interactions within the splicing machinery. Besides splicing signals, interestingly, a strong predictor of the effect of U2AF2 knockdown and even more of RS domain deletion was the length of the introns bordering the cassette exons (Fig. 10F and Supplementary Fig. 11). Indeed, exons bordered by shorter upstream or downstream introns showed increased skipping upon knockdown of U2AF2 or RS domain removal. 
Additionally, higher GC content in introns correlated with a higher impact of U2AF2 knockdown or RS domain deletion on cassette exon inclusion, although this correlation was weaker than that with intron length (Fig. 10F).
As GC-rich genes with short introns have been shown to be enriched in inner nuclear regions where nuclear speckles are generally located [87–89], we were prompted to test whether the effect of U2AF2 on alternative splicing was linked to the distance of genes or transcripts from speckles. We integrated recently published datasets from five independent groups (Fig. 11A) [62, 90–93]. Analyses across all studies consistently showed that exon skipping induced by U2AF2 knockdown or RS domain deletion was maximal near nuclear speckles (Fig. 11B and Supplementary Figs 12 and 13). Further analyses confirmed that inclusion of exons flanked by short introns and located close to speckles is indeed more sensitive to U2AF2 depletion or RS domain deletion (Fig. 11C). We next addressed whether the proximity to speckles was specific to U2AF2-sensitive cassette exons or whether it was a more general attribute of alternatively spliced exons. We used the set of cassette and constitutive exons that were detected on the basis of exon junction coverage in our HEK293 control RNA-seq data (Supplementary Table 2). We compared the speckle proximity score of genes containing one or several cassette exons with the scores of genes without detected cassette exons. Using the five different published measures of distance to speckles, we observed that speckles proximity increased with the number of alternative exons in transcripts (Fig. 11D and Supplementary Fig. 14). This is also supported by the elevated number of isoforms for speckles transcripts in K562 and HEK293 cells noted by Henikoff and colleagues [93], and it agrees with the reported increased proximity of alternatively spliced transcripts to the MALAT1 long non-coding RNA (lncRNA) [94].
Removal of the RS domain or knockdown of U2AF2 increases skipping of exons proximal to speckles. (A) The list of recent reports with measurements of the distance of genes or transcripts to speckles (hESC – human embryonic stem cells; HCT116 – human colorectal carcinoma cells; K562 – human chronic myelogenous leukemia cells; HFF – human fibroblast cells; HEK293 – human embryonic kidney cells; HeLa – human cervical adenocarcinoma cells; HepG2 – human hepatocellular carcinoma cells). (B) Correlations between the gene or transcript proximity to speckles and splicing changes of cassette exons in the corresponding genes upon U2AF2 knockdown or RS domain removal. Speckles proximity scores from five studies were used. In addition, as a control, a list of proximity scores to other nuclear structures (PML bodies, Cajal bodies, Sam68 bodies and lamin A) obtained from Barutcu et al. was also used. (C) Violin plots demonstrating the effect of U2AF2 knockdown (left panel) and RS domain deletion (right panel) on splicing changes of cassette exons flanked by short or long introns and localized close or far from nuclear speckles. ***P <0.001, ****P <0.0001 one-way ANOVA relative to short introns close to speckles. (D) Speckle proximity score of genes with different numbers of cassette exons. A set of genes (n = 5486) was obtained based on cassette exons and constitutive exons detected in RNA-seq. ns – nonsignificant, ** P <0.01, ***P <0.0001 one-way ANOVA versus genes without detected cassette exons. (Additional analyses confirming these correlations are presented in Supplementary Fig. 14). (E) Model for the role of the RS domain of U2AF2 in the regulation of alternative splicing of cassette exons located in speckle-proximal regions. RS domain of U2AF2 mediates its interaction with other RS proteins, maintaining a high concentration of active U2AF2 near speckles and controlling the inclusion of alternative exons. 
Partial loss of U2AF2, whether by RS domain removal or knockdown, perturbs alternative exon recognition, leading to increased skipping.
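The stratified comparison described for Fig. 11C (short versus long flanking introns, close versus far from speckles) can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names, the cutoffs, and the toy records below are all hypothetical, and the sketch only assumes that each cassette exon carries a splicing change (delta-PSI), a flanking intron length, and a speckle proximity score where higher means closer.

```python
from collections import defaultdict
from statistics import median


def stratify_exons(exons, intron_cutoff, proximity_cutoff):
    """Return the median delta-PSI for each (intron class, distance class) group.

    exons: iterable of dicts with the hypothetical keys
        'dpsi'             - change in percent-spliced-in upon perturbation
        'min_flank_intron' - length (nt) of the shorter flanking intron
        'speckle_score'    - proximity score; higher is assumed to mean closer
    """
    groups = defaultdict(list)
    for exon in exons:
        length_class = "short" if exon["min_flank_intron"] < intron_cutoff else "long"
        distance_class = "close" if exon["speckle_score"] >= proximity_cutoff else "far"
        groups[(length_class, distance_class)].append(exon["dpsi"])
    return {key: median(values) for key, values in groups.items()}


# Synthetic toy records for illustration only; these are not data from the study.
toy_exons = [
    {"dpsi": -0.30, "min_flank_intron": 200, "speckle_score": 0.9},
    {"dpsi": -0.20, "min_flank_intron": 350, "speckle_score": 0.8},
    {"dpsi": -0.10, "min_flank_intron": 300, "speckle_score": 0.2},
    {"dpsi": -0.05, "min_flank_intron": 5000, "speckle_score": 0.7},
    {"dpsi": 0.00, "min_flank_intron": 8000, "speckle_score": 0.1},
]

# Cutoffs are arbitrary placeholders, not values used in the paper.
medians = stratify_exons(toy_exons, intron_cutoff=1000, proximity_cutoff=0.5)
for group, value in sorted(medians.items()):
    print(group, round(value, 3))
```

In a real analysis the per-group distributions, rather than medians alone, would be compared with an appropriate statistical test, and the cutoffs would be chosen from the data.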
Discussion
RS domains are present in a number of splicing factors, such as SR proteins and SR-related proteins. They are mainly thought to mediate protein–protein interactions or to contact RNA for splice site recognition and spliceosome assembly. RS domains are also involved in the routing of splicing factors to speckles, which are membrane-less nuclear structures thought to be a reservoir for the splicing machinery [95, 96] but where splicing has also been suggested to occur [97]. The observation that several RS domain proteins undergo LLPS in vitro suggests that RS domains mediate multivalent dynamic interactions and play a significant role in speckle formation [30, 41, 98]. Our sedimentation and, importantly, our pulldown experiments further support a self-interaction of U2AF2 that depends on its RS domain and its phosphorylation state. Our analysis of speckle enrichment in a limited set of RS-domain U2AF2 mutants (Fig. 1 and Supplementary Figs 4 and 15) further suggests that LLPS capability can be linked to the enrichment of factors within nuclear speckles.
RS domain condensates and U2AF2 interactions
The capacity of SR proteins to form condensates through LLPS has also been related to their interaction with the CTD of Pol II, which was shown to concentrate in droplets of SRSF2 in vitro [98]. We previously observed that the RS domain of U2AF2 can also drive LLPS [41], and Manley and collaborators showed that the RS domain of U2AF2 was necessary for interaction with Pol II in pulldown assays [43]. Here we confirmed the RS domain dependency for interaction of U2AF2 with Pol II in coimmunoprecipitation experiments. This physical interaction with Pol II is thought to underlie, at least partly, the functional coupling of transcription and splicing. This coupling is also supported by the observation that splicing mostly takes place cotranscriptionally [99]. Our results with phosphorylation mutants of U2AF2 suggest that the interaction with Pol II is reduced upon phosphorylation of the RS domain. As RS domain sedimentation is also reduced upon phosphorylation, our data support a model in which U2AF2 multivalent self-contacts favor an interaction with Pol II that is regulated by phosphorylation of its RS domain.
The capability of mixed charge domains to form condensates was shown to require a minimal length [68]. Similarly, we observe that the capability of U2AF2 to interact with SF3B1 in vitro also increases with the length of its RS domain and depends on arginine residues, supporting our model that RS domain self-interactions facilitate the binding of U2AF2 to SF3B1. In contrast with SF3B1, the RS domain dependency was not observed for SF1 in coimmunoprecipitation experiments with U2AF2. The differential requirement of the RS domain of U2AF2 for interaction with SF3B1 and SF1 is also supported by our BioID results. Indeed, SF1 is similarly biotinylated by U2AF2-dRS-BioID and wild-type U2AF2-BioID, while SF3B1 biotinylation is more efficient with the wild-type U2AF2-BioID. This different requirement of the RS domain for interaction of U2AF2 with SF3B1 and SF1 could be explained by the multiple ULMs within SF3B1 compared to the single ULM of SF1. Indeed, we previously observed that the SF3B1 multi-ULM domain enhanced the condensation of U2AF2 and cosedimented with U2AF2 in vitro, while SF1 could counteract this action of SF3B1 on U2AF2 condensation, suggesting also a possible role of U2AF2 self-interaction in the progression from early to A complex assembly [41]. In addition, several ULMs in SF3B1 are necessary for U2AF2 to enhance the recruitment to SF3B1 of the U2AF2-related RS domain-containing protein RBM39 [41]. Altogether, the present work supports the model that U2AF2 RS domain interaction with itself or other RS domain-containing proteins favors its interaction with Pol II and U2 snRNP (Fig. 9E). However, we cannot exclude some contribution of direct contacts of the U2AF2 RS domain with these proteins, as suggested for SF3B1 by crosslinking studies [100] and by a faint interaction of the RS domain with SF3B1 that we observed in pulldown assays (Supplementary Fig. 2).
BioID reveals numerous putative protein interactions supported by the RS domain of U2AF2
RS domains are generally thought to be involved in protein–protein interactions. However, in their unphosphorylated form, these positively charged domains might drive nonspecific interactions with RNA. In addition, specific high-affinity interactions of the RS domain of SRSF1 with G-quadruplexes have been reported, and U2AF2 was shown to bind such sequences with strong affinity [101]. Still, the most documented function of the RS domain of U2AF2 is to provide a contact with the branch site sequence of a model pre-mRNA in vitro and to stabilize the duplex formed by this sequence with the U2 snRNA [20]. Our data are in favor of an additional general implication of the RS domain of U2AF2 in protein–protein interactions, as we show that RS domain-containing proteins are highly enriched among U2AF2 BioID partners (Fig. 6H) and particularly among those whose proximity labeling depends on the presence of the RS domain (Fig. 7C). While some of these interactions might be indirectly due to the binding of U2AF2 to RNA, our coimmunoprecipitation experiments indicate that, in the absence of RNA, the RS domain of U2AF2 is indeed implicated in interactions with RS domain-containing proteins such as RBM39, SRSF10, CHERP, and U2SURP. Reduced proximity labeling could also partly originate from the slightly decreased localization of U2AF2-dRS in speckles as observed for overexpressed U2AF2-dRS in HeLa cells (Supplementary Fig. 4 and [41]) and for CRISPR-deleted U2AF2-dRS in HEK293 cells (Fig. 1E). However, the confirmation of RS domain-dependent interactions in coimmunoprecipitation experiments (Fig. 7F), the different profiles of speckle proteins and RS-dependent U2AF2 BioID targets (Fig. 7D and E), and the stronger enrichment of RS domain proteins compared to speckle proteins in RS domain-dependent BioID partners (Fig. 7C–E) indicate that reduced labeling with BioID-U2AF2 upon RS domain removal generally corresponds to reduced interaction of U2AF2 with its partners.
Therefore, we propose that the numerous RS-dependent interactions we observe correspond mainly to specific functions of U2AF2 in RNA-related processes but also participate in its enrichment in speckles. According to the mixed charge domain model, homotypic interactions take place between RS domains in U2AF2 and its interactors. These interactions are mediated by arginine residues, whereas phosphorylation of serine residues changes the local charge and regulates RS-mediated contacts.
Of note, our BioID data were in good agreement with known interactions and functions of U2AF2: (i) most known partners of U2AF2 were among the top hits in the BioID data. (ii) There is a large overlap of our BioID hits with the interactome that was observed by coimmunoprecipitation by Whisenant et al., and the list of interactors in PPI databases. (iii) Gene ontology analysis of the BioID hits list shows a strong enrichment of RNA splicing-related proteins. (iv) Gene enrichment analysis further indicates a strong enrichment of U2 snRNP-related proteins among BioID hits.
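Statements such as "RS domain-containing proteins are highly enriched among U2AF2 BioID partners" or "a large overlap of our BioID hits with the interactome" are commonly quantified with a one-sided hypergeometric test. The sketch below shows that calculation only as an illustration; the text here does not state which enrichment statistic was used, the function and parameter names are hypothetical, and the example counts are synthetic.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, overlap):
    """Probability of seeing at least `overlap` annotated items among `drawn`.

    population: total number of proteins considered (background)
    annotated:  number of background proteins carrying the annotation
                (for example, an RS domain)
    drawn:      number of hits (for example, BioID partners)
    overlap:    number of hits that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the hypergeometric probability mass from `overlap` up to the maximum.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Synthetic counts for illustration only; these are not values from the study.
p_value = hypergeom_enrichment_pvalue(population=10, annotated=4, drawn=3, overlap=2)
print(round(p_value, 4))
```

With realistic proteome-sized backgrounds the same function applies unchanged, since Python integers do not overflow; multiple-testing correction would still be needed when many annotations are tested.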
Multiple splicing factors and splicing regulators of variable structure and function were significantly enriched among the identified putative U2AF2 interactors. However, the top hits were mainly spliceosomal proteins that are recruited to spliceosomal complex A and are associated with the U2 snRNP complex, confirming the crucial role of U2AF2 in the recruitment of this complex to the branch point site. Moreover, most of the interactions between U2AF2 and components of complex A are affected by the removal of the RS domain, as demonstrated in BioID and coimmunoprecipitation experiments, suggesting that both the UHM and RS domains of U2AF2 are important for the assembly of complex A.
Altogether, our coIP results and overlap of our list of BioID targets for U2AF2 with well-known interactors and large-scale interaction datasets support the relevance of our BioID results. Further investigation for each novel substrate should give interesting novel insights into U2AF2 functions.
U2AF2 and the transcription machinery
Besides splicing factors, the BioID data suggest that U2AF2 indeed interacts with the transcription machinery. Although the polymerase II subunits were not directly detected, a number of CTD-interacting proteins were among the BioID hits, including the CTD-interacting domain proteins CHERP, RPRD2, and SCAF proteins, components of the PAF1 complex (CDC73, PAF1, and RTF1), other transcription elongation regulators (i.e. TCERG), and histone-interacting factors including SPT and SETD proteins. These results are in line with previous mass-spectrometry data showing the putative association of U2AF2 with the PAF complex [82] and the histone methyltransferase SETD2 [102]. Moreover, some CTD-interacting proteins could stabilize the coupling of U2AF2 and Pol II in an RS-dependent way, since RS-deleted U2AF2 loses interaction with some of them, in particular the CTD interaction domain-containing proteins CHERP, U2SURP, and PCF11. Overall, these novel data further support the role of U2AF2 in the functional coupling of transcription and splicing.
U2AF2 and 3′ end processing factors
Previous data support a role of U2AF2 in the coupling of splicing and 3′ end processing of vertebrate pre-mRNAs. In particular, the CTD of poly(A) polymerase (PAP) was shown to interact with U2AF2 and to enhance U2AF2 binding to the 3′ splice site [44]. Conversely, U2AF2 was also shown to stimulate pre-mRNA 3′ end processing and to interact with an RS-like domain of the 59 kDa subunit of the human cleavage factor I (CF Im), CPSF7 [45]. Both poly(A) polymerase and CPSF7 are among our BioID hits for U2AF2. In addition, the gene ontology analysis shows a general enrichment of 3′ end processing factors. Indeed, besides CPSF7 and PAP, a number of other 3′ end processing factors were detected in our BioID experiments, such as CPSF6, WDR33, and NUDT21. Interestingly, these factors ranked higher in the list of BioID hits than PAP and CPSF7. As interaction of CPSF6 with the U2AF2 N-terminal domain was not detected in previous GST-fusion interaction experiments, it is likely that other regions of this protein participate in its interaction with U2AF2 [45]. Based on our BioID results, we propose that interactions of U2AF2 with a greater number of factors than previously reported contribute to the coordination of the splicing and 3′ end processing machineries. Of note, the dependency on the U2AF2 RS domain for its interaction with cleavage factor CPSF7 [45] was not confirmed in our BioID experiments and we did not see enrichment of 3′ end processing factors among RS-dependent partners of U2AF2, suggesting that other mechanisms are involved in these interactions in a living cell.
Regulation of U2AF2 interactions by phosphorylation of its RS domain
We observed that U2AF2 is phosphorylated in human cells or when coexpressed with SRPK1 in bacteria (Fig. 4). RS domain deletion or mutations of serine residues in the RS dipeptides of this domain reduced U2AF2 phosphorylation, suggesting that these serine residues are indeed phosphorylated in vivo, as also indicated by large-scale phosphoproteomics data. Our pulldown and sedimentation data then suggest that homotypic interactions of U2AF2 are negatively regulated by this phosphorylation in the RS domain. Indeed, the solubility of recombinant U2AF2 was improved by phosphorylation by SRPK1 (Fig. 3), while the hypophosphorylated U2AF2-RA mutant, with serine to alanine mutations in the RS dipeptides, was the least soluble (Fig. 4). This mutant was also the best binder of GST-U2AF2 in pulldown assays. Interactions with SF3B1 and Pol II CTD were reduced upon phosphorylation of the U2AF2 RS domain in pulldown assays. Immunoprecipitation results were also in agreement with a model where dephosphorylation favors U2AF2 self-interaction and multimerization on the SF3B1 surface. A similar observation was made for Pol II. Altogether, we identify U2AF2 phosphorylation as a potential regulatory mechanism to control its multimerization and interactions with partner proteins. We suggest that phosphorylation of the RS domain increases the dynamics of the RS-mediated homotypic interactions and interactions with partners, thereby modulating recognition of the splice site (Fig. 9E).
U2AF2 RS domain importance for splicing
Using a splicing reporter, we observed that the ability of U2AF2 to promote exon inclusion was compromised by the removal of the RS domain, as expected if the lack of the RS domain affects the proper recruitment of SF3B1 and U2 snRNP. A phosphomimetic mutant was as efficient as wild-type U2AF2 in promoting exon inclusion. In contrast, the hypophosphorylated RA mutant was unable to restore exon inclusion, suggesting that this mutation, by impairing the dissociation of the U2AF2-SF3B1 complexes, perturbs the recognition of this cassette exon. RS domain removal or hypophosphorylation is also expected to perturb U2AF2 interaction with Pol II, as observed in coIP experiments. This perturbation of the dynamics of U2AF2 binding to Pol II could also contribute to the reduced exon inclusion by these mutants by altering the cotranscriptional recognition of splice sites. Indeed, the RG6 splicing reporter used in our study has already been shown to be cotranscriptionally regulated [103]. The importance of the RS domain for exon inclusion could be confirmed for endogenous genes that we tested in similar rescue experiments. The detrimental effect of RS domain deletion was similar to that of the UHM mutation. Overall, although based on a limited number of U2AF2 mutants, the splicing rescue results suggest a positive three-way correlation between exon inclusion efficiency, binding to SF3B1, and enrichment in nuclear speckles, indicating that an optimal U2AF2 RS-domain interaction capability is beneficial for both speckle localization and splicing activity (Fig. 9E and Supplementary Fig. 15). Notably, the hypophosphorylated RA mutant behaves as an outlier, exhibiting defective localization in speckles but very strong interactions with wild-type U2AF2 and SF3B1, as well as strong sedimentation. Additional aberrant interactions of this mutant might contribute to its reduced localization in speckles.
This mutant enhanced the inclusion of SLIT2 exon9 but not that of ARFIP2 exon6, suggesting also that U2AF2 phosphorylation can regulate splicing in an exon-specific manner. It is likely that endogenous U2AF2 is phosphorylated to different extents in cells, which might contribute to the complexity of the splicing machinery and alternative splicing.
RNA-seq analyses confirmed that U2AF2 depletion generally promotes exon skipping. A similar effect was observed for RS domain deletion. However, the profile of affected exons presented clear differences. In particular, RS domain removal had a stronger impact on exon inclusion for cassette exons having lower pyrimidine content next to the 3′ splice site, while a longer AG exclusion zone correlated with an enhanced resistance of cassette exon inclusion to U2AF2 knockdown. These results suggest that the U2AF2 RS domain facilitates the recognition of the 3′ splice site with a long AG exclusion zone, in agreement with our previously proposed model that U2AF2 multimerization might facilitate the recognition of extended polypyrimidine-rich regions preceding the 3′ splice site [41].
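The two sequence features discussed above, pyrimidine content next to the 3′ splice site and the length of the AG exclusion zone, can be computed directly from the 3′ end of an intron. The following is a minimal sketch under simplifying assumptions: the AG exclusion zone is taken here as the stretch between the 3′ splice site AG and the nearest upstream AG dinucleotide, and the window size is an arbitrary placeholder. The function names and the example sequence are hypothetical and do not reproduce the definitions or parameters used in the study.

```python
def _normalize(intron_3prime):
    """Uppercase the sequence, convert RNA to DNA letters, and check the 3'ss AG."""
    seq = intron_3prime.upper().replace("U", "T")
    if not seq.endswith("AG"):
        raise ValueError("sequence must end with the 3' splice site AG")
    return seq


def pyrimidine_fraction(intron_3prime, window=20):
    """Fraction of C/T in the `window` nt immediately upstream of the 3'ss AG."""
    upstream = _normalize(intron_3prime)[:-2]
    region = upstream[-window:]
    if not region:
        return 0.0
    return sum(base in "CT" for base in region) / len(region)


def ag_exclusion_zone_length(intron_3prime):
    """Number of nt between the 3'ss AG and the nearest upstream AG dinucleotide.

    If no upstream AG exists in the supplied sequence, the length of the whole
    upstream region is returned (a lower bound on the true zone length).
    """
    upstream = _normalize(intron_3prime)[:-2]
    position = upstream.rfind("AG")
    if position == -1:
        return len(upstream)
    return len(upstream) - (position + 2)


# Made-up 20-nt intron end for illustration; not a sequence from the study.
example = "TTAGCTTTCCTTTTCTCTAG"
print(ag_exclusion_zone_length(example), pyrimidine_fraction(example, window=10))
```

Published definitions of the AG exclusion zone usually anchor it at the branch point and may ignore AG dinucleotides very close to the splice site, so a production analysis would need branch point predictions in addition to the raw sequence.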
We also observed that longer introns upstream or downstream of the cassette exons make these exons less sensitive to U2AF2 knockdown or RS domain deletion, which suggests that the function of U2AF2 in splice site recognition is also directed by higher-order gene structure features, in a manner that depends on its RS domain. Similar correlations between intron length and splicing outcomes were also shown for some other RS-containing splicing regulators, including some of our BioID U2AF2 partners. In particular, exons downregulated upon depletion of CHERP are flanked by shorter introns with high GC content and weaker PPT [104], cassette exons skipped upon SRRM2 deficiency are also flanked by shorter introns [105], and SON-regulated introns are shorter, have weaker splice site strength and have higher GC content [106]. Therefore, U2AF2 and additional factors are present in limiting amounts for the inclusion of exons flanked by short introns.
One possibility is that longer upstream introns allow better recruitment of U2AF2 to polymerase II and longer downstream introns give a kinetic advantage to cassette exon inclusion by giving more time for recognition of the acceptor site of the upstream intron before the acceptor site of the downstream intron is transcribed. This would make U2AF2 concentration less critical for exon inclusion. A second hypothesis is that U2AF2, together with its RS-dependent partners, is present in limiting amounts for the intron definition mechanism but not for exon definition, as these mechanisms are associated with GC-rich exons bordered by short GC-rich introns (leveled architecture) and exons flanked by longer GC-poor introns (differential architecture), respectively [89, 107].
Interestingly, recent data suggest that short GC-rich retained introns are associated with speckles in the central region of the nucleus [62, 89]. Using measurements of speckle proximity based on five different technical approaches, we observed that the alternative splicing events most affected by U2AF2 knockdown or deletion of its RS domain were proximal to speckles. U2AF2 and other splicing factors are enriched in speckles and splicing is known to be particularly active near speckles [92, 108]. Still, our data show that efficient splicing in these regions does not tolerate a reduction in U2AF2 concentration. This feature of U2AF2 might be shared with other RS domain proteins such as PRPF40A, SON, and SRRM2 that have also been proposed to promote cassette exon inclusion in the core region of the nucleus versus peripheral regions [109]. Interestingly, such factors are part of the RS domain-dependent U2AF2 BioID partners that we identified here. We propose that, despite enrichment of the splicing machinery and the higher splicing efficiency near speckles, splicing capacity near speckles remains limiting relative to the high splicing load of short-intron genes in this region. This is consistent with recent findings showing the enrichment near or within nuclear speckles of unexcised introns [62, 91] and of alternatively spliced transcripts [93, 94], which we confirm in the five datasets we have analyzed. Altogether, a model is now emerging in which unspliced or difficult-to-splice transcripts are targeted to speckles. Our finding that speckle-associated transcripts are more sensitive to U2AF2 depletion further supports this model.
Mechanistically, we propose that the RS domain of U2AF2, through its multiple interactions with partner proteins that often contain RS domains, contributes to its recruitment to speckle-proximal regions. Transcripts containing short introns may be enriched near speckles for several reasons: a higher frequency of association with splicing factors, a higher load of splicing factors relative to their size, or, because short introns are generally GC-rich, preferential association of GC-rich sequences with speckles. This higher splicing frequency and the high splicing factor load, which also results from slow posttranscriptional splicing near speckles, likely contribute to the enrichment of splicing factors, short intron-containing transcripts, and incompletely spliced transcripts. Together, the higher concentrations of splicing factors and unspliced transcripts may promote speckle formation through condensation-related mechanisms, but still result in splicing activity being limiting near speckles (Fig. 11E). Finally, our results suggest that U2AF2 activity can be modulated by phosphorylation of serine residues within RS dipeptides.
Supplementary Material
gkag143_Supplemental_Files
References
- 1. Papasaikas P, Valcárcel J. The Spliceosome: the ultimate RNA chaperone and sculptor. Trends Biochem Sci. 2016;41:33–45. doi:10.1016/j.tibs.2015.11.003. PMID: 26682498.
- 2. Ritchie DB, Schellenberg MJ, MacMillan AM. Spliceosome structure: piece by piece. Biochim Biophys Acta. 2009;1789:624–33. doi:10.1016/j.bbagrm.2009.08.010. PMID: 19733268.
- 3. Wahl MC, Will CL, Lührmann R. The spliceosome: design principles of a dynamic RNP machine. Cell. 2009;136:701–18. doi:10.1016/j.cell.2009.02.009. PMID: 19239890.
- 4. Wilkinson ME, Charenton C, Nagai K. RNA Splicing by the Spliceosome. Annu Rev Biochem. 2020;89:359–88. doi:10.1146/annurev-biochem-091719-064225. PMID: 31794245.
- 5. Will CL, Lührmann R. Spliceosome structure and function. Cold Spring Harb Perspect Biol. 2011;3:a003707. doi:10.1101/cshperspect.a003707. PMID: 21441581. PMCID: PMC3119917.
- 6. Zamore P, Green M. Identification, purification, and biochemical characterization of U2 small nuclear ribonucleoprotein auxiliary factor. Proc Natl Acad Sci USA. 1989;86:9243–7. doi:10.1073/pnas.86.23.9243. PMID: 2531895. PMCID: PMC298470.
- 7. Merendino L, Guth S, Bilbao D et al. Inhibition of msl-2 splicing by Sex-lethal reveals interaction between U2AF35 and the 3′ splice site AG. Nature. 1999;402:838–41. doi:10.1038/45602. PMID: 10617208.
- 8. Wu S, Romfo C, Nilsen T et al. Functional recognition of the 3′ splice site AG by the splicing factor U2AF35. Nature. 1999;402:832–5. doi:10.1038/45590. PMID: 10617206.
