Alternative splicing factor RAB3IP as a novel risk signature to predict the prognosis of colorectal cancer
Zhengwei Zhou, Fei Gao, Han Lei, Haixuan Wen, Jiaxi Tang, Yulong Peng, Lili Fan, Lu Xu, Guang Shu

TL;DR
This study identifies RAB3IP, an alternative splicing factor, as a new biomarker for predicting the prognosis of colorectal cancer.
Contribution
The study introduces RAB3IP as a novel independent prognostic marker for colorectal cancer based on splicing factor analysis.
Findings
RAB3IP shows a significant positive correlation with favorable prognosis in colorectal cancer.
RAB3IP is linked to key cancer-related pathways like Cell Cycle, WNT, and Spliceosome.
Four splicing factors, including RAB3IP, were identified as the most predictive markers for CRC prognosis.
Abstract
Emerging evidence suggests that aberrant alternative splicing plays a vital role in the development of tumors. However, the expression of splicing factors (SF) in colorectal cancer and its relationship with prognosis is still unclear. Here, we divided patients into high-risk and low-risk groups through univariate COX analysis and LASSO regression analysis, and selected 13 alternative splicing factors that are highly correlated with prognosis for subsequent analysis. We systematically analyzed the prognostic value of transcription levels of SFs in colorectal cancer (CRC) and found that RAB3A interacting protein (RAB3IP), programmed cell death 4 (PDCD4), golgin B1 (GOLGB1), and neuregulin 4 (NRG4) as the most predictive markers for the prognosis of CRC. After comparing the expression of four splicing factors in cancer tissues with normal tissues as well as OS analysis, it is strongly…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRNA Research and Splicing · RNA modifications and cancer · RNA and protein synthesis mechanisms
Introduction
Alternative splicing (AS) is a process necessary for gene transcription into mature mRNA of different variants, and it is also one of the main reasons for protein diversity. AS is essential for tissue development, differentiation and homeostasis, and is a key cellular pathway in higher eukaryotes 1. Whole-genome studies estimate that 90% to 95% of human genes have varying degrees of alternative splicing 2, 3. Most splicing factors are RNA-binding proteins (RBPs) that, in combination with other elements, promote or inhibit splicing site recognition during tissue development and cell differentiation 4. Dysregulation of alternative splicing has been found to be involved in the development of various cancers, where splicing is largely de-regulated due to mutations or abnormal expression of key splicing regulatory proteins 5. Based on the analysis of the cancer genome atlas (TCGA) data of 32 types of cancers in 8705 patients, the occurrence of alternative splicing events in tumors is more than 30% higher than that of normal tissues 6, mainly affecting physiological activities such as cell cycle control, cytoskeletal organization, migration and cell-cell adhesion, etc. More importantly, cancer-specific splice variants contribute to tumor cell survival and cancer progression and may serve as prognostic biomarkers for predicting patient survival 2, 6-9.
Colorectal cancer kills nearly 700,000 people each year, a high mortality rate that makes it the fourth deadliest cancer in the world (after lung, liver, and stomach cancer) 10-12. Increasing evidence indicates have shown that alternative splicing also plays a pivotal role in CRC initiation and progression, with aberrant AS closely linked to colorectal carcinogenesis 11, 13-16. Among them, SRSF6 can directly bind to the exon 23 motif of ZO-1 to perform abnormal splicing and promote the proliferation and metastasis of CRC 17. HNRNPLL, as a newly discovered colorectal cancer metastasis inhibitor, regulate the alternative splicing of CD44 during the epithelial-mesenchymal transition, thereby inhibiting the metastasis of CRC 13. In addition, alternative splicing induced by PHF5A hyperacetylation can stabilize KDM3A mRNA and promote its protein expression, resulting in a poor prognosis for CRC patients 18. Sam68, an RNA-binding protein, affects cell growth and glycolysis by regulating the alternative splicing and expression of PKM2 in CRC 19. Furthermore, epigenetic modifications can achieve tumor control by regulating alternative splicing 20.
Accumulating evidence indicates that dysregulation of alternative splicing (AS) directly facilitates oncogene activation and enhances neoplastic cell proliferation and metastatic potential in colorectal cancer 21, 22. Recent studies have established AS events as independent prognostic biomarkers in CRC, which has prompted our focused investigation into the mechanistic role of splicing factors in CRC pathogenesis 23. Although there is a large amount of evidence that AS has a direct or indirect role in the occurrence and development of CRC, further research is needed to clarify the specific AS event regulation mechanism and macro-prognosis application. In this study, we drew the alternative splicing map in CRC by analyzing RNA-seq data and assessed the risk ratio of AS events with COX analysis. The gene RAB3IP, which has the strongest correlation with CRC, was noticed through Lasso regression analysis. Functional experiments such as proliferation and migration proved that this gene plays a vital role in the occurrence of CRC and can be used for independent prognostic analysis. Meanwhile, our study provides new strategies for discovering novel biomarkers for prognosis prediction and targets for treatment development.
Materials and Methods
AS-related gene datasets and clinical date
In this study, RNA-seq data and complete clinical follow-up records (overall survival (OS) ≥ 30 days) for the colorectal cancer (CRC) cohort were obtained from the TCGA SpliceSeq database (https://bioinformatics.mdanderson.org/TCGASpliceSeq/). CRC samples with incomplete clinical or molecular profiles in the TCGA dataset were stringently excluded, yielding a final analytical cohort of 447 rigorously curated CRC cases. Eligible patients included in this article are in accordance with the following inclusion criteria: (1) pathological confirmation of the diagnosis; (2) no prior history of neoadjuvant therapy or other malignancies; (3) complete clinicopathological data. The detailed clinic parameters of enrolled patients were presented in Table. Exclusion criteria included the following: (1) other treatments were used after the operation; (2) patients with incomplete survival data; (3) other organ tumors. The SpliceSeq tool was used to analyze the AS profile and evaluate the splicing pattern of mRNA in CRC patients. According to the PSI value of alternative splicing (Percent-spliced-in) (0-1), delete unnecessary data, and finally quantify seven AS events: exon skip (ES), mutual Exclusion exon (ME), reserved intron (RI), alternate promoter (AP), alternate terminator (AT), alternate donor site (AD) and alternate acceptor site (AA). Finally, the R language (version 3.4.0) is used for statistical analysis, and the UpSet diagram is used to visualize the various combinations of the intersection of the seven AS types, and clearly show the quantitative results of multiple AS event interaction sets.
Selection of key genes in AS events and analysis of differential expression
In the AS event, genes with Z-score>2() were identified as candidate genes, and the volcano map was drawn. At the same time, the Lasso regression curve was used to prevent the model from overfitting (λ< -3.3). Perform univariate Cox regression analysis on the expression in the TCGA data set to evaluate the prognostic value of variable splicing candidate genes, select 13 candidate genes that are actually related to survival (P <0.1), and conduct the next functional study. Then, four candidate genes and their coefficients are determined by the minimum standard, and the best penalty parameter λ related to the TGGA data set is selected. Use the following formula to calculate the risk score of the signature 24:
Where Coefi is the coefficient, and x_i_ is the relative expression value of each selected regulator after z-score conversion. This formula is used to calculate the risk score of each patient in the TGGA data set. In CRC cases, high-risk subtypes (samples with a risk score higher than 5) and low-risk subtypes (samples with a risk score lower than 5) are defined based on the risk score of their tumor samples. Based on the risk score, ROC analysis is used to test whether survival predictions are sensitive and specific.
Subsistence analysis
This study included 447 patients with overall survival (CRC) of at least 30 days. According to the median of each parameter, the patients were divided into two groups, and the AS event data and survival data were combined for single-factor COX analysis. In addition, the Kaplan-Meier curve is drawn by comparing the OS data of CRC patients within 5 years. The Chi-square test is used to compare the difference in survival status between high-risk and low-risk. Then, use receiver operating characteristics (ROC) analysis to compare the efficiency of the survival ROC software package in R for each predictive model.
Independent prognostic analysis and survival analysis of NRG4, GOLGB1, PDCD4 and RAB3IP
Univariate and multivariate Cox regression analyses were used to obtain the prognostic value of the risk score generated by the multivariate model (P<0.05). Demographic and clinical information (including age, grade, number of nodules, etc.) are used for model calibration. Biomarker Exploration of Solid Tumors (BEST) portal (https://rookieutopia.com/app_direct/BEST/) was used for validation the association between the expression of RAB3IP, PDCD4, GOLGB1 and NRG4 and tumor progression. The mRNA expression level and survival prognosis of the four genes in the cancer group and the normal group in colorectal cancer were analyzed through the UALCAN website(http://ualcan.path.uab.edu/index.html). COPTAP is used to study the differences in protein function of candidate genes, a website dedicated to the study of protein expression levels.
Gene set enrichment analysis (GSEA) of RAB3IP
The GSEA 3.0 software with genome c2 (cp.kegg.v.6.2.symbols.gmt) was used to perform the prognostic-related gene set enrichment analysis (GSEA) of MeDEG. The mRNA expression level of 447 colorectal cancer patients in the TCGA database is used as a dataset. The number and type of arrangement are set to "1000" and "phenotype" respectively. Enrichment scores> 0.4 and P <0.05 were considered statistically significant.
Cell culture and transfection
The normal human colon mucosal epithelial cell line NCM460 and human colorectal cell lines HCT116 and SW480 used in this study were purchased from the American Type Culture Collection (ATCC;). The colorectal cell lines HCT8, Caco2, and HT29 were kindly provided by Professor Wancai Yang (Key Laboratory of Precision Oncology of Shandong Higher Education, Institute of Precision Medicine, Jining Medical University). The NCM460, SW480, HCT8, HCT116, Caco2, and HT29 cells were maintained in RPMI-1640 (Biological industries) with 10% FBS. All cells were cultured at 37°C with 5% CO_2_.
RNA extraction and quantitative reverse transcription polymerase chain reaction (RT-PCR)
Total RNA was isolated by Trizol reagent (Vazyme) and performed reverse transcription PCR with Go Script reverse transcription system (Promega). On the ABI Prism 700 thermal cycler, real-time qPCR was performed using GoTaq qPCR Master Mix (Promega). The following is the primer sequence: RAB3IP (forward primer: AAGCTGAAGTAGCTGCATTGAA; reserved primer GCCACTCATAGCACTGCT TGT); GAPDH (forward primer: CTGGGCTAC ACTGAGCACC; reserved primer: AAGTGGTCGTTGAGGGCAATG). GAPDH was used as an internal control.
Wound-healing assay
Inoculate 2×10^5^ SW480 and HCT8 cells in a 6-well plate, and when they grow to logarithmic phase and close to 90% confluence, draw a straight line vertically in the center of the plate with a pipette tip. After washing three times with PBS, transfect the RAB3IP overexpression plasmid. Choose 0h, 24h, 48h three-time points to take pictures, and finally use the Image J software to calculate the distance of cell migration (Cultivate with the serum-free medium during the experiment).
Statistical analysis
One-way analysis of variance (ANOVA) was used to analyze the expression levels of colorectal cancer patients in the high-risk group and the low-risk group (TCGA data set), and multivariate analysis of variance was performed to determine the grade, age and survival status of patients with CRC expression levels. ANOVA was used to analyze the characteristic risk scores and clinical or molecular pathological characteristics of colorectal cancer. The ROC curve was used to test the prediction efficiency of the risk signature. The Kaplan-Meier method was used for OS in low-risk and high-risk groups. Chi-Square test was used to conduct statistical analysis on the relationships between the mRNA expression of SFs and the clinicopathological characteristics of patients with CRC. All statistical analyses were performed using Rv3.4.1 (https://www.r-project.org/) and Prism 8 (GraphPad Software Inc.) software.
Results
Development of a risk signature by alternative splicing profiles in CRC
After analyzing the raw data used in CRC, we observed ESs in 1637genes, ATs in 1198 genes, APs in 552 genes, RIs in 310 genes, AAs in 220 genes and ADs in 207 genes. Among all AS events, ES is the most common, whereas ME is the least frequent. Notably, a single gene may undergo multiple splicing modes; therefore, we visualized all AS events using an UpSet diagram (Figure S1A).
To investigate the association between AS events and CRC survival, we conducted univariate Cox regression analysis and identified survival-related alternative splicing factors (Figure 1A). At the same time, the Upset chart presented AS events related to survival (Figure S1B). In order to better predict the clinical outcome of CRC with alternatively spliced genes, we used the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm for 2309 SFs in the TCGA data set, and obtained the risk score through R packages LASSO regression analysis (Figure 1B, C). Using the median risk score (median risk score = 1.09) as the cut-off point, we divided all patients into a high-risk group and a low-risk group, and carefully studied the significant difference in OS between the two groups (P <0.05; Figure 1D). Receiver operating characteristics (ROC) analysis is used to test whether survival predictions are sensitive and specific based on risk scores. The calculation of the area under the curve (AUC) value is based on the ROC curve (Figure 1E). The ROC curve shows that the supporting result of our risk model is AUC = 0.944.
Verification of a risk signature consisting of four SFs
We further investigated the prognostic value of a single alternative splicing factor in CRC. We performed univariate Cox regression analysis on the expression level of each alternative splicing factor in different alternative splicing events in the TCGA data set. The results showed that among the 13 alternative splicing factors with the smallest P-value, 4 SFs were significantly related to OS (Table 1) (P<0.05; Figure 2A; Figure S2), and these four alternative splicing factors were highly expressed in high-risk patients (Figure 2B). Then, we analyzed the distribution of risk scores and survival status (Figure 2C, D). In Figure 2C, the risk score of each patient is ranked from low to high, and the patients are divided into low-risk groups (green dots) and high-risk groups (red dots) according to the median risk score. High-risk patients demonstrated significantly increased mortality (58.3% vs. 22.7%, P<0.001) with earlier median survival time (32 vs. 68 months, log-rank P<0.001) compared to low-risk counterparts (Figure 2D).
Independent prognostic analysis
In order to verify whether the clinicopathological characteristics (including age, gender, stage, depth of tumor invasion, local lymph node enlargement, and distant metastasis) are independent prognostic factors for the patient's prognosis, we conducted a univariate and multivariate analysis of OS. Univariate analysis using the Cox proportional hazards model for all variables showed that risk score (P <0.001, 95% CI HR 1.022-1.044) can be used as an independent factor for poor prognosis of CRC patients (Figure 3A). The same multivariate analysis as the cohort univariate analysis supports that risk score (P = 0.001, 95%CI HR 1.014-1.037) and age (P = 0.024, 95%CI HR 1.004-1.054) are independent factors for the poor prognosis of CRC patients, but there is no obvious correlation between TNM stage, tumor invasion depth, regional lymph node metastasis and distant metastasis. To further identified the correlation between the alternative splicing factors of RAB3IP, NRG4, PDCD4 and GOLGB1 and tumor progression, the expression of four SFs in CRC tissue was analyzed using BEST. Consistent with our earlier result from the TCGA dataset, the expression of RAB3IP, NRG4 and GOLGB1 showed higher level in CRC tissue in GEO (Figure 3C-E). Surprisingly, a negative correlation was observed between the mRNA expression levels of PDCD4 and tumor progression as shown in Figure 3F. In addition, we analyzed the expression profiles of RAB3IP, PDCD4, GOLGB1 and NRG4 in Normal and Primary tumors in the TCGA database on the UALCAN website(http://ualcan.path.uab.edu), The results showed that RAB3IP was highly expressed in tumors, and NRG4 and PDCD were downregulated in tumors. There was no significant difference in GOLGB1 (Figure 4A-D). At the same time, we compared the OS of patients with high expression levels of these four genes with those with low expression levels and found that the high expression of RAB3IP, NRG4, PDCD4 and GOLGB1 had poor prognostic effects (Figure 4E-H).
Furthermore, we expanded our analysis by including 447 CRC patient samples from the TCGA database to investigate the correlations between the four SF genes and the pathological characteristics of CRC (Table 2). The results revealed significant associations between the expression of RAB3IP and NRG4 and various clinicopathological parameters in CRC patients. We observed that high levels of RAB3IP expression were significantly associated with the Lymph node metastasis of CRC patients. The expression of NRG4, PDCD4 and GOLGB1 had no obvious relevance of CRC. Additionally, high expression levels of RAB3IP were significantly associated Overall survival. These findings suggest that RAB3IP may serve as a potential biomarker for CRC, indicating its diagnostic relevance. Based on the concordant evidence of tumor-specific overexpression and strong survival association, RAB3IP was prioritized for functional characterization (Figure 4A, E).
The function of RAB3IP in regulating migration of CRC cells
In order to clarify the important function of RAB3IP in CRC, we downloaded its protein expression in normal and primary tumors from the CPTAC database, and the results are consistent with the mRNA expression profile (Figure 5A). Additionally, tissue microarray data from the Human Protein Atlas database indicated that RAB3IP is upregulated in tumor tissues, suggesting its potential role in tumor progression (Figure 5B). Next, we used gene enrichment analysis (GSEA) and found that RAB3IP is related to Cell Cycle, WNT pathway and Spliceosome (Figure 5C). To further clarify the role of RAB3IP in colorectal cancer, first, we examined the expression of RAB3IP in a variety of colorectal cancer cells. Among them, SW480 and HCT8 show relatively low levels of RAB3IP transcripts (Figure S3A). We studied the biology of RAB3IP in CRC pathology by overexpressing RAB3IP cDNA in those two cell lines and conducted Wound-Healing Assays. Our results indicated that after overexpression of RAB3IP, the migration speed of SW480 was increased compared with the control by 27% (Figure 5D, E). The enhancement trend was remained in HCT8 cell (Figure S3B).
Discussion
Aberrant mRNA splicing is recognized as a critical driver of carcinogenesis and tumor progression 25, 26. Although AS events have been characterized in numerous genes, their functional significance remains poorly understood in the global landscape of alternative splicing. With the advancement of high-throughput sequencing technologies and bioinformatics data analysis methods, we have more resources to analyze the effects of different AS events on the occurrence and development of cancer.
Colorectal cancer (CRC), ranking as the fourth leading cause of cancer-related mortality worldwide, demonstrates strong associations with dysregulated alternative splicing 17, 18. Therefore, it is particularly important to analyze the AS events in CRC deeply and clarify the macroscopic prognostic criteria of splicing factors in AS events in the development of tumors 27. In this study, we determined the AS events in colorectal cancer transcriptome through the overall analysis of RNA-seq data and clinical data in the TCGA database and found that ESs in 1637 genes, ATs in 1198 genes, APs in 552genes, RIs in 310, AAs in 220 genes and ADs in 207 genes. Additionally, there are many genes that have more than one alternative splicing variant. Then, we first selected four (RAB3IP, PDCD4, GOLGB1 and NRG4) and prognostic-related alternative splicing factors through Cox univariate analysis and LASSO Cox regression analysis, and conducted risk and post-analysis. Next, we used univariate analysis and multivariate analysis to evaluate the prognostic value of these four splicing factors and found that all were positively correlated with poor prognosis, aligning with multiple previous research outputs 28-34. It is worth noting that this is not consistent with the fact that most of the survival-related AS events in ovarian cancer are favorable prognostic factors 35. In addition, there is reported that PDCD4 is transcriptionally repressed by the alternative splicing factor SRSF3, which promotes the metastasis of colorectal cancer 36, 37. Breast cancer exosomes contribute to pre-metastatic niche formation and promote bone metastasis of tumor cells via the miR21-PDCD4 axis 33. Meanwhile, it has also been found that EMT-Exos-derived exosomal miR-106b can inhibit the expression of PDCD4 in macrophages, induce M2 macrophage polarization by activating PI3Kg/AKT/mTOR pathway, and promote the malignant progression of CRC cells 38. NRG4 has been identified and characterized as a novel variable shear factor in prostate cancer, but its specific role remains unclear 29. In 2014, GOLGB1 was reported to enhance susceptibility to galectin-1 in prostate cancer, thereby inducing cellular apoptosis 31. Recent research also demonstrates that GOLGB1 plays a critical role in the risk of pulmonary metastasis among breast cancer patients 28. For RAB3IP, there are much evidence demonstrate that it can promote cell proliferation and metastasis in many cancers, but its role as an alternative splicing factor in CRC remains unclear 30, 32.
Then, we analyzed the mRNA and protein expression of the four genes in normal tissues and tumors and found that only RAB3IP was significantly upregulated in tumors, further supporting its correlation with tumor occurrence and development in CRC. Finally, through gene chip analysis and GSEA analysis, it revealed the enrichment of RAB3IP in Cell Cycle, WNT, and Spliceosome pathways. All the technical routes are shown in Figure 6. Other studies have shown that RAB3IP is highly expressed in gastric cancer and induces the migration and invasion of gastric cancer cells through the EMT pathway 39, 40. But whether it has the same effect on CRC remains to be further studied.
In conclusion, this study not only provides a comprehensive characterization of AS factor expression and prognostic value in CRC but also establishes a framework for AS-related cancer research. It also provides new biomarkers for predicting the prognosis of CRC, and may even provide new targets for treatment development.
Conclusion
In conclusion, our study highlights the clinical significance and prognostic value of AS events in CRC. Moreover, we further identified RAB3IP as an alternative splicing factor affecting CRC progression by affecting Cell Cycle, WNT, and Spliceosome pathways. These findings not only help to construct an association between AS events and CRC metastasis, but also provide novel targets for future anti-metastasis therapies.
Supplementary Material
Supplementary figures.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Tao Y Zhang Q Wang H Yang X Mu H Alternative splicing and related RNA binding proteins in human health and disease Signal Transduct Target Ther 20249263830246110.1038/s 41392-024-01734-2PMC 10835012 · doi ↗ · pubmed ↗
- 2Danan-Gotthold M Golan-Gerstl R Eisenberg E Meir K Karni R Levanon EY Identification of recurrent regulated alternative splicing events across human solid tumors Nucleic Acids Res 2015435130442590878610.1093/nar/gkv 210PMC 4446417 · doi ↗ · pubmed ↗
- 3Wang ET Sandberg R Luo S Khrebtukova I Zhang L Mayr C Alternative isoform regulation in human tissue transcriptomes Nature 200845647061897877210.1038/nature 07509 PMC 2593745 · doi ↗ · pubmed ↗
- 4Baralle FE Giudice J Alternative splicing as a regulator of development and tissue identity Nat Rev Mol Cell Biol 201718437512848870010.1038/nrm.2017.27PMC 6839889 · doi ↗ · pubmed ↗
- 5Wedler A Bley N GlaßMMüller S Rausch A Lederer MRAVER 1 hinders lethal EMT and modulates mi R/RISC activity by the control of alternative splicing Nucleic Acids Res 2024523971883830078710.1093/nar/gkae 046PMC 11039986 · doi ↗ · pubmed ↗
- 6Kahles A Lehmann KV Toussaint NCHüser M Stark SG Sachsenberg T Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients Cancer Cell 20183421124.e 63007874710.1016/j.ccell.2018.07.001PMC 9844097 · doi ↗ · pubmed ↗
- 7Luo C Cheng Y Liu Y Chen L Liu L Wei NSRSF 2 Regulates Alternative Splicing to Drive Hepatocellular Carcinoma Development Cancer Res 2017771168782808240410.1158/0008-5472.CAN-16-1919 · doi ↗ · pubmed ↗
- 8Wang F Pan J Liu Y Meng Q Lv P Qu F Alternative splicing of the androgen receptor in polycystic ovary syndrome Proc Natl Acad Sci U S A 2015112474382582571610.1073/pnas.1418216112 PMC 4403157 · doi ↗ · pubmed ↗
