Identification of marker genes for spinal cord injury
Zhiwei Luan, Jiayu Zhang, Yansong Wang

TL;DR
This study identifies four genes linked to spinal cord injury, which could help develop new treatments for this disabling condition.
Contribution
The study identifies and experimentally validates four novel marker genes (Icam1, Ch25h, Plaur, Tm4sf1) associated with spinal cord injury.
Findings
Four genes (Icam1, Ch25h, Plaur, Tm4sf1) were identified as differentially expressed in spinal cord injury.
The identified genes were validated using PCR and immunohistochemistry in SCI rat models.
The study links SCI with immune cell involvement through gene expression analysis.
Abstract
Spinal cord injury (SCI) is a profoundly disabling and devastating neurological condition, significantly impacting patients’ quality of life. It imposes unbearable psychological and economic pressure on both patients and their families, as well as placing a heavy burden on society. In this study, we integrated datasets GSE5296 and GSE47681 as training groups, analyzed gene variances between sham group and SCI group mice, and conducted Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis based on the differentially expressed genes. Subsequently, we performed Weighted Gene Correlation Network Analysis (WGCNA) and Lasso regression analyses. We identified four characteristic disease genes: Icam1, Ch25h, Plaur and Tm4sf1. We examined the relationship between SCI and immune cells, and validated the expression of the identified…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7| Gene | Forward premier | Reverse premier |
|---|---|---|
|
| TCTTCCTCGGCCTTCCCATA | AGGTACCATGGCCCCAAATG |
|
| GACCTTCCGTGGTCAACTCA | GGAGATCATGATGCGGTGCT |
|
| TTCCATTCCACAATGTGCTT | GGCCAGTGGAACTACACCTT |
|
| ACAACAACGACACCTTCC | AGTACAGCAGGAGACATCA |
- —Natural Science Foundation of China10.13039/501100001809
- —Key Project of Natural Science Foundation of Heilongjiang Province of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPeroxisome Proliferator-Activated Receptors · Spinal Cord Injury Research · Neurogenetic and Muscular Disorders Research
Introduction
1
Spinal cord injury (SCI) commonly occurs due to various incidents including car accidents, falls, sports-related injuries, earthquakes, among others. The global prevalence of SCI has risen from 236 cases per million people to 1,298 cases. Current estimates place the annual number of SCI cases worldwide between 250,000 and 500,000, with over one million SCI patients in China (1–3). SCI imposes a substantial psychological burden on patients and places heavy strains on families and society. Regeneration and repair of SCI represent some of the most daunting medical challenges, yet effective methods to promote neurological function recovery are lacking. Currently available approaches can only offer supportive relief for individuals who are disabled for life (4–6). Hence, there is an urgent need to develop a safe and efficient treatment for spinal cord injury.
The emergence and advancement of bioinformatics provide new avenues for investigating the pathogenesis and therapeutic targets of SCI. Over the past decade, the continuous progression of gene chips and high-throughput sequencing technology, combined with the extensive use of machine learning and artificial intelligence, have led to the evolution of novel bioinformatics fields, which include biomarker discovery, prediction of drug targets, and research on protein function (7). Whole genome analysis technologies such as high-throughput sequencing and gene chips have been widely adopted globally in the last decade to study gene expression patterns, resulting in the generation of vast amounts of data (8). This has driven the development of data processing and analysis methodology, and has given rise to algorithms or models such as protein–protein interaction (PPI) network, weighted gene correlation network analysis (WGCNA), and receiver operating characteristic curve (ROC), all providing valuable insights for disease mechanism investigation and biomarker discovery (9–11).
In this study, we used WGCNA and LASSO regression analysis to identify four marker genes for SCI, which were subsequently validated through PCR and immunohistochemistry experiments using SCI rats. These findings may offer assistance in future SCI diagnosis and treatment.
Method
2
Data download and processing
2.1
We utilized the Gene Expression Omnibus (GEO),1 a publicly available functional genomics database with diverse gene expression datasets (12). We downloaded datasets GSE5296 and GSE47681 as the training group, and GSE132242 as the validation group. Subsequently, we used Perl software and related code to organize this information into a matrix file comprising sample names and mouse gene names. Following this, we used Perl software and relevant code to segregate mRNA and lncRNA from the gene matrix file, thereby obtaining the lncRNA gene matrix file of SCI. The version of R we use for data analysis is 4.2.0.
Differential genes and functional enrichment
2.2
The “Limma” R package (3.52.0) was employed to identify differentially expressed genes (DEGs) in SCI and healthy mice, using a significance threshold of p < 0.05. We further utilized the “clusterprofiler” R package (4.4.1) to conduct Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis on the selected DEGs (13).
WGCNA and LASSO regression
2.3
The R package “WGCNA” (1.72.1) was utilized to perform WGCNA analysis based on different subtypes of CRGs content. Outlier samples were excluded, and the remaining samples were clustered to construct topological overlap matrices and scale-free networks. Utilizing a gene module correlation >0.8 as the filtering criterion, we identified the gene module displaying the strongest clinical correlation. Subsequently, genes were organized within this module to facilitate the subsequent step of core gene screening. An SCI diagnostic model was developed based on LASSO regression coefficients, and the model’s performance was evaluated using ROC curve and area under the curve (AUC). Finally, the model was validated with dataset GSE132242.
Correlation between SCI and immune cells
2.4
In order to further investigate the impact of DEG expression on immune cell infiltration, we conducted a single sample gene set enrichment analysis (ssGSEA) using the R language GSVA package (1.44.0). This enabled us to obtain the gene set based on the ssGSEA algorithm for each sample. Combining data analysis of immune cells and immune function, we utilized the R language pheatmap software package to create heatmaps of the dataset, immune cells, and immune function.
Animal experiment
2.5
We divided 10 SD rats into two groups equally: the control group and the SCI group. Following anesthesia of the SCI group rats, we removed dorsal hair and made a 1.5 cm incision at the T10 segment of the spinal cord to expose the rat spinal cord. We then used weights dropped from a height to impact the rat spinal cord. One week later, we euthanized the rats and collected their spinal cord tissue for further analysis. This experiment was approved by the Ethics Committee of the First Affiliated Hospital of Harbin Medical University (Ethics approval number: 2023045).
Quantitative PCR assay and immunohistochemistry
2.6
To further investigate the variations in risk genes between SCI-affected tissues and normal tissues, we conducted quantitative real-time reverse transcription polymerase chain reaction (RT-PCR) analysis and immunohistochemistry. The experimental protocol for PCR and immunohistochemistry was performed as previously described (14). The relevant primers for PCR are provided in Table 1.
Information on antibodies used in immunohistochemistry: Ch25h (Abcam, ab214295), Plaur (Abcam, ab307895), Tm4sf1(Sigma-Aldrich, SAB1307024).
Result
3
Differential gene expression and enrichment analysis
3.1
To investigate the differential gene expression between healthy mice and those with SCI, we conducted heat maps and volcano plots of DEGs. Our analysis revealed a significant increase in DEG expression in the SCI group compared to the control group, while no decrease in DEG expression was observed (Figure 1). Subsequently, we performed GO enrichment analysis and KEGG enrichment analysis. The biological processes (BP) involving DEGs were notably enriched in response to molecular stimuli of bacterial origin and response to lipopolysaccharide. Furthermore, the molecular functions (MF) of DEGs were significantly enriched in processes such as secretory granule membrane and specific granule (Figure 2C). The cellular composition (CC) of DEGs was significantly enriched in processes such as chemokine activity and chemokine receptor binding (Figure 3A). KEGG enrichment analysis showed that DEGs are associated with pathways such as myeloid leukemia activation and cell chemotaxis (Figure 3B). Notably, GSEA analysis highlighted differences between the control group and the SCI group, with the former mainly enriched in the calcium signaling pathway and oxidative physiology, while the SCI group exhibited enrichment in neutrophil chemotaxis and neutrophil migration, indicative of the crucial role of immunity in SCI (Figures 3C–F).
Expression of differentially expressed genes. (A) Heat map of DEGs. Blue represents samples from the control group, while red represents samples from the SCI group. The vertical axis is the name of the differentially expressed gene. Red indicates upregulation of gene expression, while blue indicates downregulation of gene expression. (B) Volcano map of DEGs. Green dots represent genes with no differences, while black dots represent upregulated differentially expressed genes.
Verification of SCI characteristic genes. (A) Verify the expression of Icam1, Ch25h, Plaur and Tm4sf1 in rat tissues using PCR. (B) Verify the expression of Ch25h, Plaur and Tm4sf1 in rat tissues using immunohistochemical.
Enrichment analysis of DEGs. (A,B) GO enrichment analysis and KEGG enrichment analysis. The vertical axis represents the name of the function/pathway, and the horizontal axis represents the number of genes enriched in each function/pathway. The color of the column represents the significance of enrichment, and the redder the color, the more significant the enrichment of differential genes in that function/pathway. (C–F) GSEA enrichment analysis. The horizontal axis represents sorted genes, while the vertical axis represents enriched scores. Different colored curves represent different functions/pathways. The peak of the curve appearing on the upper left side indicates that the function/pathway is active in the SCI group, while the peak of the curve appearing on the lower right side indicates that the function/pathway is active in the control group.
Identification of SCI characteristic genes
3.2
Utilizing WGCNA as a method for discovering genes and clinical phenotypes, we explored key genes related to clinical disease phenotypes by integrating DEGs. Following analysis of the positive correlation coefficient, we identified the module with the strongest correlation coefficient. Our results revealed that the blue module [r = 0.57, P = (1e−12)] displayed the most significant genetic significance. Subsequently, we extracted genes from the blue module, which are considered key genes related to the clinical phenotype (Figures 4A,B). Then LASSO regression analysis was performed to identify 4 characteristic genes associated with SCI: Icam1, Ch25h, Plaur and Tm4sf1 (Figures 4C,D).
Identification of SCI characteristic genes. (A) Gene clustering tree diagram. Cluster genes, where different genes belong to different color modules. (B) Heat map of the correlation between modules and clinical characteristics. In each grid, the upper value represents the correlation coefficient, and the lower value represents the p-value of the correlation test. The correlation coefficients and p-values of different modules screened by WGCNA. The depth of module colors represents the degree of correlation. (C) Cross-verified graphics. The abscissa represents the logλ value, and the ordinate represents the error of the cross-validation. The number of points with the smallest cross-validation error corresponds to the number of disease-characteristic genes. (D) LASSO regression graph.
Expression of SCI characteristic genes in different groups
3.3
We then analyzed the expression of the four disease characteristic genes in both the control group and the SCI group of mice. Our findings revealed upregulation of Icam1, Ch25h, Plaur and Tm4sf1 in the SCI group (Figures 5A–D). Moreover, we used dataset GSE132242 as the validation group for analysis, confirming increased expression of these four genes in the SCI group (Figures 5E–H).
Expression of SCI characteristic genes in different groups. (A–D) Differential boxplot of training group. The horizontal axis represents the type of sample, which is divided into control group samples and SCI group samples. The control group is represented in blue, while the SCI group is represented in red. The vertical axis represents the expression level of characteristic genes. (E–H) Differential boxplot of the validation group. The difference in the characteristic genes of this disease between the control group and the SCI group (p-value <0.05). And the box plot of the SCI group is higher than that of the control group, indicating that the gene is upregulated in the SCI group.
ROC curve of SCI characteristic genes
3.4
We conducted ROC curve analysis to validate the SCI characteristic genes. Our findings revealed AUC values of 0.877, 0.891, 0.851 and 0.845 for the four genes in the training group, indicating relatively high accuracy of the disease characteristic genes we screened (Figures 6A–D). Subsequently, we utilized the validation group to develop ROC curves, further confirming the accuracy of our model (Figures 6E–H).
ROC curve of SCI characteristic genes. (A–D) ROC curves of four characteristic genes in the training group. The horizontal axis represents the false positive rate, and the vertical axis represents the true positive rate. The larger the area under the curve, the higher the accuracy of the gene as a disease diagnostic gene. (E) ROC curves of Icam1, (F) Ch25h, (G) Tm4sf1, and (H) Plaur in the validation group.
Immunological correlation analysis of SCI characteristic genes
3.5
Recognizing the vital role of immunity in SCI progression, we analyzed differences in immune cells between the SCI group and the control group. Our analysis revealed higher expression of most immune cells in the SCI group compared to the control group, with only a few immune cells showing higher expression in the Control group than in the SCI group, such as activated B cells and effector memory CD8 T cells (Figures 7A,B). Additionally, several immune cells are associated with SCI characteristic genes, including regulatory T cells, natural killer cells, and mast cells (Figure 7C).
Immunological correlation analysis of SCI characteristic genes. (A) Heat map of immune cells. Green represents the control group sample, and red represents the SCI group sample. The vertical axis represents various immune cells. Blue represents low expression, while red represents high expression. (B) Violin diagram of immune cell differences. The horizontal axis represents the name of immune cells, and the vertical axis represents the content of immune cells. If p < 0.05, it indicates that there is a difference in the immune cells between the control group and the SCI group. (C) Correlation analysis between SCI characteristic genes and immune cells. The vertical axis represents the name of immune cells, and the horizontal axis represents the characteristic genes of the disease.
Validation of SCI characteristic genes
3.6
Finally, we validated the previously screened genes through LASSO regression using PCR and immunohistochemistry experiments. We observed higher expression of Icam1, Ch25h, Plaur and Tm4sf1 in the SCI group rats compared to the control group (Figures 2A,B).
Discussion
4
In this study, we initially compared gene expression between mice in the SCI group and those in the control group by utilizing gene heatmaps and volcano plots. Subsequently, we performed GO enrichment analysis and KEGG enrichment analysis on the differentially expressed genes. We identified four characteristic SCI genes through WGCNA and LASSO regression, and verified the model’s accuracy using ROC curves. Furthermore, we found that multiple immune cells are associated with these characteristic genes, implying their potential role in immune regulation. Finally, we validated the characteristic SCI genes through PCR and immunohistochemical experiments.
SCI stands as one of the most debilitating and detrimental central nervous system afflictions, capable of causing spinal cord vascular rupture, rapid neural cell demise, and impaired movement, sensation, and autonomic nervous system function below the affected segment (4, 15). Over the past 30 years, the global incidence rate has surged from 236 cases per million people to 1,298 cases, escalating annually (16, 17). SCI can be categorized into non-traumatic and traumatic SCI based on its causes. Non-traumatic SCI primarily arises from acute and chronic conditions like tumors and intervertebral disc herniation (18, 19), while traumatic SCI is predominantly triggered by traffic accidents, sports-related falls, or violent acts, leading to spinal fractures, dislocations, and subsequent cord compression or rupture (20, 21). SCI profoundly impacts patients’ quality of life, mental and physical well-being, and imposes significant economic burdens on families and society. Consequently, addressing SCI holds substantial societal importance.
Icam-1, a cell membrane surface glycoprotein weighing 90 kDa, was initially found to interact with leukocyte function-associated antigen-1, facilitating the adhesion of white blood cells or tumor cells to endothelial cells. This interaction contributes to inflammatory immune responses or tumor metastasis (22–24). Additionally, Icam-1 can exist in the extracellular matrix as a soluble factor, regulating inflammation and immunity within the body. Exogenous Icam-1 may originate from tumor cell membrane detachment or be secreted into the extracellular matrix in vesicle form (25, 26). Notably, the association between Icam-1 and gastric cancer metastasis has been confirmed through numerous clinical and pathological tissue studies (27).
Ch25h, an interferon-stimulated gene, encodes a product that catalyzes cholesterol oxidation, producing soluble 25-hydroxycholesterol, thereby playing a crucial regulatory role in cholesterol metabolism homeostasis (28). Apart from its pivotal function in cellular metabolism, Ch25h’s role in the body’s immunity has gained increasing attention in recent years (29, 30). Research indicates that TLR ligands and interferon stimulation can induce high Ch25h expression in macrophages and dendritic cells. Ch25h can inhibit IgA generation in B lymphocytes and affect intracellular bacteria growth (31). As an immune transmitter, Ch25h can induce inflammatory factor expression; thus, dysregulated Ch25h can lead to immune pathological changes in the body (32).
Plaur is a multifunctional receptor on the cell surface that can be expressed on various immune active cells such as neutrophils and activated T lymphocytes. In addition, it can also be expressed on endothelial cells and some tumor cells (33). There have been studies on the relationship between Plaur and lung injury in respiratory diseases, and some observations suggest that injury may be related to immune mechanisms (34). Plaur plays an important role in the fibrinolytic system and is also related to the progression of tumors. It can promote the proliferation of tumor cells and facilitate their migration between different regions (35).
The protein encoded by the Tm4sf1 gene is a member of the transmembrane 4 superfamily, mediating signal transduction events regulating cell development, activation, growth, and movement (36). Tm4sf1 is highly expressed in human pancreatic cancer, breast cancer, lung cancer, and other tumors, closely linked to tumor cell growth, migration, and invasion (37–40). It has also been reported to have high expression in human tumor endothelial cells. Reducing Tm4sf1 levels can inhibit cell migration, block cell division, promote cell aging, and inhibit VEGF-A-induced vascular maturation (41). Our data indicates a significant increase in the expression of Icam1, Ch25h, Plaur and Tm4sf1 in the rat SCI group.
In this study, we used data set GSE132242 as the verification group to verify that the expressions of Icam1, Ch25h, Plaur and Tm4sf1 in the SCI group were higher than those in the Control group. This is the first time that dataset GSE132242 has appeared in a related article in bioinformatics analysis. We found four SCI characteristic genes. As a characteristic gene of SCI, Plaur has not been thoroughly studied. We believe that Plaur may be helpful for the identification and treatment of SCI.
In summary, we identified four pathogenic genes related to SCI using the WGCNA model and LASSO regression, validating them through PCR and immunohistochemical. Our findings may offer new insights into SCI-related diseases.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.
Ethics statement
The animal study was approved by the Ethics Committee of the First Affiliated Hospital of Harbin Medical University (Ethics approval number: 2023045). The study was conducted in accordance with the local legislation and institutional requirements.
Author contributions
ZL: Data curation, Investigation, Software, Writing – original draft. JZ: Formal analysis, Project administration, Validation, Writing – original draft. YW: Funding acquisition, Resources, Writing – review & editing.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Liu Y Yang X He Z Li J Li Y Wu Y. Spinal cord injury: global burden from 1990 to 2019 and projections up to 2030 using Bayesian age-period-cohort analysis. Front Neurol. (2023) 14:1304153. doi: 10.3389/fneur.2023.1304153, PMID: 38116113 PMC 10729761 · doi ↗ · pubmed ↗
- 2Wang R Bai J. Pharmacological interventions targeting the microcirculation following traumatic spinal cord injury. Neural Regen Res. (2024) 19:35–42. doi: 10.4103/1673-5374.375304, PMID: 37488841 PMC 10479866 · doi ↗ · pubmed ↗
- 3Fan B Wei Z Yao X Shi G Cheng X Zhou X. Microenvironment imbalance of spinal cord injury. Cell Transplant. (2018) 27:853–66. doi: 10.1177/0963689718755778, PMID: 29871522 PMC 6050904 · doi ↗ · pubmed ↗
- 4Anjum A Yazid MD Fauzi Daud M Idris JAMHN Selvi Naicker A. Spinal cord injury: pathophysiology, multimolecular interactions, and underlying recovery mechanisms. Int J Mol Sci. (2020) 21:7533. doi: 10.3390/ijms 21207533, PMID: 33066029 PMC 7589539 · doi ↗ · pubmed ↗
- 5Guo X Jiang C Chen Z Wang X Hong F Hao D. Regulation of the JAK/STAT signaling pathway in spinal cord injury: an updated review. Front Immunol. (2023) 14:1276445. doi: 10.3389/fimmu.2023.1276445, PMID: 38022526 PMC 10663250 · doi ↗ · pubmed ↗
- 6Yin P Liang W Han B Yang Y Sun D Qu X. Hydrogel and nanomedicine-based multimodal therapeutic strategies for spinal cord injury. Small Methods. (2023) 8:e 2301173. doi: 10.1002/smtd.20230117337884459 · doi ↗ · pubmed ↗
- 7Crichton DJ Mattmann CA Thornquist M Anton K Hughes JS. Bioinformatics: biomarkers of early detection. Cancer Biomark. (2010) 9:511–30. doi: 10.3233/CBM-2011-018022112493 PMC 13015973 · doi ↗ · pubmed ↗
- 8Liu L Guo K Liang Z Li F Wang H. Identification of candidate genes that may contribute to the metastasis of prostate cancer by bioinformatics analysis. Oncol Lett. (2018) 15:1220–8. doi: 10.3892/ol.2017.740429399176 PMC 5772834 · doi ↗ · pubmed ↗
