Single-Cell Analysis Reveals Epithelial Heterogeneity and Tumor Microenvironment Characteristics During the Malignant Progression of Colorectal Cancer
Qianqian Chen, Yaoqian Yuan, Shuai Tian, Jiayan Zhou, Kunming Lv, Enqiang Linghu

TL;DR
This study uses single-cell sequencing to explore epithelial cell diversity and tumor microenvironment changes in colorectal cancer progression.
Contribution
Identifies 11 epithelial cell subtypes and their roles in CRC progression, highlighting KCNMA1+ and MKI67+ cells as key indicators.
Findings
Epithelial cells were divided into 11 subgroups marked by genes like MKI67 and KCNMA1.
Downregulation of KCNMA1 and upregulation of MKI67 correlate with poor CRC prognosis.
Cell–cell communication suggests bidirectional regulation between epithelial and fibroblast subsets.
Abstract
Background/Objectives: To mine single-cell sequencing data for colorectal cancer (CRC), identify CRC epithelial cell subtypes, and explore the heterogeneity of epithelial cells and their impact on the tumor microenvironment (TME). Methods: The GSE201348 dataset, including normal, colorectal adenoma, high-grade colorectal intraepithelial neoplasia, and CRC tumor tissue samples, was downloaded from the Gene Expression Omnibus. The Seurat package of R software was used for data quality control, data integration, normalization, and clustering. The Feature Plot and the Recode function were executed to annotate and group the epithelial cells. Finally, genetic differences, copy number variant heterogeneity, pseudotime, cell–cell communication, and Gene Set Variation Analysis (GSVA) were further conducted. Results: In total, 26,335 gene matrices from 263,872 cells were obtained for subsequent…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —National Key Research and Development Program of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFerroptosis and cancer prognosis · Single-cell and spatial transcriptomics · Cancer Genomics and Diagnostics
1. Introduction
Colorectal cancer (CRC) accounts for 10% of global cancer cases and is the third most commonly diagnosed cancer and the second leading cause of cancer-related deaths worldwide [1,2]. The treatment of CRC includes surgery and chemotherapy [2]; however, most patients are already in the advanced stage at the time of diagnosis, and the prognosis is relatively poor. Therefore, early diagnosis and intervention are crucial measures to enhance the survival rates of patients with CRC [3]. Currently, most studies have focused on the analysis and treatment of advanced tumors, while largely ignoring precancerous lesions. There are few studies on the transition from a precancerous state to a cancerous state, including the changes in the cellular environment and molecular driving factors of this transition. The malignant transformation of colorectal epithelial cells is a key factor in the development of colorectal tumors [4]. The immune response, gene mutations, cytokines, gut microbiota, programmed cell death, and many other factors can lead to the deterioration of the phenotype of intestinal epithelial cells [5,6,7]; however, the specific pathways and exact mechanisms remain unclear.
The tumor microenvironment (TME) is an important site for the survival and development of tumor cells [8]. Immune cell infiltration into the TME is one of the key factors determining tumor occurrence, development, and the efficacy of anti-tumor treatments [9]. Inappropriate immunotherapy can lead to the development of drug resistance in tumors. Tumor heterogeneity can affect the growth rate, invasiveness, and drug sensitivity of tumors, thereby influencing the prognosis of patients. Therefore, the study of TME heterogeneity in CRC could provide new directions for targeted and immunotherapy.
Tumor heterogeneity encompasses not only the complexity of microenvironmental components but also, more fundamentally, the genetic and phenotypic diversity generated within epithelial cells during malignant progression. This profound intra-tumoral heterogeneity—manifested through the differential accumulation of copy number variations (CNVs) and the subpopulation-specific activation of functional pathways (such as Wnt–β-catenin or P53 signaling)—is a critical driver of the adenoma-to-carcinoma transition and treatment resistance. Despite its importance, existing studies often overlook the fine-grained evolutionary patterns within epithelial lineages and their specific prognostic implications during early malignancy. Consequently, leveraging single-cell RNA sequencing to decode these heterogeneous epithelial patterns, integrated with pseudotime trajectory inference and large-scale clinical validation, is essential for identifying key drivers of malignancy and establishing robust risk stratification systems.
Single-cell RNA sequencing (scRNA-seq) is an important tool for studying tumor mechanisms. It can analyze complex cell landscapes and identify cell types, cell subpopulations, and cell-specific genes associated with diseases [10]. Furthermore, scRNA-seq can provide detailed information on cell diversity and heterogeneity, and can be used to assess the complexity of the TME [11]. Currently, scRNA-seq has played a crucial role in studying various tumor-infiltrating myeloid cells in CRC [12], identifying genes associated with T cell subtypes [13], and investigating the subtypes and functions of tumor-associated fibroblasts [14]. However, most existing analyses have focused broadly on immune infiltration or major cell lineages, leaving the fine-grained evolution of epithelial subtypes and their specific prognostic implications less explored. Furthermore, while scRNA-seq identifies potential targets, a significant gap remains in translating these high-dimensional biological insights into robust, clinically applicable prognostic markers validated in large-scale patient cohorts.
In this study, we aimed to bridge this gap by performing a focused, high-resolution re-interrogation of scRNA-seq data covering the normal-adenoma-carcinoma transition. Distinct from previous studies that primarily described the general cellular composition, our analysis specifically contributes in three key aspects: (1) we identified distinct epithelial subpopulations characterized by unique gene signatures (e.g., KCNMA1 and MKI67) that mark the transition from benign to malignant states; (2) we mapped the dynamic intercellular communication networks, revealing how these specific epithelial subsets re-program fibroblasts and immune cells; and (3) crucially, we validated the clinical relevance of these single-cell derived markers using large-scale independent bulk sequencing data (TCGA, n = 966), establishing a robust link between cellular heterogeneity and patient survival. This study provides a refined understanding of epithelial lineage evolution and proposes novel prognostic biomarkers for CRC risk stratification.
2. Materials and Methods
2.1. Data Acquisition and Processing
The CRC scRNA-seq dataset used in this study was downloaded from the Gene Expression Omnibus (GEO) (GSE201348) [15,16]. The data were generated on the GPL24676 platform (Illumina NovaSeq 6000) and were last updated on 28 July 2022. After excluding samples with incomplete sequencing data, we included 42 samples for subsequent analyses, including eight normal colorectal, two colorectal adenoma, 27 high-grade colorectal intraepithelial neoplasia, and five malignant CRC samples.
Preliminary data cleaning was performed using the “Seurat” package in the R software (version 4.5.0): extreme cells with < 200 expressed genes and a proportion of mitochondrial RNA ≥ 20% were removed [17]. In each cell, gene expression was normalized and log-transformed using the “Normalize Data” function. Gene expression values were further z-score transformed to achieve normalization using the “Scale Data” function. The “Run Harmony” function was executed to remove batch effects between samples. The “Find Clusters” function identified similarities between cells and classified cells into distinct clusters. The dimension of the integrated samples was reduced, and visualization was achieved through the Uniform Manifold Approximation and Projection (UMAP) method [18]. Relevant parameters in the analysis process are included in the Supplementary Table S1.
2.2. Cell Annotation
Based on the well-established marker genes of cell types, the cells included in the analysis were classified as immune cells (PTPRC), fibroblasts (COL1A1, ACTA2), endothelial cells (PECAM1), B cells (CD19), and epithelial cells (EPCAM). Cell type identities were assigned and visualized using the “Feature Plot”, “Dot Plot”, and “Recode” function.
2.3. Identification of Characteristic Genes in Different Cell Types Within the TME
The “FindAllMarkers” function was used to identify the characteristic genes of different cell types, as determined by the Wilcoxon rank sum test [19]. The screening criteria were as follows: the expression in at least 10% of cells in any group must be met simultaneously, with log fold change (FC) > 0.6 and p-value < 0.05. The top five characteristic genes of each subtype were selected and displayed by the “Dot Plot” function.
2.4. Preference Analysis of Cell Subclusters for Tissue Samples
Single-cell tissue preference analysis is primarily used to assess the distribution tendencies of different cell types or subpopulations across various tissues/groups. This analysis helps to characterize the distribution and function of cell types in different tissues/groups, thereby revealing the relationship between cell types and tissues/groups. We used the “As.data.frame” function to retrieve the metadata from the “Seurat” data and the “Do.tissueDist” function to calculate the preference between cell clusters and tissues.
2.5. Copy Number Variation (CNV) Heterogeneity Analysis
The “infercnv” package in R was used to evaluate the CNVs in each region [20]. We then used the “Subset” function to obtain the specified cells, the “Data.frame(Idents)” function to obtain the annotation file, the “AnnoGene” function to obtain the gene annotation information, and the “CreateInfercnvObject” function to create the IFN format file. Finally, the “Infercnv” function was used to calculate the CNVscore value based on the expression level of each cell. The cells were divided into four grades (Normal, High, Adenoma, Tumor) according to the “Quantile” function of CNVscore probs = c (0, 0.25, 0.5, 0.75, 1). The “Pheatmap” function was used to display the differentially expressed genes (DEGs) of the annotated sample and cell subpopulation groups. The top 11 featured genes were displayed for each subgroup. The “Ggplot” function was used to draw Joy plots to compare the CNVscore of each sample and sample source group.
2.6. Survival Analysis
The “As.data.frame” function was used to obtain the expression matrix of the specific cell subset, and the “Fwrite” function was used to output the cibersortx file. We then logged into the Cibersortx website (https://cibersortx.stanford.edu/, access on 23 December 2025) and submitted the Cibersortx file and the gene expression matrix file containing 966 CRC samples from The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/, access on 23 December 2025), to obtain the abundance scores of specific cells. Based on the clinical data in the TCGA, the Cox proportional hazards regression model was constructed through the “Coxph” function. The results were visualized using the “Forestmodel” function. Subsequently, the “Ggsurvplot” function was used to display the survival curve relationship between cell subsets and TCGA clinical samples.
2.7. Pseudotime Analysis of the Cell Differentiation Trajectory
This study performed pseudotemporal analysis using the Monocle2 toolkit, which was used to characterize the changes in gene expression during epithelial deterioration from colorectal adenoma to cancer progression, explore the lineage differentiation of the cells, and identify precursor epithelial cells for deterioration by learning the sequence of gene expression changes that each cell must undergo as part of a dynamic biological process [21]. Trajectories were constructed that reflected the cells from the initial state to the end state. We created a CellDataSet object for a single cell of each cell type using default parameters. Two main steps were then performed for single-cell trajectory construction. The first step was to select DEGs in malignant cell subsets as ranking genes. The second step involved using ranked genes for dimensionality reduction and trajectory construction. The constructed single-cell differentiation trajectories could be used to speculate on different epithelial cell states during deterioration.
2.8. TME Cell Interaction Analysis
The “CellPhoneDB” package in R was used to explore the interactions between various cell types in the CRC TME [22]. Single-cell transcription data for each cell type in the tumor sample (10,000 cells for each epithelial cell subset and all other cell populations) were input into “CellPhoneDB” for cell–cell interaction analysis, and pairwise comparisons of the included cell types were performed. Cluster labels for all cells were first randomly permuted 1000 times to determine the mean value of receptor and ligand expression levels for clusters of interacting cells, which produced a null distribution for each receptor–ligand pair. The p-value for the likelihood of cell type specificity of the corresponding receptor–ligand complex was obtained by calculating the proportion of permutations in which the mean was higher than the actual mean. The cell interactions were subsequently visualized using the “Heatmap” function.
2.9. Gene Set Variation Analysis (GSVA)
GSVA is a non-parametric and unsupervised method that calculates the gene set enrichment score of a sample based on the genes inside and outside the gene set [23]. GSVA can estimate changes in gene set enrichment in a sample independently of any category label. We utilized the “GSVA” package in R to perform functional enrichment analysis of cell subsets at various stages of tumor progression, aiming to elucidate the alterations in oncogenic pathways during this process. GSVA enrichment scores were calculated using the “GSVA” function, and the results were visualized by the “Pheatmap” function.
2.10. Statistical Analysis
All analyses in this study were conducted using R software (v4.2.1). Counting data were expressed as the number of cases or percentages. Differential expression analysis of single-cell data was conducted using the Wilcoxon rank sum test. The p-value correction was carried out using the Bonferroni correction method. The adjusted p-value < 0.05 indicated a statistically significant difference.
3. Results
3.1. Single-Cell Clustering Landscape in CRC Tumor Tissue
This study included 42 samples from the GSE201348 dataset. A gene expression matrix of 26,335 genes across 263,872 cells was obtained for subsequent analyses. After multiple data quality control steps, the cell types were annotated into four clusters based on the single-cell transcriptome of all samples and the classic marker genes in each cell (Figure 1A). The classic marker gene used to distinguish immune cells was PTPRC; the classic marker genes used to distinguish fibroblasts were COL1A1, ACTA2, and PDGFRA; the classic marker gene used to distinguish endothelial cells was PECAM1; the classic marker gene used to distinguish B cells was CD19 (this cell population was not found in this study); and the classic marker gene used to distinguish epithelial cells was EPCAM (Figure 1B).
The cells were further divided into 13 subpopulations using the “Recode” function. Based on the “FindAllMarkers” function and a logFC threshold of 0.6, the characteristic genes of 13 groups of CRC subtypes in the TME were identified. Figure 1C shows the representative marker genes and their expression levels in different cell clusters. The preference of cells for different groups (normal–normal tissue, adenoma–adenoma tissue, high–high-grade intraepithelial neoplasia tissue, tumor–malignant tumor tissue) during the progression of CRC was calculated based on the odds ratios, and the composition and differences in the main cell types among different tissues were analyzed. The results indicated that epithelial cells preferred adenomas and high-grade intraepithelial neoplasia. Immune cells preferred malignant tumors. Endothelial cells and fibroblasts preferred both malignant tumors and normal tissues (Figure 1D). The “UMAP” function was used to visualize the distribution of major cell types and the number of DEGs (Figure 1E).
3.2. Epithelial Cells in CRC Tissues Exhibit Highly Heterogeneous CNV Patterns
Based on the “FindAllMarkers” function and a logFC threshold of 0.6, all epithelial cells were re-clustered and analyzed to identify epithelial cell subtypes in the CRC microenvironment. The cell subtypes were named “specific expressed gene^+^ cell type,” and the UMAP map was drawn (Figure 2A). Figure 2B shows the expression of the top five characteristic genes in each subgroup of the epithelial cell subtype. For instance, SLC27A6 was significantly elevated in the “SLC27A6^+^ Epithelial” subtype of the tumor group (Figure 2B). Following this method, “MKI67^+^ Epithelial, SLC27A6^+^ Epithelial, PLCE1^+^ Epithelial, NKD1^+^ Epithelial, KCNMA1^+^ Epithelial, GDA^+^ Epithelial, CLCA4^+^ Epithelial, BEST4^+^ Epithelial, LRMP^+^ Epithelial, ACTG2^+^ Epithelial, and ASPM^+^ Epithelial,” a total of 11 epithelial cell subsets were identified. To further define the malignant status of epithelial cell subsets, the inferCNV function was used to infer the CNV spectrum of epithelial cells, and different degrees of CNV accumulation were compared between different patients and different pathological types (normal group vs. adenoma group vs. high-grade intraepithelial neoplasia vs. malignant tumors), focusing on subcluster specificity and patient specificity. Epithelial cells were divided into four groups according to the CNV score: “normal, low, medium, and high.” Joyplots showed the distribution of CNV scores in different samples and different groups (Figure 2C,D). CNV trends showed high consistency across all patients, indicating that the identified epithelial subpopulations reflect universal biological characteristics of disease progression rather than individual variations. The correlation between the percentage of cells with a high CNV score and the percentage of malignant tumor cells in each epithelial cell subcluster was calculated, and the results showed a significant positive correlation between the tumor cells in the malignant group and the cells with a high CNV score (R = 0.38) (Figure 2E). The transcriptomes of epithelial cell subpopulations with different CNV levels were compared, and the heatmap showed the standardized expression levels of DEGs in cells with different CNV levels (Figure 2F). Functional pathways activated in normal, low, medium, and high cell groups with different CNV levels and different epithelial cell subsets were demonstrated by GSVA (Figure 2G,H). “MYC targets V2,” “Hypoxia,” “TGF-β signaling,” “Notch signaling,” and other signaling pathways were significantly enriched in epithelial cells with high CNV levels. The “P53 pathway,” “Wnt–β-catenin signaling,” “MYC targets V1,” and other signaling pathways were significantly enriched in CRC epithelial cells.
The association between the relative cell abundance of different epithelial cell sub populations in the TCGA cohort and patient survival was evaluated through the deconvolution analysis with CIBERSORTx. The results indicated that patients with MKI67^+^ Epithelial, KCNMA1^+^ Epithelial, and GDA^+^ Epithelial were related to CRC prognosis (Figure 2I). Survival analysis showed that the prognosis of patients in the low-expression group of KCNMA1 was significantly worse than that in the high-expression group of KCNMA1. In addition, the prognosis of patients in the high-expression MKI67 group was significantly worse than that in the low-expression MKI67 group (Figure 2J).
3.3. Pseudotime Analysis Revealed the Dynamic Changes in Epithelial Cell Status During the Malignant Progression of CRC
Violin plots were used to visualize the expression of representative genes for pro liferative markers (e.g., MKI67) and malignant markers (e.g., BRAF, MUC2, KRAS) in epithelial cell subclusters (Figure 3A). Based on different paths, the changes in cell states in the differentiation trajectory (from the initial state to the terminal state), DEGs between different states, and the top five highly expressed genes were analyzed. We assessed whether the differentiation paths and main branches revealed by the two analyses were similar to confirm the accuracy of the trajectory prediction (Figure 3B). The proportions of different cell states, sample sources, CNV levels, and cell subtypes in the pseudo-temporal differentiation trajectories were analyzed based on different paths. The heatmap displays the expression of 100 DEGs, including proliferative and malignant marker genes, along the false time trajectory (Figure 3C). The intersection of genes that are significantly highly expressed in epithelial cell states S5–S6 representing malignancy endpoints, genes that are overexpressed in epithelial cells from malignant tumor tissues, and marker genes that are highly expressed in CNV high–medium cells resulted in six genes: FP236383.1, FP671120.1, AC105402.3, HSP90AA1, LINC00486, and WASHC1 (Figure 3D). These six genes were defined as a potential set of molecular marker genes that initiate “malignant progression.”
3.4. Interactions Between Epithelial Cells and Other Components of the TME During the Malignant Progression of CRC
To analyze the interaction between epithelial cells and other components in the TME during the malignant progression of tumors, the publicly available ligand–receptor interaction database CellPhoneDB was used. The heatmap showed the patterns of intercellular interactions in different pathological subgroup samples (Normal, Adenoma, High, Tumor) (Figure 4A–D). The results indicated that MKI67^+^ Epithelial cells were significantly positively correlated with villus fibroblast cells, crypt fibroblasts_2 cells, and crypt fibroblasts_1 cells in the four groups. Figure 4E shows the interaction scores of 178 ligand–receptor pairs in the Normal, Adenoma, High, and Tumor groups, respectively. The group-specific ligand–receptor pairs in the four groups are presented in bar charts (Figure 4F). Figure 4G shows the enrichment of sample-specific ligand–receptor pairs in cell subtypes.
3.5. Transcriptional Characteristics of Fibroblasts in CRC Tissues
Based on the “FindAllMarkers” function and a logFC of 0.6 as the threshold, all fibroblast cells were re-clustered and analyzed to identify fibroblast cell subtypes in the CRC TME. According to the classification criteria in the previously published literature, 10 groups of fibroblasts were identified (crypt fibroblasts_1, crypt fibroblasts_2, crypt fibroblasts_3, crypt fibroblasts_4, villus fibroblast, mCAF_1, mCAF_2, mCAF_3, mCAF_4, CAF) [16]. The UMAP, colored by subclusters, shows the subtypes of cancer-associated fibroblasts (CAFs) (Figure 5A). Figure 5B shows the distribution of CAFs in different samples. According to 10 specific markers (P2RY14, ADAM28, GPX2, LINC00511, SOX6, DES, EPHA7, RYR2, DPP6, and LAMA2), fibroblasts were divided into different subsets (Figure 5C). Figure 5D shows the expression of the top five subtype-specific marker genes in each subtype. The CAF subcluster characteristic genes and sample DEGs were intersected to obtain 45 significantly different CAF subcluster characteristic genes. Dot plots were used to display the expression of these DEGs in all CAF cell subclusters and samples (Figure 5E).
Monocle2 semi-supervised quasi-temporal trajectory analysis was used to present the trajectories of the CAF subtypes, which were sorted by quasi-temporal (upper left), cell status (lower left), cell clusters (upper middle), samples (lower middle), and the expression dynamics of CAF marker genes (Figure 5F). Figure 5G shows the expression of DEGs at different pseudotime points. GSVA enrichment analysis indicated that the main activation pathways of S1 phase fibroblasts were “positive regulation of metalloendopeptidase activity” and “negative regulation of granulocyte chemotaxis.” The main pathways of S2 phase cells were “xenobiotic glucuronidation,” “flavonoid glucuronidation,” “flavone metabolic process,” “negative regulation of histone H3/H4 methylation,” “interleukin 36 pathway,” and “mitochondrial ribosome binding.” The main pathways of S3 phase cells were “NGF independent TRKA activation” and “activation of TRKA receptor” (Figure 5H).
4. Discussion
CRC, as the second leading cause of cancer-related death, has shown an annual increase in incidence and mortality [24]. Therefore, studying the molecular mechanism of CRC occurrence and development is of great significance for the diagnosis and treatment of this disease [25]. Previous single-cell studies have often focused on the changes in the cellular composition of CRC tumor tissues and normal tissues (often focusing on immune cells), while neglecting the dynamic process of tumor progression [26,27]. In this study, we explored the lineage evolution during the malignant progression of CRC by mining scRNA-seq data, which gradually evolved from normal intestinal epithelial cells and a normal immune environment to abnormal intestinal epithelial cells, an abnormal immune microenvironment, and CAFs. This study aimed to explore the complexity and gradual nature of CRC evolution, providing a comprehensive analysis at the single-cell level for clinical diagnosis and treatment, and offering a basis for targeting specific cells and their corresponding targets.
We identified four cell clusters based on classic marker genes in each cell type: immune cells (PTPRC), fibroblasts (COL1A1, ACTA2, and PDGFRA), epithelial cells (EPCAM), and endothelial cells (PECAM1). Tissue preference analysis indicated that epithelial cells preferred adenomas and high-grade intraepithelial neoplasia tissues. This result suggests that changes in the composition and proportion of epithelial cells may be a key factor in the malignant progression of CRC. We used the “FindAllMarkers” function to re-cluster epithelial cells, dividing them into 11 cell subpopulations: “MKI67^+^ Epithelial, SLC27A6^+^ Epithelial, PLCE1^+^ Epithelial, NKD1^+^ Epithelial, KCNMA1^+^ Epithelial, GDA^+^ Epithelial, CLCA4^+^ Epithelial, BEST4^+^ Epithelial, LRMP^+^ Epithelial, ACTG2^+^ Epithelial, and ASPM^+^ Epithelial.” The results of the CNV analysis indicated that normal tissue-derived epithelial cells contained fewer deletions or amplifications compared with adenoma and carcinomatous epithelial cells. Furthermore, GSVA found that “P53 pathway,” “Wnt–β-catenin signaling,” “MYC targets V1,” and other signaling pathways were significantly enriched in malignant epithelial cells. The relationship between these signaling pathways and CRC has been partially investigated. Overexpression of circMYH9 promoted CRC proliferation by regulating serine/glycine metabolism and reduction–oxidation homeostasis in a p53-dependent manner [28]. The lncRNA NEAT1 interacts with DDX5 to activate Wnt–β-catenin signaling and promote the progression of CRC [29]. Inhibition of the Wnt–β-catenin signaling pathway can significantly inhibit the proliferation of CRC [30]. MYC is a well-known inducer of apoptosis. MYC inhibits the expression of major histocompatibility complex molecules and cytokine signaling in CRC cells in response to acute metabolic stress [31]. These results revealed the promoting or inhibitory effects of the discovered pathways in CRC; however, these studies mainly focused on the direct impact of signaling pathways on the overall tumor tissue. Whether targeted intervention can be carried out on epithelial cells or specific subpopulations of epithelial cells to achieve improved tumor treatment effects requires further research.
Based on the gene expression data of the CRC cohort in the TCGA database, the association between the infiltration score of epithelial cell subsets and prognosis was evaluated using CIBERSORTx. The results showed that MKI67^+^ Epithelial, KCNMA1^+^ Epithelial, and GDA^+^ Epithelial were significantly associated with the prognosis of patients. Survival analysis revealed that the prognosis of patients in the low-expression group of KCNMA1 was significantly worse than that of patients in the high-expression group of KCNMA1. The prognosis of patients in the high-expression MKI67 group was significantly worse than that of patients in the low-expression MKI67 group. Therefore, the reduction in KCNMA1^+^ Epithelial cells and the increase in MKI67^+^ Epithelial cells may be a key factor in the poor prognosis of CRC. The KCNMA1 gene encodes the α subunit of the calcium-activated high-conductivity potassium channel and is involved in various physiological processes, such as smooth muscle contraction, neurotransmitter release, and neuronal excitability [32]. It has been reported that KCNMA1 is a key tumor suppressor, which can significantly inhibit the migration and invasion of gastric cancer cells in synergy with PTK2 [33]. Studies confirm that highly methylated KCNMA1 promoter regions (e.g., sites cg24113782 and cg25655799) in CRC tissues represent a core epigenetic mechanism causing its early-stage expression loss during tumorigenesis. Furthermore, KCNMA1 is significantly downregulated in CRC tumor tissues and exhibits a statistically significant negative correlation with miR-17-5p (a classic oncogenic miRNA). This “high miR-17-5p/low KCNMA1” axial imbalance may drive malignant transformation of colorectal epithelial cells from adenoma to carcinoma by activating key oncogenic pathways such as Wnt/β-catenin [34,35].
The MKI67 gene, also known as Ki-67, is closely related to cancer cell proliferation. MKI67 has been associated with the malignant progression of various tumors, including lung adenocarcinoma, lung squamous cell carcinoma, liver hepatocellular carcinoma, and breast invasive carcinoma [36,37]. The expression and localization of Ki-67 in T cells after neoadjuvant therapy serve as reliable predictive markers for CRC [38]. Thus, KCNMA1 and MKI67 are potential therapeutic targets for CRC. Whether KCNMA1 and MKI67 affect CRC through a similar mechanism and whether targeting KCNMA1 and MKI67 can treat CRC remains to be further explored.
Based on different pathways, the proportions of different cell states, sample sources, CNV levels, and cell subtypes in the pseudotime differentiation trajectory were analyzed, and the genes significantly highly expressed in the epithelial cell state S5–S6, representing the malignant endpoint, were obtained. Six genes (FP236383.1, FP671120.1, AC105402.3, HSP90AA1, LINC00486, WASHC1) were identified in the intersection of these genes with the DEGs in the epithelial cells of malignant tumor tissues and the high cell group of CNV. These six genes are considered key genes that initiate the malignant progression of CRC. The role of these identified genes in tumors has been partially studied. HSP90AA1 encodes an inducible molecular chaperone that, together with its co-partner, regulates ATPase activity to maintain the correct folding of specific target proteins. HSP90AA1 increases the drug resistance of osteosarcoma by inducing autophagy and inhibiting apoptosis [39]. LINC00486 interacts specifically with nuclear factor kappa b repressing factor, activating NF-κB/TNF-α signaling and thereby inhibiting natural killer T-cell lymphoma [40]. The roles of FP236383.1, FP671120.1, AC105402, and WASHC1 in the field of oncology have not yet been studied. The mechanisms of the six genes in the direction of CRC are also far from being clarified, representing a new research direction.
The cell–cell communication analysis revealed the interaction between epithelial cells and fibroblasts. For example, MKI67 + Epithelial cells were significantly positively correlated with villus fibroblasts, crypt fibroblasts_2 cells, and crypt fibroblasts_1 cells in the four groups of normal, adenoma, high, and tumor. To further explore the subpopulation classification and functions of fibroblasts, we adopted a strategy similar to that used for epithelial cells, dividing fibroblasts into “P2RY14^+^ Fibroblast, ADAM28^+^ Fibroblast, GPX2^+^ Fibroblast, LINC00511^+^ Fibroblast, SOX6^+^ Fibroblast, DES^+^ Fibroblast, EPHA7^+^ Fibroblast, RYR2^+^ Fibroblast, DPP6^+^ Fibroblast, and LAMA2^+^ Fibroblast,” totaling 10 cell subpopulations. GSVA indicated that fibroblasts may exert their effects through signaling pathways such as the “negative regulation of granulocyte chemotaxis,” “negative regulation of histone H3/H4 methylation,” “interleukin 36 pathway,” and “NGF independent TRKA activation.” The interaction and mechanism between specific subtypes of fibroblasts and epithelial cells require further in-depth research.
This study has some limitations. The results are primarily based on bioinformatics analysis of public databases and lack in-depth experimental and clinical validation. In the future, we will further explore and validate the characteristics of high-grade and low-grade epithelial cell subtypes in CRC, as well as the specific mechanisms of action of their associated genes. This will provide a basis for immune-targeted therapies and informed clinical decision-making for patients with CRC.
5. Conclusions
This study analyzed scRNA-seq data to reveal the composition and function of cells in patients with CRC. It identified 11 epithelial cell subpopulations, among which MKI67^+^ Epithelial and KCNMA1^+^ Epithelial cells were significantly associated with malignant progression. The research confirmed the crucial role of epithelial cells and fibroblast subsets in tumor malignancy and identified multiple genes related to malignant progression, including MKI67, KCNMA1, FP236383.1, FP671120.1, AC105402.3, HSP90AA1, LINC00486, and WASHC1. These findings lay the foundation for clarifying the molecular mechanisms of CRC and provide new insights for future research.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Morgan E. Arnold M. Gini A. Lorenzoni V. Cabasag C.J. Laversanne M. Vignat J. Ferlay J. Murphy N. Bray F. Global burden of colorectal cancer in 2020 and 2040: Incidence and mortality estimates from GLOBOCAN Gut 20237233834410.1136/gutjnl-2022-32773636604116 · doi ↗ · pubmed ↗
- 2Sung H. Siegel R.L. Laversanne M. Jiang C. Morgan E. Zahwe M. Cao Y. Bray F. Jemal A. Colorectal cancer incidence trends in younger versus older adults: An analysis of population-based cancer registry data Lancet Oncol.202526516310.1016/S 1470-2045(24)00600-439674189 PMC 11695264 · doi ↗ · pubmed ↗
- 3Zhou J. Yang Q. Zhao S. Sun L. Li R. Wang J. Wang L. Wang D. Evolving landscape of colorectal cancer: Global and regional burden, risk factor dynamics, and future scenarios (the Global Burden of Disease 1990–2050)Ageing Res. Rev.202510410266610.1016/j.arr.2025.10266639828028 · doi ↗ · pubmed ↗
- 4Liang J. Wang N. Yao Y. Wang Y. An X. Wang H. Liu H. Jiang Y. Li H. Cheng X. NEDD 4L mediates intestinal epithelial cell ferroptosis to restrict inflammatory bowel diseases and colorectal tumorigenesis J. Clin. Investig.2024135 e 17399410.1172/JCI 17399439688910 PMC 11785928 · doi ↗ · pubmed ↗
- 5Bhat A.A. Nisar S. Singh M. Ashraf B. Masoodi T. Prasad C.P. Sharma A. Maacha S. Karedath T. Hashem S. Cytokine- and chemokine-induced inflammatory colorectal tumor microenvironment: Emerging avenue for targeted therapy Cancer Commun.20224268971510.1002/cac 2.12295 PMC 939531735791509 · doi ↗ · pubmed ↗
- 6Liao K. Wen J. Liu Z. Zhang B. Zhang X. Fu Y. Zhang W. Hu H. Ai K. Zhu W. The role of intratumoral microbiome in the occurrence, proliferation, metastasis of colorectal cancer and its underlying therapeutic strategies Ageing Res. Rev.202511110282010.1016/j.arr.2025.10282040639623 · doi ↗ · pubmed ↗
- 7Bao Z. Zeng W. Zhang D. Wang L. Deng X. Lai J. Li J. Gong J. Xiang G. SNAIL induces EMT and lung metastasis of tumours secreting CXCL 2 to promote the invasion of M 2-type immunosuppressed macrophages in colorectal cancer Int. J. Biol. Sci.2022182867288110.7150/ijbs.6685435541899 PMC 9066124 · doi ↗ · pubmed ↗
- 8Kounatidis D. Vallianou N.G. Karampela I. Grivakou E. Dalamaga M. The intricate role of adipokines in cancer-related signaling and the tumor microenvironment: In-sights for future research Semin. Cancer Biol.202511313015010.1016/j.semcancer.2025.05.01340412490 · doi ↗ · pubmed ↗
