# Construction and validation of a novel diagnostic model for esophageal squamous cell carcinoma: an integrated analysis of multi-omics data

**Authors:** Yiyuan Cui, Sicong Li, Zhibin Wu, Yue Jin, Jingjie Yu, Yufan Chen, Jiayang Chen, Jinyuan Chang, Yijing Yan, Xinyu Li, Nuo Li, Shengjuan Hu, Chenxin Zhu, Li Feng

PMC · DOI: 10.3389/fimmu.2026.1685902 · Frontiers in Immunology · 2026-02-13

## TL;DR

Researchers developed a new diagnostic model for esophageal squamous cell carcinoma using multi-omics data and identified five key genes for early detection.

## Contribution

A novel diagnostic model for ESCC using integrated multi-omics data and identification of five robust biomarkers.

## Key findings

- A five-gene diagnostic model showed high discriminative performance, with AUCs of 0.99 in the training set and 0.97 in the external (TCGA) validation set.
- SORBS2, the model's only protective factor, showed lower expression in ESCC than in adjacent normal esophageal tissue and was identified as a tumor-suppressor gene.
- Single-cell RNA analysis revealed myofibroblasts as the main source of SORBS2 expression in tumor tissue.
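The AUC values in the first finding measure how well the model's predicted probabilities rank ESCC samples above normal samples. The following is a minimal sketch of the rank-based (Mann-Whitney) formulation of the ROC AUC in plain Python; the function name `roc_auc` is illustrative, and this summary does not state which software the authors used to compute their AUCs.

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC (Mann-Whitney formulation).

    Returns the probability that a randomly chosen positive sample
    (label 1, e.g. ESCC) scores higher than a randomly chosen negative
    sample (label 0, e.g. adjacent normal); ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    # Compare every positive/negative pair; O(n_pos * n_neg) is fine for a sketch.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores, invented for illustration: one negative outranks one positive.
print(roc_auc([0.9, 0.35, 0.4, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 1.0 would mean every ESCC sample outscores every normal sample; 0.5 corresponds to random ranking.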

## Abstract

Esophageal squamous cell carcinoma (ESCC) is highly prevalent in China, yet few genes are well suited to its early diagnosis, highlighting the need for novel biomarkers to improve detection. This study aimed to develop and validate a new gene-based diagnostic model for ESCC.

Publicly available bulk RNA-seq datasets (GSE23400, GSE17351, GSE20347) were merged to identify differentially expressed genes (DEGs) between ESCC and adjacent normal tissues. Weighted gene co-expression network analysis (WGCNA) and protein-protein interaction (PPI) network analysis were performed to identify hub genes associated with ESCC. We then intersected the DEGs with the genes in the ESCC-related module identified by WGCNA, refined these intersecting genes via LASSO regression, and constructed a diagnostic model for ESCC using multivariable logistic regression. ESCC samples from the TCGA database served as the external validation set. The identified protective factor was validated by Western blotting (WB) in mouse ESCC models and by immunofluorescence (IF) in human tissues. Additionally, single-cell RNA analysis was performed to identify the cell types expressing the marker genes.
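The gene-selection step described above (DEGs intersected with the genes of the ESCC-related WGCNA module) can be sketched as follows. This is a hypothetical illustration: the fold-change and adjusted p-value cutoffs, the `DEResult` record, and the toy gene names and values are assumptions rather than the study's settings or data, and the subsequent LASSO and logistic-regression steps are not shown.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """Hypothetical per-gene differential-expression result."""
    gene: str
    log2_fc: float  # log2 fold change, ESCC vs. adjacent normal
    adj_p: float    # multiple-testing-adjusted p-value


def select_degs(results, min_abs_log2fc=1.0, max_adj_p=0.05):
    """Split results into up- and downregulated gene sets.

    The default cutoffs are illustrative assumptions, not the study's values.
    """
    up, down = set(), set()
    for r in results:
        if r.adj_p < max_adj_p and abs(r.log2_fc) >= min_abs_log2fc:
            (up if r.log2_fc > 0 else down).add(r.gene)
    return up, down


def candidate_genes(results, module_genes, **thresholds):
    """Genes that are both differentially expressed and in the WGCNA module."""
    up, down = select_degs(results, **thresholds)
    return sorted((up | down) & set(module_genes))


# Toy values, invented for illustration only.
toy = [
    DEResult("GENE_A", 2.1, 0.001),
    DEResult("GENE_B", -1.8, 0.01),
    DEResult("GENE_C", 0.4, 0.001),  # fold change too small
    DEResult("GENE_D", 3.0, 0.20),   # not significant
]
print(candidate_genes(toy, module_genes={"GENE_A", "GENE_B", "GENE_D"}))
# ['GENE_A', 'GENE_B']
```

The intersection keeps only genes supported by both the differential-expression and the co-expression evidence; these candidates would then be passed to the LASSO step.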

A total of 113 upregulated and 173 downregulated genes were identified in the ESCC group. WGCNA identified the blue module (13 genes) as the module most strongly correlated with ESCC. In total, 13 intersecting genes were obtained. Of these, five genes formed the diagnostic model: Logit(P) = −24.4547 + 2.0567×BID + 0.7396×CBX3 + 2.3757×ECT2 + 0.5667×KIF14 − 2.1019×SORBS2. The model achieved AUCs of 0.99 in the training set and 0.97 in the external validation set. SORBS2 was the only potential protective factor in the model. WB showed higher SORBS2 expression in adjacent normal esophageal tissue than in ESCC tissue. Single-cell RNA analysis revealed that myofibroblasts are the predominant cellular source of SORBS2 expression within ESCC tumor tissue, and IF confirmed lower SORBS2 expression in myofibroblasts in ESCC than in adjacent normal esophageal tissue.
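To make the model equation concrete, the sketch below computes the linear predictor Logit(P) from the five reported coefficients and converts it to a probability with the inverse logit, P = 1 / (1 + e^(-Logit(P))). It assumes expression values on the same scale the model was fitted on, which this summary does not specify; the function names are hypothetical, and no diagnostic cutoff on P is implied.

```python
import math

# Intercept and coefficients as reported in the abstract's model equation.
INTERCEPT = -24.4547
COEFFICIENTS = {
    "BID": 2.0567,
    "CBX3": 0.7396,
    "ECT2": 2.3757,
    "KIF14": 0.5667,
    "SORBS2": -2.1019,  # negative: higher expression lowers predicted risk
}


def escc_logit(expression):
    """Linear predictor: intercept plus coefficient-weighted expression values.

    `expression` maps gene symbol -> expression value, assumed (not confirmed)
    to be on the same scale as the data the model was fitted on.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise ValueError(f"missing genes: {sorted(missing)}")
    return INTERCEPT + sum(b * expression[g] for g, b in COEFFICIENTS.items())


def escc_probability(expression):
    """Predicted probability of ESCC via the inverse-logit function."""
    z = escc_logit(expression)
    # Numerically stable evaluation of 1 / (1 + exp(-z)) for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical input values, for illustration only.
sample = {"BID": 8.0, "CBX3": 8.0, "ECT2": 8.0, "KIF14": 8.0, "SORBS2": 8.0}
print(escc_probability(sample))
```

Because SORBS2 carries the only negative coefficient, raising its expression while holding the other four genes fixed lowers the predicted probability, consistent with its role as the model's protective factor.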

We developed an ESCC diagnostic model and identified BID, CBX3, ECT2, KIF14, and SORBS2 as robust ESCC biomarkers. SORBS2 is a tumor-suppressor gene predominantly expressed in myofibroblasts.

## Linked entities

- **Genes:** BID (BH3 interacting domain death agonist) [NCBI Gene 637], CBX3 (chromobox 3) [NCBI Gene 11335], ECT2 (epithelial cell transforming 2) [NCBI Gene 1894], KIF14 (kinesin family member 14) [NCBI Gene 9928], SORBS2 (sorbin and SH3 domain containing 2) [NCBI Gene 8470]
- **Diseases:** esophageal squamous cell carcinoma (ESCC; MONDO:0005580)

## Full-text entities

- **Genes:** CCNB1 (cyclin B1) [NCBI Gene 891] {aka CCNB}, MCM10 (minichromosome maintenance 10 replication initiation factor) [NCBI Gene 55388] {aka CNA43, DNA43, IMD80, PRO2249}, Amn (amnionless) [NCBI Gene 93835], Cdh5 (cadherin 5) [NCBI Gene 12562] {aka 7B4, Cd144, VE-Cad, VECD, VEcad, Vec}, Gdf5 (growth differentiation factor 5) [NCBI Gene 14563] {aka BMP-14, Cdmp-1, bp, brp}, KIF14 (kinesin family member 14) [NCBI Gene 9928] {aka MCPH20, MKS12}, Krt19 (keratin 19) [NCBI Gene 16669] {aka CK-19, EndoC, K19, Krt-1.19, Krt1-19}, STAT3 (signal transducer and activator of transcription 3) [NCBI Gene 6774] {aka ADMIO, ADMIO1, APRF, HIES}, Cbx3 (chromobox 3) [NCBI Gene 12417] {aka HP1g, M32}, SORBS2 (sorbin and SH3 domain containing 2) [NCBI Gene 8470] {aka ARGBP2, PRO0618}, Trp53-ps (transformation related protein 53, pseudogene) [NCBI Gene 22060], TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, Mmp9 (matrix metallopeptidase 9) [NCBI Gene 17395] {aka B/MMP9, Clg4b, Gel B, MMP-9, pro-MMP-9}, Mir146 (microRNA 146) [NCBI Gene 387164] {aka Mirn146, miR-146a, mmu-mir-146}, CBX5 (chromobox 5) [NCBI Gene 23468] {aka HEL25, HP1, HP1A, HP1alpha}, Ect2 (epithelial cell transforming 2) [NCBI Gene 13605], Actb (actin, beta) [NCBI Gene 11461] {aka Actx, E430023M04Rik, beta-actin}, DTL (denticleless E3 ubiquitin protein ligase adapter) [NCBI Gene 51514] {aka CDT2, DCAF2, L2DTL, RAMP}, Timp1 (tissue inhibitor of metalloproteinase 1) [NCBI Gene 21857] {aka Clgi, EPA, TIMP-1, TPA-S1, Timp}, ANP32E (acidic nuclear phosphoprotein 32 family member E) [NCBI Gene 81611] {aka LANP-L, LANPL}, TAGLN (transgelin) [NCBI Gene 6876] {aka SM22, SM22-alpha, SMCC, TAGLN1, TGLN, WS3-10}, Bid (BH3 interacting domain death agonist) [NCBI Gene 12122] {aka 2700049M22Rik}, BID (BH3 interacting domain death agonist) [NCBI Gene 637] {aka FP497}, KIF4A (kinesin family member 4A) [NCBI Gene 24137] {aka KIF4, KIF4G1, MRX100, TMDI, XLID100}, BCL2 (BCL2 apoptosis regulator) [NCBI Gene 596] {aka Bcl-2, PPP1R50}, Tgfb1 (transforming growth factor, beta 1) [NCBI Gene 21803] {aka TGF-beta1, TGFbeta1, Tgfb, Tgfb-1}, GMPS (guanosine monophosphate synthase) [NCBI Gene 8833] {aka GATD7}, Cd274 (CD274 antigen) [NCBI Gene 60533] {aka A530045L16Rik, B7h1, Pdcd1l1, Pdcd1lg1, Pdl1}, Kif14 (kinesin family member 14) [NCBI Gene 381293] {aka D1Ertd367e, E130203M01}, ATAD2 (ATPase family AAA domain containing 2) [NCBI Gene 29028] {aka ANCCA, CT137, PRO2000}, H3P16 (H3 histone pseudogene 16) [NCBI Gene 644914] {aka H3.6, H3F3AP6, p21}, Sorbs2 (sorbin and SH3 domain containing 2) [NCBI Gene 234214] {aka 2010203O03Rik, 9430041O17Rik, A530071H08, Argbp2, mKIAA0777, nArgBP2}, Pecam1 (platelet/endothelial cell adhesion molecule 1) [NCBI Gene 18613] {aka Cd31, PECAM-1, Pecam}, CBX3 (chromobox 3) [NCBI Gene 11335] {aka HECH, HP1-GAMMA, HP1Hs-gamma, HP1gamma}, JAK2 (Janus kinase 2) [NCBI Gene 3717] {aka JTK10}, ECT2 (epithelial cell transforming 2) [NCBI Gene 1894] {aka ARHGEF31}, NDC1 (NDC1 transmembrane nucleoporin) [NCBI Gene 55706] {aka NEDAPA, NET3, TMEM48}, Vsir (V-set immunoregulatory receptor) [NCBI Gene 74048] {aka 4632428N05Rik, Dies1, PD-1H, VISTA}, NETO2 (neuropilin and tolloid like 2) [NCBI Gene 81831] {aka BTCL2, NEOT2}
- **Diseases:** ESCC (MESH:D000077277), Cancer (MESH:D009369), EAC (MESH:D000230), vascular abnormalities (MESH:D014652), squamous carcinoma (MESH:D002294), lung adenocarcinoma (MESH:D000077192), gastric, liver, and esophageal cancer (MESH:D013274), EC (MESH:D005955), colorectal cancer (MESH:D015179), deaths (MESH:D003643), metastasis (MESH:D009362), dysphagia (MESH:D003680), BLUE (OMIM:190900), Esophageal cancer (MESH:D004938), hepatocellular carcinoma (MESH:D006528)
- **Chemicals:** streptomycin (MESH:D013307), bromophenol blue (MESH:D001978), xylene (MESH:D014992), paraffin (MESH:D010232), methionine (MESH:D008715), ethanol (MESH:D000431), SDS (MESH:D012967), Laemmli buffer (MESH:C088816), Alexa Fluor 488 (MESH:C000711379), penicillin (MESH:D010406), Alexa Fluor 594 (-), DAPI (MESH:C007293), Formalin (MESH:D005557), PVDF (MESH:C024865), sphingolipid (MESH:D013107), PBS (MESH:D007854), Cysteine (MESH:D003545), CO2 (MESH:D002245), citrate (MESH:D019343)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606], Labyrinthula sp. f (species) [taxon 160257]
- **Cell lines:** AKR — Mus musculus (Mouse), Mouse thymic lymphoma, Cancer cell line (CVCL_6565)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12946021/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12946021/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12946021/full.md

---
Source: https://tomesphere.com/paper/PMC12946021