# A screening strategy based on machine learning for diagnostic biomarkers in small cell lung cancer

**Authors:** Yifeng Pan, Xuansheng Ding, Wenyun Duan, Liangbiao Wang, Yong Dai, Rongrong Han, Shubei Wang, Mingquan Guo

PMC · DOI: 10.1371/journal.pone.0339195 · PLOS One · 2026-01-22

## TL;DR

This paper introduces a machine learning method to identify exosome RNA biomarkers for early diagnosis of small cell lung cancer.

## Contribution

A novel machine learning strategy for screening exosome RNA biomarkers with high diagnostic accuracy for SCLC.

## Key findings

- An optimal combination of three exosome RNAs (LINC00989, CXCL5, and MAP3K7CL) achieved an AUC of 0.950 for SCLC diagnosis.
- The biomarker combination showed significant specificity for SCLC when compared against exosome RNA data from gastric cancer, hepatocellular carcinoma, and breast cancer.
- Tissue-based validation of two biomarkers (CXCL5 and MAP3K7CL) showed moderate performance with an AUC of 0.718.
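
For readers less familiar with the metrics quoted in these findings, the following minimal sketch shows how AUC, sensitivity, and specificity can be computed from per-sample classifier scores. The function names, the toy labels and scores, and the 0.45 decision threshold are illustrative assumptions; none of them come from the paper's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs where the case scores higher.

    Ties count as half a win. Labels are 1 for cases and 0 for controls.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Call a sample positive when its score is >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up toy data: three cases followed by four controls.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

AUC is independent of any threshold, whereas sensitivity and specificity depend on the chosen cut-off, so a reported triple such as the one above reflects one operating point on the ROC curve.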

## Abstract

Small cell lung cancer (SCLC) is the most aggressive subtype of lung cancer, and its high mortality stems in part from the lack of specific diagnostic biomarkers, which means the optimal window for treatment is often missed. Traditional biomarkers, such as neuron-specific enolase (NSE) and pro-gastrin-releasing peptide (ProGRP), have insufficient specificity and sensitivity to meet the demands of clinical diagnosis. Exosomes and their contents have emerged as promising cancer biomarkers because of the diverse molecular cargo they carry for intercellular communication. Herein, a novel machine learning strategy is reported for rapid, efficient biomarker screening, and it was used to identify an optimal exosome RNA combination as a diagnostic biomarker for SCLC. First, RNA sequencing data from 111 SCLC patients and 362 healthy controls were obtained from the exoRBase 2.0 and 3.0 databases. Machine learning methods were then used to select specific RNAs for SCLC diagnosis through 20 iterations of 10-fold nested cross-validation. Next, an optimal combination of three exosome RNAs (LINC00989, CXCL5, and MAP3K7CL) was identified and achieved excellent diagnostic performance (area under the curve (AUC) of 0.950, sensitivity of 0.936, and specificity of 0.892). Finally, an independent validation cohort containing tissue-based RNA expression data for two of the biomarkers (CXCL5 and MAP3K7CL) from 79 SCLC patients and 7 standard controls was used to evaluate the diagnostic performance of the selected RNAs. The results demonstrated modest diagnostic performance in tissue samples (AUC = 0.718) with the two biomarkers, indicating potential cross-tissue applicability despite the limitation of incomplete biomarker coverage. In addition, a specificity analysis against exosome RNA data from gastric cancer, hepatocellular carcinoma, and breast cancer demonstrated significant specificity for SCLC. Therefore, a novel biomarker screening strategy integrating nested cross-validation with multiple machine learning algorithms was successfully established, offering a potentially valuable protocol for the early diagnosis of SCLC and other cancers.
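
The repeated nested cross-validation mentioned in the abstract can be illustrated with the sketch below. It is not the authors' pipeline: the single-feature "model" (which only learns whether a feature is higher or lower in cases), the 5-fold inner loop, the synthetic data, and every function name are assumptions made for illustration. Only the 20 repeats and the 10-fold outer loop mirror the numbers given in the abstract.

```python
import random
from statistics import mean


def stratified_folds(indices, y, k, rng):
    """Split sample indices into k folds with roughly equal class ratios."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i in indices if y[i] == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def auc(labels, scores):
    """Pairwise AUC: fraction of (case, control) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(X, y, train, test, f):
    """Fit a toy model on `train` (direction of feature f), score it on `test`."""
    case_mean = mean(X[i][f] for i in train if y[i] == 1)
    ctrl_mean = mean(X[i][f] for i in train if y[i] == 0)
    sign = 1.0 if case_mean >= ctrl_mean else -1.0
    return auc([y[i] for i in test], [sign * X[i][f] for i in test])


def nested_cv(X, y, repeats=20, outer_k=10, inner_k=5, seed=0):
    """Repeated nested CV: the inner loop selects a feature, the outer loop scores it."""
    rng = random.Random(seed)
    n_features = len(X[0])
    outer_aucs, chosen = [], []
    for _ in range(repeats):
        for test in stratified_folds(range(len(y)), y, outer_k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            inner = stratified_folds(train, y, inner_k, rng)

            def inner_score(f):
                # Mean validation AUC of feature f, using outer-training data only.
                scores = []
                for val in inner:
                    val_set = set(val)
                    fit = [i for i in train if i not in val_set]
                    scores.append(evaluate(X, y, fit, val, f))
                return mean(scores)

            best = max(range(n_features), key=inner_score)
            chosen.append(best)
            # The held-out outer fold is touched only once, after selection.
            outer_aucs.append(evaluate(X, y, train, test, best))
    return outer_aucs, chosen


# Synthetic data (made up): feature 0 is informative, features 1 and 2 are noise.
data_rng = random.Random(42)
y = [1] * 50 + [0] * 100
X = [[data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for label in y]

outer_aucs, chosen = nested_cv(X, y)
print(f"mean outer AUC over {len(outer_aucs)} folds: {mean(outer_aucs):.3f}")
```

The point of the nesting is that feature selection never sees the outer test fold, so the averaged outer AUC is not inflated by the selection step. A real pipeline would replace the toy model with the classifiers and feature-selection methods described in the full paper.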

## Linked entities

- **Genes:** LINC00989 (long intergenic non-protein coding RNA 989) [NCBI Gene 100506035], CXCL5 (C-X-C motif chemokine ligand 5) [NCBI Gene 6374], MAP3K7CL (MAP3K7 C-terminal like) [NCBI Gene 56911]
- **Diseases:** small cell lung cancer (MONDO:0008433), gastric cancer (MONDO:0001056), hepatocellular carcinoma (MONDO:0007256), breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** MAP3K7CL (MAP3K7 C-terminal like) [NCBI Gene 56911] {aka C21orf7, HC21ORF7, TAK1L, TAKL, TAKL-1, TAKL-2}, CXCR2 (C-X-C motif chemokine receptor 2) [NCBI Gene 3579] {aka CD182, CDw128b, CMKAR2, IL8R2, IL8RA, IL8RB}, CXCL5 (C-X-C motif chemokine ligand 5) [NCBI Gene 6374] {aka ENA-78, SCYB5}, LINC00989 (long intergenic non-protein coding RNA 989) [NCBI Gene 100506035], ENO2 (enolase 2) [NCBI Gene 2026] {aka HEL-S-279, NSE}, GRP (gastrin releasing peptide) [NCBI Gene 2922] {aka BN, GRP-10, preproGRP, proGRP}
- **Diseases:** SCLC (MESH:D055752), lung cancer (MESH:D008175), metastasis (MESH:D009362), blood coagulation (MESH:D001778), hepatocellular carcinoma (MESH:D006528), cancer (MESH:D009369), gastric cancer (MESH:D013274), breast cancer (MESH:D001943), inflammatory (MESH:D007249)
- **Chemicals:** lipids (MESH:D008055)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12826499/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12826499/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12826499/full.md

---
Source: https://tomesphere.com/paper/PMC12826499