# Screening and identification of novel protein markers of early-stage lung cancer and construction and application of screening models

**Authors:** Huijie Yuan, Shuyin Duan, Clement Yaw Effah, Sitian He, Yaru Chai, Xia Liu, Lihua Ding, Yongjun Wu

PMC · DOI: 10.3389/fonc.2025.1567673 · 2025-05-27

## TL;DR

This study identifies new protein markers for early-stage lung cancer and builds machine learning models to screen high-risk individuals.

## Contribution

Novel protein markers and machine learning models for early lung cancer screening are developed and validated.

## Key findings

- CLEC3B, AOC3, CAT, SEPP1, and HBB are early molecular markers in lung tumorigenesis.
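- Machine learning models combining these markers with the traditional tumor markers CEA, CYFRA21-1, and NSE achieved AUCs of 0.868 (C5.0 decision tree) and 0.844 (artificial neural network).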
- Protein expression changes were validated in cell cultures and mouse models.
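
The AUC values above summarize how well each model ranks lung cancer cases above non-cases. The following is a minimal sketch of that quantity, not the authors' code. AUC is computed here as the Mann-Whitney probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The function name, label coding, and scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = case (e.g. lung cancer), 0 = control; hypothetical coding.
    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of all positive/negative pairs that are ordered correctly
    return wins / (len(pos) * len(neg))


# Hypothetical model scores, NOT data from the study
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8 of 9 pairs ordered correctly
```

The pairwise loop is O(n·m) and is meant only to make the definition explicit. A rank-based formula or a library routine is preferable for larger cohorts.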

## Abstract

Molecular biomarkers have the potential to improve the current state of early screening of lung cancer. This investigation aimed to identify novel protein markers for early-stage lung cancer and combine them with traditional tumor markers to develop machine learning models for lung cancer screening.

Protein alterations in peripheral blood from 5 patients with early-stage lung adenocarcinoma, 5 patients with early-stage lung squamous cell carcinoma, and 8 healthy controls were detected by label-free quantitative proteomics. Novel candidate protein markers were then prioritized using multi-omics analysis. Next, coal tar pitch extracts (CTPE) were used to induce malignant transformation of BEAS-2B cells and lung carcinogenesis in C57BL/6 mice. This allowed the expression of these markers to be tracked and validated dynamically across different stages of lung carcinogenesis. The markers were subsequently measured in human plasma by ELISA for further confirmation. Finally, machine learning models were established to screen individuals at high risk of lung cancer.
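
The summary does not state which statistical test or cut-offs were used to pick differentially abundant proteins from the label-free data. One generic first pass computes a log2 fold change and a Welch t statistic for each protein, then keeps proteins that pass both thresholds. The sketch below follows that approach and rests on several assumptions. The thresholds (|log2FC| ≥ 1, |t| ≥ 2), the protein IDs, and the intensities are all hypothetical. A real analysis would also derive p-values and correct for multiple testing.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    # Ratio of mean intensities on a log2 scale (intensities assumed > 0)
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    # Welch's t statistic: does not assume equal variances or group sizes
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    if se == 0:
        raise ValueError("zero variance in both groups")
    return (mean(case) - mean(control)) / se


def screen(intensities, n_case, min_abs_log2fc=1.0, min_abs_t=2.0):
    """Return (protein, log2FC, t) for proteins passing both hypothetical cut-offs.

    intensities maps a protein ID to its values, case samples first.
    """
    hits = []
    for protein, values in intensities.items():
        case, control = values[:n_case], values[n_case:]
        fc = log2_fold_change(case, control)
        t = welch_t(case, control)
        if abs(fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits.append((protein, round(fc, 2), round(t, 2)))
    return hits


# Hypothetical intensities: 3 case samples followed by 3 control samples
data = {
    "PROT_A": [10.0, 12.0, 11.0, 40.0, 44.0, 42.0],  # lower in cases
    "PROT_B": [20.0, 22.0, 21.0, 21.0, 20.0, 22.0],  # unchanged
}
print(screen(data, n_case=3))
```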

C-type lectin domain family 3 member B (CLEC3B), membrane primary amine oxidase (AOC3), hemoglobin subunit beta (HBB), catalase (CAT), and selenoprotein P (SEPP1) were selected as candidate protein markers for early-stage lung cancer. Expression of CLEC3B, AOC3, CAT, and SEPP1 differed significantly (P<0.05) in various passages of CTPE-exposed cells compared with the saline group. In addition, expression of all 5 proteins differed significantly (P<0.05) in lung tissue, plasma, and alveolar lavage fluid of mice exposed to CTPE for 3, 6, 9, and 12 months compared with normal controls. Levels of AOC3, CAT, CLEC3B, SEPP1, HBB, CEA, CYFRA21-1, and NSE differed significantly (P<0.05) among the healthy control group, the lung cancer group, and coke oven workers. The C5.0 decision tree (AUC=0.868) and artificial neural network (AUC=0.844) models, which combined these 8 markers, showed better performance.
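
C5.0 is a decision-tree learner in the C4.5 family, which ranks candidate splits by the information gain ratio. The sketch below is a simplification, not the authors' model. It scores threshold splits on a single marker by gain ratio and returns the best midpoint threshold. The marker values and labels are hypothetical. The full C5.0 algorithm also compares splits across many attributes and adds pruning, boosting, and winnowing, none of which is shown here.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (bits) of the class distribution
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    # Binary split: value <= threshold goes left, otherwise right
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    remainder = weights[0] * entropy(left) + weights[1] * entropy(right)
    gain = entropy(labels) - remainder
    # Split information penalizes splits that fragment the data
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info


def best_threshold(values, labels):
    # Candidate thresholds: midpoints between consecutive distinct values
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical marker levels; 1 = lung cancer, 0 = control
marker = [1.0, 1.2, 1.5, 3.0, 3.2, 3.5]
labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(marker, labels))  # 2.25 (separates the two classes perfectly)
```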

The differential changes in AOC3, CAT, CLEC3B, SEPP1, and HBB proteins were shown to be early molecular events in lung tumorigenesis. Screening models that combine these novel protein markers with traditional tumor markers may be applicable to screening high-risk individuals for lung cancer.

The flowchart of the study.

## Linked entities

- **Genes:** CLEC3B (C-type lectin domain family 3 member B) [NCBI Gene 7123], AOC3 (amine oxidase copper containing 3) [NCBI Gene 8639], HBB (hemoglobin subunit beta) [NCBI Gene 3043], CAT (catalase) [NCBI Gene 847], SELENOP (selenoprotein P) [NCBI Gene 6414], CEACAM5 (CEA cell adhesion molecule 5) [NCBI Gene 1048], ENO2 (enolase 2) [NCBI Gene 2026]
- **Proteins:** CLEC3B (C-type lectin domain family 3 member B), AOC3 (amine oxidase copper containing 3), HBB (hemoglobin subunit beta), CAT (catalase), SELENOP (selenoprotein P), CEACAM5 (CEA cell adhesion molecule 5), ENO2 (enolase 2)
- **Chemicals:** saline (PubChem CID 5234)
- **Diseases:** lung cancer (MONDO:0005138), lung adenocarcinoma (MONDO:0005061), lung squamous cell carcinoma (MONDO:0005097)
- **Species:** Mus musculus (taxon 10090)

## Full-text entities

- **Genes:** SELENOP (selenoprotein P) [NCBI Gene 6414] {aka SELP, SEPP, SEPP1, SeP}, ENO2 (enolase 2) [NCBI Gene 2026] {aka HEL-S-279, NSE}, CAT (catalase) [NCBI Gene 847], HBB (hemoglobin subunit beta) [NCBI Gene 3043] {aka CD113t-C, ECYT6, beta-globin}, AOC3 (amine oxidase copper containing 3) [NCBI Gene 8639] {aka HPAO, SSAO, VAP-1, VAP1}, CLEC3B (C-type lectin domain family 3 member B) [NCBI Gene 7123] {aka MCDR4, TN, TNA}, CEACAM5 (CEA cell adhesion molecule 5) [NCBI Gene 1048] {aka CD66e, CEA}
- **Diseases:** lung carcinogenesis (MESH:D063646), lung squamous cell carcinoma (MESH:D002294), lung cancer (MESH:D008175), tumor (MESH:D009369), lung adenocarcinoma (MESH:D000077192)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]
- **Cell lines:** BEAS-2B — Homo sapiens (Human), Transformed cell line (CVCL_0168), C57BL/6 — Mus musculus (Mouse), Transformed cell line (CVCL_C0MU)

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12149181/full.md

---
Source: https://tomesphere.com/paper/PMC12149181