# Single‐Cell Sequencing‐Guided Annotation of Rare Tumor Cells for Deep Learning‐Based Cytopathologic Diagnosis of Early Lung Cancer

**Authors:** Yichun Zhao, Ruoran Qiu, Zhuo Wang, Yunyun Li, Xu Yang, Yanlin Li, Xiaohan Shen, Yun Liu, Ziqiang Chen, Qihan You, Qihui Shi

PMC · DOI: 10.1002/advs.202416921 · Advanced Science · 2025-04-15

## TL;DR

This study uses single-cell DNA sequencing to create accurate datasets for training deep learning models to detect early lung cancer cells in bronchoalveolar lavage samples.

## Contribution

The study introduces an objective, expert-free method for annotating rare tumor cells using single-cell DNA sequencing.

## Key findings

- The deep learning model distinguished tumor cells from benign cells with an AUC of 0.997 for large-sized and 0.956 for small-sized exfoliated tumor cells.
- In the validation cohort (n = 158), the model reached 47.6% sensitivity for lung cancer diagnosis, versus 19.0% for cytology, at 97.7% specificity.
- In external validation, the model achieved 60.0% sensitivity and 92.5% specificity for lung cancer diagnosis.
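
As a reading aid for the AUC figures above, the following minimal sketch computes AUC as the probability that a randomly chosen tumor cell receives a higher model score than a randomly chosen benign cell, with ties counted as half. The function name `roc_auc` and the score lists are hypothetical toy values for illustration only; they are not taken from the paper, and the authors' actual evaluation code is not described in this summary.

```python
def roc_auc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive scores higher; ties contribute half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (toy data, NOT from the study).
tumor_scores = [0.9, 0.8, 0.4]
benign_scores = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(tumor_scores, benign_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cells, which is fine for a sketch; a sort-based rank computation would be preferable for large datasets.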

## Abstract

Deep learning (DL) models for medical image analysis are largely bottlenecked by the lack of well‐annotated datasets. Bronchoalveolar lavage (BAL) is a minimally invasive procedure for diagnosing lung cancer, but BAL cytology suffers from low sensitivity. DL has rarely succeeded in BAL cytology because exfoliated tumor cells (ETCs) are rare and differ only subtly in morphology from normal cells. Single‐cell DNA sequencing (scDNA‐Seq) is used as an objective ground truth for ETC annotation, generating an unbiased, accurately annotated dataset of 580 ETCs and 1106 benign cells from BAL cytology slides. A DL model is developed to distinguish ETCs from benign cells in BAL fluid, achieving an area under the curve (AUC) of 0.997 and 0.956 for detecting large‐ and small‐sized ETCs, respectively. The model is applied in a discovery cohort (n = 156) to establish a BAL‐based cytopathologic diagnostic model for lung cancer. Evaluated in a validation cohort (n = 158), the model yields 47.6% sensitivity and 97.7% specificity in lung cancer diagnosis, outperforming cytology in sensitivity (47.6% vs 19.0%). In an external validation cohort (n = 141), the model achieves 60.0% sensitivity and 92.5% specificity in lung cancer diagnosis.
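
The sensitivity and specificity figures reported above follow the standard definitions: sensitivity is the fraction of true cancer cases the test flags, and specificity is the fraction of non-cancer cases it correctly clears. The sketch below illustrates those definitions only. The `ConfusionCounts` type and the counts in `toy` are hypothetical, chosen for easy arithmetic, and do not reproduce the patient-level counts of any cohort in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # cancer patients called positive
    fn: int  # cancer patients called negative
    tn: int  # non-cancer patients called negative
    fp: int  # non-cancer patients called positive


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts (toy data, NOT from the study).
toy = ConfusionCounts(tp=8, fn=2, tn=45, fp=5)
print(f"sensitivity = {sensitivity(toy):.1%}")
print(f"specificity = {specificity(toy):.1%}")
```

Because the two metrics use disjoint denominators, a diagnostic test can gain sensitivity with little loss of specificity, which is the pattern the abstract reports for the model relative to cytology in the validation cohort.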

This study presents an innovative method to generate high‐quality, objectively annotated cytology datasets without relying on expert involvement. Large‐scale single‐cell DNA sequencing is used as an objective ground truth of cell annotation for generating an accurate, unbiased dataset of exfoliated tumor cells from bronchoalveolar lavage cytology slides. A deep learning model is developed for cytopathologic diagnosis of early lung cancer.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Diseases:** Lung Cancer (MESH:D008175), Tumor (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12165082/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12165082/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12165082/full.md

---
Source: https://tomesphere.com/paper/PMC12165082