# The application of pre-trained large visual-language models for preliminary diagnosis of esophageal whitish plaques in large-scale esophageal cancer screening

**Authors:** Yilin Li, Xin Li, Di Zhang, Wenwen Zhu, Yuan Hu, Zijian Zhao, Qi Zhao

PMC · DOI: 10.1038/s41698-026-01301-8 · NPJ Precision Oncology · 2026-01-28

## TL;DR

This paper introduces a computer-aided diagnosis (CAD) system built on the pre-trained visual-language model BLIP. The system is designed to improve the accuracy of preliminary diagnosis of esophageal whitish plaques found during large-scale esophageal cancer screening.

## Contribution

The novel contribution is the application of the pre-trained BLIP visual-language model to the automated diagnosis and textual description of esophageal whitish plaques. The resulting system outperforms existing benchmark models and endoscopists.

## Key findings

- The proposed model outperforms the benchmark models (Poolformer, Swin-Transformer, TransMSF, and ViT) in precision, recall, F1 score, and accuracy.
- Compared with LLaVA-Med, the model significantly improves keyword accuracy (K-ACC) in its generated medical text descriptions.
- In a human-machine competition, the system outperformed both senior and junior endoscopists, particularly in the recall of early esophageal cancer cases.
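The classification findings above are reported as precision, recall, F1 score, and accuracy. A minimal sketch of how these metrics are typically computed from per-image labels is shown below. It is an illustration, not the authors' evaluation code: the class names (`early_cancer`, `benign`) are hypothetical, and unweighted macro averaging is an assumption, because this summary does not state how the paper averages across classes. Per-class recall is included because it corresponds to the "recall of early esophageal cancer cases" highlighted in the human-machine comparison.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return per-class precision/recall/F1, their macro averages, and accuracy."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative (a missed case)
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro average: unweighted mean over classes (an assumption; the paper may weight differently).
    macro = {
        k: sum(m[k] for m in per_class.values()) / len(labels)
        for k in ("precision", "recall", "f1")
    }
    accuracy = sum(tp.values()) / len(y_true)
    return per_class, macro, accuracy


# Hypothetical labels; the paper's actual class set is not listed in this summary.
y_true = ["early_cancer", "early_cancer", "benign", "benign", "benign"]
y_pred = ["early_cancer", "benign", "benign", "benign", "early_cancer"]
per_class, macro, accuracy = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}")
print(f"early_cancer recall={per_class['early_cancer']['recall']:.2f}")
```

In this toy example, one of the two early-cancer images is missed. Early-cancer recall is therefore 0.50, while overall accuracy is 0.60. This illustrates why a screening-oriented evaluation reports per-class recall alongside accuracy.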

## Abstract

Esophageal whitish plaques are common findings in large-scale esophageal cancer screenings, requiring accurate preliminary differentiation to guide appropriate clinical management. This study presents a computer-aided diagnosis (CAD) system based on the pre-trained large-scale visual-language (VL) model BLIP for automated diagnosis and description of esophageal whitish plaques. A dataset of 13,922 endoscopic images was used for model training, and comparative experiments were conducted with multiple benchmark models, including Poolformer, Swin-Transformer, TransMSF, and ViT. The results demonstrate that our approach outperforms existing methods in terms of precision, recall, F1 score, and accuracy. Compared with LLaVA-Med, our model significantly improves keyword accuracy (K-ACC) in medical text descriptions. A human-machine competition further demonstrated that our model outperforms both senior and junior endoscopists, particularly excelling in the recall of early esophageal cancer cases. These findings suggest that integrating pre-trained VL models into CAD systems can enhance the accuracy and efficiency of esophageal whitish plaque diagnosis, reducing misdiagnoses and supporting clinical decision-making.
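The abstract reports keyword accuracy (K-ACC) for the generated descriptions but does not define it here. The sketch below shows one plausible reading, under a loudly stated assumption: K-ACC is taken to be the fraction of keywords from a fixed vocabulary that appear in a reference description and are also recovered in the generated description, averaged over images. The keyword vocabulary, the example sentences, and the function names `extract_keywords`, `keyword_accuracy`, and `mean_keyword_accuracy` are all hypothetical. The paper's actual definition may differ.

```python
import re


def extract_keywords(text, vocabulary):
    """Return the vocabulary terms that occur as whole words/phrases in text."""
    lowered = text.lower()
    return {
        term for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def keyword_accuracy(generated, reference, vocabulary):
    """Assumed K-ACC: share of the reference's keywords recovered in the generated text."""
    ref_terms = extract_keywords(reference, vocabulary)
    if not ref_terms:
        return None  # undefined when the reference contains no vocabulary terms
    hit = ref_terms & extract_keywords(generated, vocabulary)
    return len(hit) / len(ref_terms)


def mean_keyword_accuracy(pairs, vocabulary):
    """Average the per-image scores, skipping pairs where K-ACC is undefined."""
    scores = []
    for generated, reference in pairs:
        score = keyword_accuracy(generated, reference, vocabulary)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical keyword vocabulary and descriptions, for illustration only.
VOCAB = ["whitish", "plaque", "flat", "raised", "smooth", "rough"]
pairs = [
    ("A raised whitish plaque with a rough surface",  # generated
     "A flat whitish plaque with a rough surface"),   # reference
    ("Smooth whitish plaque", "Smooth whitish plaque"),
]
print(mean_keyword_accuracy(pairs, VOCAB))
```

With these inputs, the first description recovers 3 of its 4 reference keywords (0.75) and the second recovers all of them (1.0), giving a mean of 0.875. This recall-style variant does not penalize extra keywords in the generated text. A precision-style or F1-style variant would do so, and which one the paper uses is not stated in this summary.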

## Linked entities

- **Diseases:** esophageal cancer (MONDO:0007576)

## Full-text entities

- **Diseases:** esophageal cancer (MESH:D004938)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953865/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953865/full.md

## References

7 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953865/full.md

---
Source: https://tomesphere.com/paper/PMC12953865