Machine Learning Models Using General and Tissue-Specific Feature Extractors for Accurate Subtyping of Biopsy Samples: Advancing Lung Cancer Diagnosis in Latin America
Viviane Teixeira Loiola de Alencar, Felipe Navarro Balbino Alves, Guilherme de Souza Velozo, Luiz Edmundo Lopes Mizutani, Iusta Caminha, Gabriel Barbosa Silva, Vladmir Cláudio Cordeiro de Lima, Fábio Rocha Fernandes Távora

TL;DR
This paper introduces AI models that improve lung cancer subtype classification in biopsy samples, with a focus on Latin America, where diagnostic resources are limited.
Contribution
The study introduces two novel DinoV2-based feature extractors, the task-specific LungDino and the general-pathology OncoDino, and applies them to lung cancer subtype classification on samples from diverse and underrepresented regions.
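The summary does not describe how tile-level DinoV2 embeddings become a slide-level subtype prediction. A common pattern in computational pathology is to embed each tile with a frozen feature extractor and pool the embeddings before a small classifier. The sketch below assumes mean pooling and a linear head, with random arrays standing in for LungDino or OncoDino outputs; the function names, the 384-dimensional embedding size, and the class order are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Hypothetical class order covering the paper's four categories.
CLASSES = ["adenocarcinoma", "squamous cell carcinoma", "small cell carcinoma", "benign"]

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def slide_probabilities(tile_embeddings: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Mean-pool tile embeddings (n_tiles, d) into one slide vector, then apply a linear head.

    Assumption: a frozen extractor (standing in for LungDino/OncoDino) has already
    produced one d-dimensional embedding per tile; W has shape (d, n_classes).
    """
    slide_vec = tile_embeddings.mean(axis=0)   # (d,)
    return softmax(slide_vec @ W + b)          # (n_classes,)

rng = np.random.default_rng(0)
d, n_tiles = 384, 500                          # illustrative sizes only
tiles = rng.normal(size=(n_tiles, d))          # placeholder embeddings, not real data
W = rng.normal(scale=0.01, size=(d, len(CLASSES)))
b = np.zeros(len(CLASSES))

probs = slide_probabilities(tiles, W, b)
print(CLASSES[int(np.argmax(probs))], probs.round(3))
```

Mean pooling is the simplest aggregator; attention-based pooling is a frequent alternative when only a small fraction of tiles contain tumor.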
Findings
LungDino and OncoDino outperformed a ResNet baseline in classifying lung cancer subtypes from hematoxylin and eosin (HE)–stained whole-slide images (WSIs).
OncoDino performed strongly in underrepresented categories, such as small cell carcinoma, for which it reached an AUC of 0.99.
Both models generated interpretable heatmaps for tumor localization, even in poorly differentiated cases.
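A per-category AUC, such as the 0.99 reported for small cell carcinoma, is typically computed one-vs-rest: the category of interest against all other categories pooled. The exact evaluation protocol is not given in this summary; the sketch below computes the one-vs-rest AUC through its Mann-Whitney interpretation in plain Python, using made-up toy scores and labels rather than study data.

```python
def one_vs_rest_auc(scores, labels, positive):
    """Probability that a random positive-class sample scores higher than a random
    negative one (ties count 1/2); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities for the small cell class (not study data).
scores = [0.95, 0.80, 0.40, 0.30, 0.20, 0.10]
labels = ["SCLC", "SCLC", "LUAD", "SCLC", "LUSC", "benign"]
print(one_vs_rest_auc(scores, labels, positive="SCLC"))  # 8/9, about 0.889
```

The pairwise loop is quadratic and meant only to make the definition concrete; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently from binary labels and scores.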
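The summary does not say how the heatmaps were produced. One simple approach is to score every tile and place each score at that tile's grid position on a downsampled slide. The sketch below assumes per-tile tumor probabilities and (row, column) tile coordinates are already available; the function name and values are hypothetical.

```python
import numpy as np

def tile_heatmap(coords, scores, grid_shape):
    """Scatter per-tile scores into a 2-D grid.

    coords: sequence of (row, col) tile indices; scores: one value per tile.
    Grid cells with no scored tile (e.g. background) stay NaN.
    """
    heat = np.full(grid_shape, np.nan)
    for (r, c), s in zip(coords, scores):
        heat[r, c] = s
    return heat

coords = [(0, 0), (0, 1), (1, 1), (2, 0)]   # hypothetical tile positions
scores = [0.10, 0.85, 0.92, 0.30]           # hypothetical tumor probabilities
heat = tile_heatmap(coords, scores, grid_shape=(3, 2))
print(heat)
```

The resulting grid can then be upsampled and overlaid on a slide thumbnail, for example with Matplotlib's `imshow`.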
Abstract
Lung cancer is the leading cause of cancer-related deaths worldwide, with accurate histologic subtype classification critical for diagnosis and treatment planning. Diagnostic variability and resource disparities, particularly in underrepresented regions such as Latin America, pose substantial challenges. This study developed and evaluated novel artificial intelligence models trained on both global and Latin American pathology samples for subtype classification of hematoxylin and eosin (HE)–stained whole-slide images (WSIs). Two DinoV2-based feature extractors, LungDino and OncoDino, were developed and trained on large data sets for task-specific and general pathology applications. The training data set consisted of 1308 HE-stained WSIs, including 412 adenocarcinomas, 323 squamous cell carcinomas, 41 small cell carcinomas, and 532 benign tissue samples, sourced from The Cancer Genome Atlas…
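The training set is imbalanced: small cell carcinoma contributes only 41 of 1308 slides (about 3.1%). The abstract does not state whether or how the authors compensated for this. As one common option, the sketch below computes class shares and inverse-frequency ("balanced") weights, w_c = N / (K · n_c), which is the heuristic behind scikit-learn's `class_weight="balanced"`.

```python
# Slide counts per class, as reported in the abstract.
counts = {
    "adenocarcinoma": 412,
    "squamous cell carcinoma": 323,
    "small cell carcinoma": 41,
    "benign": 532,
}

total = sum(counts.values())  # 1308
k = len(counts)

# Inverse-frequency ("balanced") weight for each class: N / (K * n_c).
weights = {name: total / (k * n) for name, n in counts.items()}

for name, n in counts.items():
    share = n / total
    print(f"{name:<24} n={n:4d}  share={share:6.1%}  weight={weights[name]:5.2f}")
```

With these weights, a misclassified small cell slide costs roughly 13 times as much as a misclassified benign slide (532 / 41) in a weighted cross-entropy loss.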
Figures 1–6
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
