Machine Learning Models Using General and Tissue-Specific Feature Extractors for Accurate Subtyping of Biopsy Samples: Advancing Lung Cancer Diagnosis in Latin America

Viviane Teixeira Loiola de Alencar; Felipe Navarro Balbino Alves; Guilherme de Souza Velozo; Luiz Edmundo Lopes Mizutani; Iusta Caminha; Gabriel Barbosa Silva; Vladmir Cláudio Cordeiro de Lima; Fábio Rocha Fernandes Távora

PMC · DOI: 10.1016/j.jtocrr.2025.100906 · September 18, 2025

Open Access

TL;DR

This paper introduces AI models that improve lung cancer subtype classification in biopsy samples, with a focus on Latin America, where diagnostic resources are limited.

Contribution

The study introduces two novel DinoV2-based feature extractors, LungDino and OncoDino, tailored for lung cancer subtype classification in diverse and underrepresented regions.

Findings

01

LungDino and OncoDino outperformed a ResNet baseline in classifying lung cancer subtypes from hematoxylin and eosin (HE)-stained whole-slide images (WSIs).

02

OncoDino showed strong performance in underrepresented categories like small cell carcinoma with an AUC of 0.99.

03

Both models generated interpretable heatmaps for tumor localization, even in poorly differentiated cases.
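
For readers unfamiliar with the metric, the AUC quoted above can be read as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The sketch below illustrates that definition in plain Python. It assumes a one-vs-rest setup (small cell carcinoma versus everything else), which the summary does not state, and the labels and scores are made-up toy values, not data from the paper. The function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the class of interest, 0 for all other classes (one-vs-rest).
    scores: model scores for the class of interest, higher means more likely.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only, not from the paper).
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"one-vs-rest AUC: {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it is meant only to make the definition concrete. With few positive slides, as in a rare class, a single misranked pair shifts the value noticeably.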

Abstract

Lung cancer is the leading cause of cancer-related deaths worldwide, with accurate histologic subtype classification critical for diagnosis and treatment planning. Diagnostic variability and resource disparities, particularly in underrepresented regions such as Latin America, pose substantial challenges. This study developed and evaluated novel artificial intelligence models trained on both global and Latin American pathology samples for subtype classification of hematoxylin and eosin (HE)–stained whole-slide images (WSIs). Two DinoV2-based feature extractors, LungDino and OncoDino, trained on large data sets for task-specific and general pathology applications, were developed. The training data set consisted of 1308 HE-stained WSIs, including 412 adenocarcinomas, 323 squamous cell carcinomas, 41 small cell carcinomas, and 532 benign tissue samples, sourced from The Cancer Genome Atlas…
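
The class counts in the abstract sum to the stated total and show how imbalanced the training set is: small cell carcinoma makes up roughly 3% of the slides. The sketch below checks that arithmetic and, purely as an illustration, derives inverse-frequency class weights. Whether the authors used any class weighting is not stated in the visible text, so the weighting scheme here is an assumption.

```python
# Class counts as reported in the abstract.
counts = {
    "adenocarcinoma": 412,
    "squamous cell carcinoma": 323,
    "small cell carcinoma": 41,
    "benign": 532,
}

total = sum(counts.values())

# Share of the training set held by each class.
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: total / (num_classes * class_count).
# ASSUMPTION: shown for illustration only; not the authors' stated method.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:24s} n={n:4d}  share={shares[name]:.3f}  weight={weights[name]:.2f}")
print(f"total slides: {total}")
```

Under this scheme the rarest class gets the largest weight, which is one common way to keep a classifier from ignoring it.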

Linked entities

Chemicals (1): HE

Diseases (5): lung cancer, adenocarcinoma, squamous cell carcinoma, small cell carcinoma, Cancer

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging