Ancestry-Associated Performance Variability of Open-Source AI Models for EGFR Prediction in Lung Cancer

Mehrdad Rakaee; Amin H. Nassar; Masoud Tafavvoghi; Falah Jabar; Elias Bou Farhat; Elio Adib; Sigve Andersen; Lill-Tove Rasmussen Busund; Mette Pøhl; Åslaug Helland; Alexander Gusev; Biagio Ricciuti; Lynette M. Sholl; Tom Donnem; David J. Kwiatkowski

PMC · DOI: 10.1001/jamaoncol.2025.6430 · February 12, 2026

Open Access

TL;DR

This study shows that open-source AI models for predicting EGFR mutations in lung cancer work well overall but have lower accuracy for Asian patients and pleural tissue samples.

Contribution

The study reveals ancestry-related performance variability in AI models for EGFR prediction, highlighting the need for recalibration in diverse populations.

Findings

1. AI models achieved high accuracy for EGFR prediction but showed lower performance in Asian ancestry subgroups.
2. Performance declined in pleural tissue samples compared with lung specimens.
3. AI triage could reduce rapid EGFR testing by 57% while maintaining high sensitivity and specificity.
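The triage result above can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' workflow. It assumes each case has an AI score between 0 and 1. Cases scoring at or below a rule-out threshold, or at or above a rule-in threshold, skip rapid testing and take the AI call. Cases in between go to a rapid test that the sketch assumes always returns the reference result, and "tests avoided" is measured against testing every case. The names `triage`, `TriageResult`, `rule_out`, and `rule_in`, the thresholds, and the toy data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TriageResult:
    """Summary of a simulated triage run (hypothetical structure)."""
    fraction_tests_avoided: float
    sensitivity: float
    specificity: float


def triage(scores, labels, rule_out, rule_in):
    """Simulate a hypothetical two-threshold AI triage workflow.

    scores: AI-predicted probability of an EGFR mutation for each case.
    labels: reference result per case (1 = EGFR-mutant, 0 = wild-type).
    Cases with rule_out < score < rule_in are sent to rapid testing,
    which this sketch assumes always returns the reference label.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    if rule_out >= rule_in:
        raise ValueError("rule_out must be below rule_in")
    tp = fn = tn = fp = tests = 0
    for score, label in zip(scores, labels):
        if score <= rule_out:
            call = 0  # confident AI negative: rapid test skipped
        elif score >= rule_in:
            call = 1  # confident AI positive: rapid test skipped
        else:
            tests += 1
            call = label  # uncertain band: rapid test assumed exact
        if label == 1:
            tp += call
            fn += 1 - call
        else:
            fp += call
            tn += 1 - call
    nan = float("nan")
    return TriageResult(
        # Baseline assumption: without triage, every case gets a rapid test.
        fraction_tests_avoided=1 - tests / len(labels),
        sensitivity=tp / (tp + fn) if tp + fn else nan,
        specificity=tn / (tn + fp) if tn + fp else nan,
    )


if __name__ == "__main__":
    # Toy data, not study results.
    scores = [0.02, 0.10, 0.45, 0.60, 0.95, 0.97, 0.30, 0.05]
    labels = [0, 1, 1, 0, 1, 1, 0, 0]
    print(triage(scores, labels, rule_out=0.15, rule_in=0.90))
    # TriageResult(fraction_tests_avoided=0.625, sensitivity=0.75, specificity=1.0)
```

In this toy run, the single missed mutant (score 0.10) falls below the rule-out threshold, which is exactly the kind of error the threshold choice trades against tests avoided. Widening the uncertain band sends more cases to rapid testing and protects sensitivity; narrowing it avoids more tests at the cost of relying more on the AI call.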

Abstract

Do open-source artificial intelligence (AI) models for predicting EGFR mutations from pathology slides perform consistently across patient populations and clinical settings?

In this multicohort study of 2098 patients with lung adenocarcinoma from the US and Europe, open-source AI approaches achieved high accuracy for EGFR prediction and demonstrated robust overall performance. Subgroup analyses revealed lower accuracy in Asian patients and in pleural tissue samples.

AI-based histology tools show strong potential as rapid, low-cost adjuncts for identifying EGFR mutations; broader validation and recalibration across diverse populations and tissue types will help ensure equitable clinical adoption and maximize their impact in cancer care.

This cohort study evaluated the performance and generalizability of 2 open-source artificial intelligence models in predicting mutations in the EGFR gene in…
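A subgroup analysis of this kind amounts to computing the same performance metric separately within each stratum, such as ancestry group or tissue site. The abstract does not name the metric, so this minimal sketch assumes the area under the ROC curve (AUC). The helpers `auc` and `auc_by_subgroup` and the toy scores are hypothetical and are not study data.

```python
from collections import defaultdict


def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count one half). This equals the
    Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(scores, labels, groups):
    """Return {subgroup: AUC} for each distinct subgroup label."""
    buckets = defaultdict(lambda: ([], []))  # group -> (scores, labels)
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}


if __name__ == "__main__":
    # Toy data, not study results: same model, two hypothetical tissue sites.
    scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.6, 0.5]
    labels = [1, 1, 0, 0, 1, 1, 0, 0]
    groups = ["lung"] * 4 + ["pleura"] * 4
    print(auc_by_subgroup(scores, labels, groups))
    # {'lung': 1.0, 'pleura': 0.5}
```

The pairwise loop costs O(n_pos * n_neg), which is fine for a sketch; for real cohorts a library routine such as scikit-learn's `roc_auc_score` would be the usual choice. Small subgroups give noisy AUC estimates, so a subgroup comparison would normally be reported with confidence intervals.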

Linked entities


Genes (1): EGFR
Proteins (1)
Species (1): Homo sapiens (human)
Chemicals (2): hematoxylin, eosin
Diseases (5): lung adenocarcinoma, lung cancer, LUAD, cancer, TNM-I

Figures (2)

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Lung Cancer Treatments and Mutations · Lung Cancer Diagnosis and Treatment