Validation of an AI-based end-to-end model for prostate pathology using long-term archived routine samples

Xiaoyi Ji; Renata Zelic; Oskar Aspegren; Nita Mulliqi; Michelangelo Fiorentino; Francesca Giunchi; Luca Molinaro; Sol Erika Boman; Lorenzo Richiardi; Andreas Pettersson; Per Henrik Vincent; Martin Eklund; Olof Akre; Kimmo Kartasalo

arXiv:2605.02614·cs.CV·May 5, 2026

TL;DR

This study validates GleasonAI, an AI model for prostate pathology grading, on archival samples collected over 17 years. Its accuracy remains stable across that period, is comparable to that of experienced pathologists, and supports prognostic research.

Contribution

It demonstrates the generalizability and stability of an AI-based prostate grading model across 14 Swedish regions and long-term archived specimens, highlighting its potential clinical utility.

Findings

01

The AI model achieved a quadratic-weighted kappa of 0.86 for core-level ISUP grading, comparable to several experienced pathologists.

02

Performance remained stable across the 17-year collection period (1998 to 2015), showing robustness to time-related variation in archival material.

03

AI grades correlated with prostate cancer-specific mortality, indicating prognostic value.
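
For readers unfamiliar with the agreement metric in the first finding, the following is a minimal sketch of how a quadratic-weighted kappa can be computed from two sets of grades. It is not the authors' evaluation code. The function name `quadratic_weighted_kappa` and the encoding of grades as integers 0 to 5 (0 for benign, 1 to 5 for ISUP grades) are assumptions made for illustration.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, num_grades=6):
    """Quadratic-weighted Cohen's kappa between two integer gradings.

    Assumption (not from the paper): grades are encoded as 0..num_grades-1,
    e.g. 0 = benign and 1-5 = ISUP grade.
    """
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = np.zeros((num_grades, num_grades))
    np.add.at(observed, (a, b), 1)

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    idx = np.arange(num_grades)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_grades - 1) ** 2

    # Expected matrix under chance agreement, from the marginal grade counts.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Because the weights grow with the square of the grade difference, confusing ISUP 1 with ISUP 5 is penalized far more than confusing adjacent grades. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.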

Abstract

Artificial intelligence (AI) is becoming a clinical tool for prostate pathology, but generalization across variations in sample preparation and preservation over prolonged time periods remains poorly understood. We evaluated GleasonAI, an end-to-end attention-based multiple instance learning model, on an independent validation cohort comprising 10,366 biopsy cores from 1,028 patients across 14 Swedish regions, using archival diagnostic specimens from the ProMort cohorts collected between 1998 and 2015. The model achieved an overall quadratic-weighted kappa of 0.86 for core-level ISUP grading, comparable to several experienced pathologists and consistent across geographic regions. Notably, performance remained stable across the 17-year collection period, demonstrating robustness to time-related variation in archival material, a property not consistently observed with foundation model-based…
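
The abstract describes GleasonAI only as an attention-based multiple instance learning model and gives no architectural details. As a rough illustration of the general idea, the sketch below shows one common form of attention pooling, in which per-tile feature vectors from a biopsy core are combined into a single core-level embedding using learned attention weights. The function name `attention_mil_pool`, the parameters `V` and `w`, and all shapes are hypothetical; this is a generic example and should not be read as the GleasonAI architecture.

```python
import numpy as np

def attention_mil_pool(tile_features, V, w):
    """Generic attention pooling over a bag of tile features.

    Hypothetical shapes (illustration only):
      tile_features: (num_tiles, feat_dim) features of tiles from one core
      V:             (feat_dim, hidden_dim) learned projection
      w:             (hidden_dim,) learned scoring vector
    """
    # One unnormalized attention score per tile.
    scores = np.tanh(tile_features @ V) @ w

    # Numerically stable softmax so the attention weights sum to 1.
    scores = scores - scores.max()
    attention = np.exp(scores) / np.exp(scores).sum()

    # Core-level embedding: attention-weighted average of tile features.
    core_embedding = attention @ tile_features
    return core_embedding, attention
```

In an end-to-end setup, the tile feature extractor, the attention parameters, and the grading head would all be trained jointly from core-level labels. This sketch covers only the pooling step, with fixed parameters passed in by the caller.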
