# Artificial Intelligence Applications for Automated Data Extraction and Secondary Use of Clinical Information in Uro-oncology: A Systematic Review

**Authors:** Julian Greß, Gordon Otto, Sebastian Sommer, Markus K. Schuler, Shahbaz Khan, Florian Schröder, Christoph Seidel

PMC · DOI: 10.1016/j.euros.2026.02.006 · 2026-02-26

## TL;DR

AI can accurately extract clinical data from uro-oncology records, but most systems lack validation and are not yet ready for widespread clinical use.

## Contribution

This systematic review identifies key limitations in AI-driven clinical data extraction in uro-oncology and proposes standards for reliable deployment.

## Key findings

- AI models achieve high accuracy (F1 > 0.90) in structured data extraction from uro-oncology documents.
- 86% of studies (12/14) relied solely on internal validation; only two reported external validation, and evidence on clinical readiness was limited.
- Most studies are single-center and retrospective, and calibration and interpretability are rarely reported.

## Abstract

Artificial intelligence systems can extract clinical information from uro-oncology documents with consistently high technical performance, frequently achieving F1 scores above 0.90 and demonstrating meaningful efficiency gains. However, this systematic review reveals that clinical readiness remains limited: (1) 86% of studies (12/14) lack any external validation, (2) calibration and interpretability are rarely reported, and (3) studies are predominantly single center, retrospective, and methodologically heterogeneous. This disconnect between accuracy and trustworthiness underscores the need for a decisive shift toward rigorous external and temporal validation, human-in-the-loop verification, transparent calibration reporting, fairness assessments, and implementation-science frameworks. Only through such standards can artificial intelligence–driven data extraction become reliable, safe, and deployable in routine uro-oncological care.

Manual data extraction is a major bottleneck in uro-oncology, limiting research and quality assurance. Although artificial intelligence (AI) offers a scalable solution, the quality and generalizability of current evaluations remain unclear. This review aims to assess the performance, validation strategies, and real-world implementation of AI for automated data extraction in uro-oncology, encompassing a methodological spectrum from rule-based natural language processing to large language models, and to provide recommendations for rigorous evaluation standards.

A systematic search of PubMed, Web of Science, and Embase was conducted through May 2025 following the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines. The search was restricted to studies published from 2020 onward to focus on modern AI capabilities. Two reviewers independently screened records, extracted data, and assessed risk of bias using the Prediction model Risk Of Bias Assessment Tool (PROBAST).

Fourteen studies, encompassing between 100 and 66 532 patient records and approximately 120 000 individual clinical documents across genitourinary cancers, were included. AI models demonstrated high technical performance on structured data extraction, with reported F1 scores frequently exceeding 0.90. However, 86% (12/14) relied solely on internal validation; only two studies reported external validation. Nine studies (64%) described workflow benefits such as improved efficiency and reduced manual abstraction time. Most studies were retrospective and single center, with heterogeneous reporting that precluded a meta-analysis. Evidence for clinical application, cost effectiveness, calibration, and long-term sustainability was limited. These limitations highlight the need for robust external validation, human-in-the-loop verification, improved calibration reporting, equity assessments, and an implementation-science approach.
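For readers less familiar with the headline metric, the following is a minimal sketch of how an F1 score is computed from extraction counts, and of how the proportions quoted above follow from the study counts. The function names and the true-positive, false-positive, and false-negative counts are hypothetical illustrations, not values or code from the reviewed studies; only the 12/14 and 9/14 study counts come from the text.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are counts of correctly extracted, spuriously extracted,
    and missed items (hypothetical inputs for illustration).
    """
    if tp == 0:
        return 0.0  # no correct extractions, so F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percent(part: int, total: int) -> int:
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


# Hypothetical extraction result: 92 correct, 8 spurious, 8 missed items.
example_f1 = f1_score(tp=92, fp=8, fn=8)
print(f"Example F1: {example_f1:.2f}")

# Study counts reported in the review.
print(f"Internal validation only: {percent(12, 14)}%")
print(f"Reported workflow benefits: {percent(9, 14)}%")
```

Because F1 summarizes only precision and recall on the evaluation data, a high value by itself says nothing about calibration or performance at other centers, which is the gap the review emphasizes.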

AI shows strong potential for automating data extraction in uro-oncology, but clinical translation is limited by insufficient external validation and methodological heterogeneity. A shift from isolated performance metrics toward demonstrated robustness and trustworthy clinical application is needed to support reliable clinical use.

In this study, we reviewed how artificial intelligence (AI) is being used to extract information automatically from medical reports on urological cancers. We found that most AI systems can identify important clinical details very accurately, but these are usually tested in only one hospital and not yet shown to work reliably in other settings. This means that while AI has great potential to save time and improve data quality, more testing in everyday clinical practice is needed before it can be used safely and routinely.

## Full-text entities

- **Genes:** KLK3 (kallikrein related peptidase 3) [NCBI Gene 354] {aka APS, KLK2A1, PSA, hK3}, TENM1 (teneurin transmembrane protein 1) [NCBI Gene 10178] {aka ODZ1, ODZ3, TEN-M1, TEN1, TNM, TNM1}, NINL (ninein like) [NCBI Gene 22981] {aka NLP}
- **Diseases:** androgen (MESH:D014770), tumor, node, metastasis (MESH:D008207), uro-oncological malignancies (MESH:D000072716), RCC (MESH:D002292), Urinary Bladder Neoplasms (MESH:D001749), metastasis (MESH:D009362), genitourinary cancers (MESH:D014565), hallucinations (MESH:D006212), Urologic Neoplasms (MESH:D014571), muscle-invasive disease (MESH:D000093284), AI (MESH:C538142), LLMs (MESH:D007806), ML (MESH:D007859), Testicular Neoplasms (MESH:D013736), ischemia (MESH:D007511), Kidney Neoplasms (MESH:D007680), node (MESH:D012804), Prostatic Neoplasms (MESH:D011471), PC (MESH:D015324), cancer (MESH:D009369), blood loss (MESH:D016063)
- **Chemicals:** ADT (-), testosterone (MESH:D013739), T (MESH:D014316)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962141/full.md

---
Source: https://tomesphere.com/paper/PMC12962141