# Evaluation of Artificial Intelligence as a Decision-Support Tool in Urological Tumor Boards: A Study in Real Clinical Practice

**Authors:** Javier De la Torre-Trillo, Yaiza Yáñez Castillo, Maria Teresa Melgarejo Segura, Elisa Carmona Sánchez, Alberto Zambudio Munuera, Juan Mora-Delgado, Alfonso López Luque

PMC · DOI: 10.3390/jcm15062130 · 2026-03-11

## TL;DR

This study evaluates how closely ChatGPT-4o's recommendations align with expert decisions in a urological tumor board. It finds moderate overall agreement but weaker performance in complex cases, notably metastatic prostate cancer.

## Contribution

The study is one of the first to assess a large language model as a decision-support tool in a real-world urologic multidisciplinary tumor board. It uses 98 clinical cases discussed between June 2024 and February 2025.

## Key findings

- ChatGPT-4o agreed fully with tumor board decisions in 56.1% of cases.
- Discrepancies were most frequent in metastatic prostate cancer cases.
- Agreement was highest in bladder and renal tumors and in standardized treatment scenarios such as radiotherapy.

## Abstract

**Background/Objectives:** Artificial intelligence (AI) tools, particularly large language models (LLMs) such as ChatGPT-4o, are gaining prominence in medicine. While their diagnostic capabilities have been explored across various oncologic domains, their role in clinical decision-making within multidisciplinary tumor boards (MTBs) remains largely unexamined in urologic oncology. This study evaluates the performance of ChatGPT-4o as a decision-support tool in a real-world MTB setting by comparing its recommendations with those of expert clinicians.

**Materials and Methods:** A retrospective study was conducted using 98 anonymized clinical cases discussed by a urologic MTB between June 2024 and February 2025. An independent urologist entered the same cases into ChatGPT-4o using a standardized prompt replicating real-world presentation. Two certified urologists independently assessed the model’s responses. Agreement was analyzed overall and by tumor type, disease stage, clinical context, and treatment strategy.

**Results:** ChatGPT-4o fully agreed with the MTB in 56.1% of cases, was correct but incomplete in 23.5%, and provided partially accurate but flawed recommendations in 18.4%. Overall concordance between ChatGPT-4o and the MTB yielded a Cohen’s kappa of 0.61, indicating moderate-to-good agreement. Discrepancies were most common in metastatic prostate cancer, often due to misclassification of tumor burden or errors in treatment sequencing. Highest agreement rates were observed in bladder and renal tumors, and in standardized therapeutic scenarios such as radiotherapy.

**Conclusions:** ChatGPT-4o demonstrated moderate alignment with expert MTB decisions and performed best in well-defined clinical contexts. While it cannot replace multidisciplinary expertise, it may serve as a supportive tool to enhance access to standardized oncologic care.
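As a rough illustration of the reported statistics, the Python sketch below does two things. First, it converts the abstract's outcome percentages back into approximate case counts out of 98. Rounding to whole cases is an assumption, since the abstract gives only percentages. The three reported categories account for about 96 cases, leaving roughly 2 that the abstract does not characterize. Second, it implements the standard Cohen's kappa formula, κ = (p_o − p_e) / (1 − p_e). Here p_o is observed agreement and p_e is the chance agreement implied by each rater's label frequencies. The abstract does not say which decision categories the kappa of 0.61 was computed over, nor which software was used. The treatment labels and toy ratings below are therefore hypothetical and do not reproduce the study's data.

```python
from collections import Counter

# Outcome shares reported in the abstract, as percentages of 98 cases.
N_CASES = 98
REPORTED_PCT = {
    "full agreement": 56.1,
    "correct but incomplete": 23.5,
    "partially accurate but flawed": 18.4,
}


def implied_counts(pcts: dict[str, float], n: int) -> dict[str, int]:
    """Approximate case counts by rounding each percentage of n (an assumption)."""
    return {label: round(pct / 100 * n) for label, pct in pcts.items()}


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined here.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    counts = implied_counts(REPORTED_PCT, N_CASES)
    print(counts)  # {'full agreement': 55, 'correct but incomplete': 23, 'partially accurate but flawed': 18}
    print(N_CASES - sum(counts.values()))  # 2 cases not covered by the three reported categories

    # Hypothetical treatment labels for 10 toy cases (NOT the study's data).
    mtb = ["surgery"] * 4 + ["radiotherapy"] * 3 + ["systemic"] * 3
    llm = ["surgery"] * 3 + ["systemic"] + ["radiotherapy"] * 3 + ["systemic"] * 2 + ["surgery"]
    print(round(cohen_kappa(mtb, llm), 3))  # 0.697
```

On the toy data, 8 of 10 labels match (p_o = 0.8) and chance agreement is (16 + 9 + 9) / 100 = 0.34, giving κ ≈ 0.697. This shows how kappa discounts raw agreement for chance; it says nothing about the study's own ratings.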

## Linked entities

- **Diseases:** prostate cancer (MONDO:0005159), bladder cancer (MONDO:0004986), renal tumors (MONDO:0021163)

## Full-text entities

- **Diseases:** oncologic (MESH:D000072716), bladder and renal tumors (MESH:D001749), prostate cancer (MESH:D011471), Tumor (MESH:D009369)

## Figures

The complete paper contains 3 figures with captions: https://tomesphere.com/paper/PMC13027246/full.md

---
Source: https://tomesphere.com/paper/PMC13027246