# Discrepancies Between MDT Recommendations and AI-Generated Decisions in Gynecologic Oncology: A Retrospective Comparative Cohort Study

**Authors:** Vasilios Pergialiotis, Nikolaos Thomakos, Vasilios Lygizos, Maria Fanaki, Antonia Varthaliti, Dimitrios Efthymios Vlachos, Dimitrios Haidopoulos

PMC · DOI: 10.3390/cancers18030452 · Cancers · 2026-01-30

## TL;DR

This study compares treatment recommendations generated by ChatGPT 5.0 with multidisciplinary tumor board (MDT) decisions for 599 gynecologic oncology patients, finding high agreement in early-stage, guideline-standardized cases and more discrepancies in advanced and recurrent disease.

## Contribution

The study evaluates how well an LLM (ChatGPT 5.0, instructed to follow current ESGO guidelines) can support clinical decision-making in gynecologic oncology, showing its strengths in straightforward cases and its limitations in complex ones.

## Key findings

- AI and MDT recommendations showed high concordance in early-stage and standardized treatment cases.
- Discrepancies were more frequent in advanced and recurrent cancers, particularly in staging and multimodal decision-making.
- Vulvar cancer cases had the highest agreement between AI and MDT decisions.

## Abstract

In this study, we compared AI-generated treatment recommendations with MDT decisions across 599 patients with cervical, endometrial, ovarian, and vulvar cancers. AI recommendations were generated using a structured, guideline-driven input format and evaluated across multiple decision domains, including staging, surgical management, and systemic therapy. Overall, concordance between AI and MDT recommendations was high, particularly in early-stage disease and in cancers with more standardized treatment pathways. However, discrepancies were more frequent in advanced and recurrent disease, with staging disagreements being the most common and often influencing downstream treatment recommendations. Discordance was especially notable in ovarian and endometrial cancer, reflecting the complexity of multimodal decision-making and the need to integrate imaging, molecular data, and prior treatments. These findings suggest that while AI tools may effectively support guideline-based decision-making in straightforward scenarios, their limitations become evident in complex cases requiring nuanced clinical judgment. Rather than replacing MDTs, AI systems may be best positioned as collaborative decision-support tools that enhance transparency and consistency while preserving clinician oversight in gynecologic oncology care.

**Background:** Multidisciplinary tumor boards (MDTs) remain the foundation of gynecologic cancer management, yet increasing diagnostic complexity and rapidly evolving molecular classifications have intensified interest in artificial intelligence (AI) as a potential decision-support tool. This study aimed to evaluate the concordance between MDT-derived recommendations and those generated by ChatGPT 5.0 across a large, real-world cohort of gynecologic oncology cases.

**Methods:** This single-center retrospective analysis included 599 consecutive patients with cervical, endometrial, ovarian, or vulvar cancer evaluated during MDT meetings over a 2-month period. Standardized anonymized case summaries were entered into ChatGPT 5.0, which was instructed to follow current ESGO guidelines. AI-generated staging and treatment recommendations were compared with MDT decisions. Discrepancies were independently assessed by two reviewers and stratified by malignancy type, disease stage, and treatment domain.

**Results:** Overall concordance for FIGO staging was 77.0% (that is, 23.0% discordance), while treatment-related decisions showed lower discordance, particularly in chemotherapy (8.2%) and targeted therapy (6.8%). The highest staging disagreement occurred in early-stage endometrial cancer (32.6%), reflecting the complexity of newly revised molecular classifications. In recurrent ovarian and cervical cancer, discrepancies were more pronounced in surgical and systemic therapy recommendations, suggesting limited AI capacity to integrate multimodal imaging, prior treatments, and individualized considerations. Vulvar cancer cases showed the highest overall agreement.

**Conclusions:** ChatGPT 5.0 aligns with MDT decisions in many straightforward scenarios but falls short in complex or nuanced cases requiring contextual, multimodal, and patient-specific reasoning. These findings underscore the need for prospective, real-time evaluation, multimodal data integration, external validation, and explainable AI frameworks before LLMs can be safely incorporated into routine gynecologic oncology decision-making.
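The abstract reports concordance and discordance rates stratified by malignancy type and treatment domain. As a rough illustration of how such stratified rates can be tallied, here is a minimal Python sketch. It is not the authors' analysis code: the `Comparison` record, the `concordance_by` helper, and the toy records are all hypothetical and invented for illustration, and the toy values are not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One AI-vs-MDT comparison for a single case and decision domain (hypothetical schema)."""
    cancer_type: str
    domain: str
    concordant: bool  # True when the AI recommendation matched the MDT decision


def concordance_by(records, key):
    """Return the percentage of concordant comparisons for each stratum chosen by `key`."""
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for record in records:
        stratum = key(record)
        totals[stratum] += 1
        agreed[stratum] += int(record.concordant)
    return {stratum: 100.0 * agreed[stratum] / totals[stratum] for stratum in totals}


# Toy records, invented for illustration only (NOT study data).
toy = [
    Comparison("vulvar", "staging", True),
    Comparison("vulvar", "staging", True),
    Comparison("endometrial", "staging", True),
    Comparison("endometrial", "staging", False),
    Comparison("ovarian", "chemotherapy", True),
    Comparison("ovarian", "chemotherapy", False),
    Comparison("ovarian", "chemotherapy", True),
    Comparison("ovarian", "chemotherapy", True),
]

by_domain = concordance_by(toy, lambda r: r.domain)
by_type = concordance_by(toy, lambda r: r.cancer_type)

# Discordance is the complement of concordance within each stratum.
discordance_by_type = {k: 100.0 - v for k, v in by_type.items()}

print(by_domain)
print(by_type)
print(discordance_by_type)
```

Passing the stratifying field as a `key` function lets the same helper group by malignancy type, disease stage, or treatment domain without duplicating the counting logic.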

## Linked entities

- **Diseases:** cervical cancer (MONDO:0002974), endometrial cancer (MONDO:0002447), ovarian cancer (MONDO:0005140), vulvar cancer (MONDO:0001528)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), ovarian and cervical cancer (MESH:D010051), endometrial cancer (MESH:D016889), cervical, endometrial, ovarian, or vulvar cancer (MESH:D002575)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12897028/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12897028/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12897028/full.md

---
Source: https://tomesphere.com/paper/PMC12897028