# When AI joins the table: evaluating large language model performance in soft tissue sarcoma tumor board decisions

**Authors:** Reza Dehdab, Saif Afat, Fiona Mankertz, Jan Michael Brendel, Nour Maalouf, Sebastian Werner, Andreas Brendlin, Judith Herrmann, Konstantin Nikolaou, Linus D. Kloker, Branko Calukovic, Katrin Benzler, Lars Zender, Christoph K. W. Deinzer

PMC · DOI: 10.1007/s00432-026-06432-w · Journal of Cancer Research and Clinical Oncology · 2026-02-27

## TL;DR

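This retrospective study of 152 soft tissue sarcoma cases compares ChatGPT-4o's guideline-based treatment recommendations with expert tumor board decisions. The model scored below the maximum in all five evaluation domains and was strongest in clinical contextualization, while discrepancies in treatment sequencing and chemotherapy selection mean expert oversight remains necessary.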

## Contribution

The study provides a direct, blinded expert evaluation of ChatGPT-4o's treatment recommendations for real-world soft tissue sarcoma cases, scored against expert tumor board decisions using a predefined five-domain framework.

## Key findings

- ChatGPT-4o scores were significantly lower than the maximum achievable value of 1.0 across all five evaluation domains (all p < 0.0001).
- Clinical contextualization was the strongest domain, significantly outperforming each of the other four criteria in pairwise comparisons (all p < 0.05).
- No significant performance differences were observed across sarcoma subtypes (H = 19.74, p = 0.138).

## Abstract

Multidisciplinary tumor boards (MDTs) are critical for the personalized management of soft tissue sarcomas (STS), but they are limited by time, costs, and resource demands. With recent advances in large language models (LLMs) like ChatGPT, there is growing interest in evaluating their potential role in augmenting MDT workflows. This study aimed to assess the clinical performance of ChatGPT-4o in real-world STS cases using predefined evaluation criteria, comparing its treatment suggestions with expert MDT decisions.

This retrospective study included 152 patients presented to the multidisciplinary sarcoma tumor board. ChatGPT-4o was prompted to generate guideline-based treatment recommendations based on anonymized tumor board registration letters. Outputs were scored by blinded expert reviewers using a five-domain framework: diagnostic modalities, therapeutic modalities, treatment sequencing/timing, chemotherapy regimen, and clinical contextualization. Descriptive statistics and non-parametric ANOVA with post hoc tests assessed performance, including subgroup analysis by sarcoma subtype.

ChatGPT-4o scores were significantly lower than the maximum achievable value of 1.0 across all five criteria (all p < 0.0001). Among individual domains, clinical contextualization significantly outperformed all other criteria in pairwise comparisons (all p < 0.05). No significant performance differences were observed across sarcoma subtypes (H = 19.74, p = 0.138).
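For readers unfamiliar with the subtype comparison, the reported H value is the kind of statistic a Kruskal-Wallis rank test produces. The abstract only says "non-parametric ANOVA", so treating it as Kruskal-Wallis is an assumption here. The following is a minimal pure-Python sketch of how such an H statistic is computed, with the standard tie correction. The function names and the per-subtype scores are hypothetical and made up for illustration; they are not the study's data or code.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score groups."""
    pooled = [score for group in groups for score in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction


# Hypothetical expert scores (0 to 1) for three illustrative subtypes.
scores_by_subtype = {
    "leiomyosarcoma": [0.8, 0.6, 1.0, 0.8],
    "myxoid liposarcoma": [0.6, 0.8, 0.8],
    "dedifferentiated liposarcoma": [1.0, 0.6, 0.8, 0.6],
}
h_stat = kruskal_wallis_h(list(scores_by_subtype.values()))
print(f"H = {h_stat:.2f}")
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups minus 1) degrees of freedom, which is how a p-value such as the one reported above would be obtained. The sketch stops at the statistic to stay dependency-free; a statistics library would normally handle both steps.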

ChatGPT-4o demonstrated substantial expert-rated performance in generating tumor board recommendations for soft tissue sarcoma cases, particularly excelling in personalized contextualization. Discrepancies in treatment sequencing and chemotherapy selection highlight the need for expert oversight. These findings support the feasibility of LLM integration into oncology workflows, warranting further refinement toward safe, supportive clinical use.

The online version contains supplementary material available at 10.1007/s00432-026-06432-w.

## Linked entities

- **Diseases:** soft tissue sarcoma (MONDO:0018078)

## Full-text entities

- **Genes:** NINL (ninein like) [NCBI Gene 22981] {aka NLP}, DDIT3 (DNA damage inducible transcript 3) [NCBI Gene 1649] {aka AltDDIT3, C/EBPzeta, CEBPZ, CHOP, CHOP-10, CHOP10}, MDM2 (MDM2 proto-oncogene) [NCBI Gene 4193] {aka ACTFS, HDMX, LSKB, hdm2}
- **Diseases:** breast cancer tumor (MESH:D001943), myxoid liposarcoma (MESH:D018208), bone sarcomas (MESH:D001847), toxicity (MESH:D064420), hallucination (MESH:D006212), GIST (MESH:D046152), metastasis (MESH:D009362), STS (MESH:D012509), frailty (MESH:D000073496), LLMs (MESH:D007806), Cancer (MESH:D009369), rectal carcinoma (MESH:D012004), leiomyosarcoma (MESH:D007890), dedifferentiated liposarcoma (MESH:D008080), bone tumors (MESH:D001859)
- **Chemicals:** Nivolumab (MESH:D000077594), Ifosfamide (MESH:D007069), Doxorubicin (MESH:D004317), Ipilimumab (MESH:D000074324)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12948744/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12948744/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12948744/full.md

---
Source: https://tomesphere.com/paper/PMC12948744