# Concordance Between the Multidisciplinary Team and ChatGPT-4o Decisions: A Blinded, Cross-Sectional Concordance Study in Systemic Autoimmune Rheumatic Diseases

**Authors:** Firdevs Ulutaş, Göksel Altınışık, Gülay Güngör, Vefa Çakmak, Nilüfer Yiğit, Duygu Herek, Murat Yiğit, Uğur Karasu, Veli Çobankara

PMC · DOI: 10.3390/diagnostics16010113 · Diagnostics · 2025-12-30

## TL;DR

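This blinded, cross-sectional study compares multidisciplinary team (MDT) decisions with single-session ChatGPT-4o recommendations on diagnosis, treatment, and follow-up in 47 adults with systemic autoimmune rheumatic diseases.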

## Contribution

The study evaluates ChatGPT-4o's diagnostic and treatment recommendations against multidisciplinary team decisions in autoimmune rheumatic diseases.

## Key findings

- ChatGPT-4o showed moderate agreement with the MDT on clinical diagnosis (κ = 0.52), radiological diagnosis (κ = 0.55), and antifibrotic treatment (κ = 0.49).
- The highest agreement was observed for drug-free follow-up (κ = 0.67) and the need for immunosuppressive treatment (κ = 0.64).
- Agreement was lowest, though still moderate, for the need for further investigations (κ = 0.45).

## Abstract

**Background/Objective:** In recent years, artificial intelligence (AI) has gained increasing prominence in diagnostic decision-making in medicine. The aim of this study was to compare multidisciplinary team (MDT: rheumatology, pulmonology, thoracic radiology) decisions with single-session plans generated by ChatGPT-4o.

**Methods:** In this cross-sectional concordance study, adults (≥18 years) with a confirmed systemic autoimmune rheumatic disease (SARD) and an MDT decision within the last 6 months were included. Diagnostic, treatment, and monitoring decisions in SARD cases were documented by recording answers to six essential questions: (1) What is the most likely clinical diagnosis? (2) What is the most likely radiological diagnosis? (3) Is there a need for anti-inflammatory treatment? (4) Is there a need for antifibrotic treatment? (5) Is drug-free follow-up appropriate? and (6) Are additional investigations required? All evaluations were performed with ChatGPT-4o in a single-session format using a standardized single-prompt template, with the system blinded to MDT decisions. All data analyses were conducted in the R programming language (version 4.3.2). Agreement between AI-generated and MDT decisions was assessed using Cohen's kappa (κ) statistic, with κ values interpreted as follows: <0.20 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial, >0.80 = almost perfect agreement. These analyses were performed using the irr and psych packages in R. Statistical significance of the models was evaluated through p-values, while overall model fit was assessed using the Likelihood Ratio Test.

**Results:** A total of 47 patients were included, with a predominance of female patients (61.70%, n = 29). The mean age was 61.74 ± 10.40 years. The most frequently observed diagnosis was rheumatoid arthritis (RA), accounting for 31.91% of cases (n = 15), followed by anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis, interstitial pneumonia with autoimmune features (IPAF), and sarcoidosis. The analyses indicated a statistically significant level of agreement across all decision types. For clinical diagnosis decisions, agreement was moderate (κ = 0.52), suggesting that the AI system can reach partially consistent conclusions in diagnostic processes. Decisions on the need for immunosuppressive treatment and on follow-up without medication showed higher concordance (κ = 0.64 and κ = 0.67, respectively), which falls in the substantial band of the scale above. For antifibrotic treatment decisions, agreement was moderate (κ = 0.49), and radiological diagnosis decisions also fell within the moderate range (κ = 0.55). The lowest agreement, though still moderate, was observed for decisions on whether further investigation was required (κ = 0.45).

**Conclusions:** In patients with SARDs with pulmonary involvement, particularly in complex cases, concordance was observed between MDT decisions and AI-generated recommendations regarding prioritization of clinical and radiologic diagnoses, treatment selection, suitability for drug-free follow-up, and the need for further diagnostic investigations.
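The agreement statistic named in the Methods can be illustrated with a short sketch. The study itself used the irr and psych packages in R; the Python below is not the authors' code and only shows the standard two-rater form of Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names `cohen_kappa` and `agreement_band`, the treatment of band boundaries, and the example ratings are all assumptions made for illustration; the ratings are synthetic and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every label (Counter returns 0 for unseen labels).
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


def agreement_band(kappa):
    """Map kappa to the bands quoted in the abstract.

    Assumption: each band's upper limit is inclusive, so values that fall
    between the printed ranges (e.g. 0.205) go to the higher band, and
    anything at or below 0.20, including negative values, is 'slight'.
    """
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Synthetic example (NOT study data): 12 yes/no decisions from two raters
# who disagree on two cases.
mdt_decisions = ["yes"] * 6 + ["no"] * 6
ai_decisions = ["yes"] * 5 + ["no"] + ["yes"] + ["no"] * 5

if __name__ == "__main__":
    k = cohen_kappa(mdt_decisions, ai_decisions)
    print(f"kappa = {k:.2f} ({agreement_band(k)})")
```

Under these boundary assumptions, `agreement_band` places the reported κ = 0.64 and κ = 0.67 in the substantial band and the other four reported values (0.45, 0.49, 0.52, 0.55) in the moderate band.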

## Linked entities

- **Diseases:** rheumatoid arthritis (MONDO:0008383), sarcoidosis (MONDO:0008399)

## Full-text entities

- **Diseases:** pulmonary involvement (MESH:C566343), inflammatory (MESH:D007249), Autoimmune Rheumatic Diseases (MESH:D012216), vasculitis (MESH:D014657), ANCA (MESH:D056648), RA (MESH:D001172), sarcoidosis (MESH:D012507), interstitial pneumonia with (MESH:D017563)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12785839/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12785839/full.md

---
Source: https://tomesphere.com/paper/PMC12785839