# Artificial intelligence fails to outperform orthopaedic surgeons: A systematic review

**Authors:** Jemima Russell, Jamie Rosen, Martinique Vella‐Baldacchino

PMC · DOI: 10.1002/jeo2.70548 · 2025-11-14

## TL;DR

This study finds that AI does not consistently outperform orthopaedic surgeons in clinical tasks, suggesting it should be used as a supportive tool rather than a replacement.

## Contribution

The novel contribution is a systematic evaluation of AI's performance relative to orthopaedic surgeons across multiple clinical domains.

## Key findings

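- ChatGPT showed higher sensitivity than surgeons (97% vs. 90%) but lower specificity (33% vs. 63%) and accuracy (65% vs. 76%) in identifying patients who achieved clinically meaningful improvements.
- AI performed comparably to or better than surgeons in emergency scenarios and in answering patient FAQs, scoring higher on empathy, accuracy, completeness and overall quality (4.4 vs. 3.5–3.7).
- Residents outperformed AI in examinations (74.2% vs. 47.2%), and AI showed limited accuracy in knee osteoarthritis radiographic staging (35% vs. >80%).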

## Abstract

Artificial intelligence (AI) in orthopaedic surgery is increasingly applied to analyse clinical data, triage patients and interpret imaging with high accuracy. Orthopaedic surgery faces unique challenges, including high patient volumes, complex cases and prolonged waiting lists, highlighting the need for efficiency and decision support. To justify implementation, AI must demonstrate performance comparable to surgeons. This systematic review evaluates AI's performance relative to surgeons to determine its value as a complementary tool in orthopaedic practice.

This systematic review searched OVID Medline for relevant studies published up to 13 August 2025. Included studies were categorised into decision making, management plans, clinical knowledge, quality control, and answering patients' frequently asked questions (FAQs).

Of 419 identified studies, 16 were eligible. ChatGPT showed high sensitivity in identifying patients achieving clinically meaningful improvements (97% vs. 90% for surgeons) but lower specificity (33% vs. 63%) and accuracy (65% vs. 76%). AI demonstrated comparable or superior performance to surgeons in emergency scenarios and in answering patient FAQs, scoring higher across empathy, accuracy, completeness and overall quality (4.4 vs. 3.5–3.7). Residents outperformed AI in examinations (74.2% vs. 47.2%). AI showed limited accuracy in knee osteoarthritis radiographic staging (35% vs. >80%).
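As a reading aid for the sensitivity, specificity and accuracy figures above, the sketch below shows how the three metrics are derived from a confusion matrix. The `ConfusionCounts` class and the counts in `example` are hypothetical, chosen only to illustrate a rater that flags most patients as improved; they are not data from the review or from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary prediction task (hypothetical helper, not from the paper)."""
    tp: int  # improved patients correctly predicted as improved
    fn: int  # improved patients predicted as not improved
    tn: int  # non-improved patients correctly predicted as not improved
    fp: int  # non-improved patients predicted as improved


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly improved patients that the rater identified.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly non-improved patients that the rater identified.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all predictions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Illustrative counts only (an assumption, NOT study data): a rater that
# predicts "improved" for most patients misses few true improvers but
# mislabels many non-improvers.
example = ConfusionCounts(tp=97, fn=3, tn=33, fp=67)

print(f"sensitivity={sensitivity(example):.2f}")
print(f"specificity={specificity(example):.2f}")
print(f"accuracy={accuracy(example):.2f}")
```

With counts like these, sensitivity is high while specificity and overall accuracy are low. This is the same pattern reported for ChatGPT, which is why a high sensitivity alone does not indicate better overall performance.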

AI demonstrates the potential to support clinical efficiency and patient communication in orthopaedics. However, concerns about bias, quality risks, overconfidence and reliance on outdated information prevent it from replacing human expertise. Clinician‐led design and validation are required to ensure safe and effective integration into clinical practice.

Level of evidence: Level IV.

## Full-text entities

- **Diseases:** knee osteoarthritis (MESH:D020370)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12616488/full.md

---
Source: https://tomesphere.com/paper/PMC12616488