# Evaluating ChatGPT-5’s Performance in Answering Common Patient Questions About Femoroacetabular Impingement and Hip Arthroscopy

**Authors:** Maximilian Voss, Hannah Jaeger, Mikhail Salzmann, Robert Prill, Timoty Osterberger, Ingo J. Banke, Nikolai Ramadanov

PMC · DOI: 10.1007/s43465-026-01696-3 · 2026-02-02

## TL;DR

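ChatGPT-5 gave accurate, clear, and relevant answers to 25 common patient questions about femoroacetabular impingement syndrome (FAIS) and hip arthroscopy (HAS), as rated by two hip preservation surgeons. The results support its use as a complementary patient-education tool in orthopedics, not as a replacement for professional counseling.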

## Contribution

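The study evaluates ChatGPT-5's answers to common patient questions about femoroacetabular impingement syndrome (FAIS) and hip arthroscopy (HAS). Earlier evaluations covered only previous model generations, and the authors report improved accuracy and completeness relative to those versions.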

## Key findings

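- ChatGPT-5 received high mean scores on a five-point Likert scale for answers to patient questions about FAIS and HAS: relevance 5.00 ± 0.00, accuracy 4.97 ± 0.08, clarity 4.91 ± 0.17, and completeness 4.84 ± 0.27.
- Inter-rater reliability between the two fellowship-trained hip preservation surgeons was moderate to excellent (ICC 0.70–0.81), with exact agreement of 84–100%.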
- No response contained factually incorrect, misleading, or unsafe information; slightly lower completeness scores reflected occasional brevity rather than substantive omissions.

## Abstract

Hip arthroscopy (HAS) is widely used to treat femoroacetabular impingement syndrome (FAIS), and many patients rely on online resources for medical information. Large language models (LLMs) such as ChatGPT have shown potential as supplementary educational tools in orthopedics; however, existing evaluations are limited to earlier model generations with variable accuracy and completeness. This study aimed to evaluate the accuracy, clarity, relevance, and completeness of ChatGPT-5 responses to common patient questions regarding FAIS and HAS.

ChatGPT-5 was used to generate 25 frequently asked patient questions and corresponding answers related to hip preservation. Two fellowship-trained hip preservation surgeons independently evaluated each response using a five-point Likert scale across four predefined domains: relevance, accuracy, clarity, and completeness. Descriptive statistics were calculated as mean ± standard deviation for each domain. Inter-rater reliability was assessed using a two-way random-effects intraclass correlation coefficient with absolute agreement (ICC [2, 1]) and complemented by exact agreement percentages.
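The reliability analysis described above can be illustrated with a minimal sketch. It computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the standard ANOVA mean squares, plus the exact agreement percentage between two raters. The function names `icc_2_1` and `exact_agreement` and the example scores are hypothetical and are not taken from the paper; the authors' actual software and data layout are not stated in this summary.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_items, k_raters) array, e.g. 25 answers x 2 surgeons.
    Returns NaN when the ratings have no variance at all (for example, every
    rater gives every item a 5), because the coefficient is then undefined.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for items (rows), raters (columns), and residual error.
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        return float("nan")
    return float((ms_rows - ms_error) / denominator)


def exact_agreement(ratings):
    """Percentage of items on which two raters gave the identical score."""
    x = np.asarray(ratings)
    return float(np.mean(x[:, 0] == x[:, 1]) * 100)


if __name__ == "__main__":
    # Hypothetical Likert scores for one domain: rows are answers,
    # columns are the two raters. Not data from the study.
    scores = [[5, 5], [4, 5], [5, 5], [4, 4], [5, 5], [3, 4]]
    print(f"ICC(2,1): {icc_2_1(scores):.2f}")
    print(f"Exact agreement: {exact_agreement(scores):.1f}%")
```

With Likert data that cluster at the top of the scale, the ICC can be low or undefined even when exact agreement is high, which is one reason to report the two measures together.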

All responses received excellent scores, with mean values ranging from 4.84 ± 0.27 (completeness) to 5.00 ± 0.00 (relevance). Accuracy (4.97 ± 0.08) and clarity (4.91 ± 0.17) were near-perfect. ICC values demonstrated moderate to excellent agreement (0.70–0.81), complemented by high exact agreement rates (84–100%). No answer contained factually incorrect, misleading, or unsafe information. Minor reductions in completeness were attributable to occasional brevity rather than substantive omissions.

ChatGPT-5 generated highly accurate, clear, and clinically appropriate patient-oriented explanations regarding FAIS and HAS, showing clear improvement compared with earlier ChatGPT versions. Although ChatGPT-5 represents a marked advancement in AI-based patient education, its use should be regarded as a complementary educational tool rather than a replacement for professional orthopedic counseling.

## Full-text entities

- **Diseases:** FAIS (MESH:D057925)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13031688/full.md

---
Source: https://tomesphere.com/paper/PMC13031688