ChatGPT models provide higher‐quality but lower‐readability responses than Google Gemini regarding anterior shoulder instability, with no added benefit of the orthopaedic expert plugin
Khaled Skaik, Sean Omoseni, Danielle Dagher, Darshil Shah, Theodorakys Marín Fermín, Piero Agostinone, Ashraf Hantouly, Moin Khan

TL;DR
ChatGPT models provide higher-quality but harder-to-read responses about shoulder instability compared to Google Gemini, with no added benefit from an orthopaedic expert plugin.
Contribution
This study compares the quality and readability of medical information on anterior shoulder instability from three large language models.
Findings
ChatGPT 4o and ChatGPT OE provided higher-quality responses than Google Gemini.
Google Gemini's responses were more readable but lower in quality.
The orthopaedic expert plugin did not improve ChatGPT's performance.
Abstract
The purpose is to analyze and compare the quality and readability of information regarding anterior shoulder instability and shoulder stabilization surgery from three LLMs: ChatGPT 4o, ChatGPT Orthopaedic Expert (OE) and Google Gemini. ChatGPT 4o, ChatGPT OE and Google Gemini were used to answer 21 commonly asked questions from patients on anterior shoulder instability. The responses were independently rated by three fellowship‐trained orthopaedic surgeons using the validated Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. Assessors were blinded to the model, and evaluations were performed twice, 3 weeks apart. Readability was measured using Flesch Reading Ease Score (FRES) and Flesch–Kincaid Grade Level (FKGL). This study adhered to TRIPOD‐LLM. Statistical analysis included the Friedman test, Wilcoxon signed‐rank tests and intraclass correlation coefficients. Inter‐rater…
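The two readability measures named in the abstract can be illustrated with a minimal sketch. It uses the standard published Flesch formulas; the excerpt does not say which tool the authors used to count words, sentences and syllables, so the functions below take those counts as inputs, and the function names and sample counts are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher means easier to read."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one model response (not data from the study).
example_counts = {"words": 100, "sentences": 5, "syllables": 150}

fres = flesch_reading_ease(**example_counts)
fkgl = flesch_kincaid_grade(**example_counts)
print(f"FRES = {fres:.1f}, FKGL = {fkgl:.1f}")
```

Both scores depend only on average sentence length and average syllables per word, which is why a response can score well on quality and still be harder to read: longer sentences and longer words lower FRES and raise FKGL regardless of accuracy.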
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Clinical Reasoning and Diagnostic Skills
