An Assessment of the Performance of Different Chatbots on Shoulder and Elbow Questions
Mohamad Y. Fares, Tarishi Parmar, Peter Boufadel, Mohammad Daher, Jonathan Berg, Austin Witt, Brian W. Hill, John G. Horneff, Adam Z. Khan, Joseph A. Abboud

TL;DR
This study evaluates how well AI chatbots answer shoulder and elbow surgery questions. The chatbots performed reasonably (60.4% average accuracy), but AAOS members still outperformed them (79.4%).
Contribution
The study is one of the first to assess AI chatbots specifically for shoulder and elbow surgery education.
Findings
GPT-4o had the highest accuracy at 74%, while the chatbots averaged 60.4% overall.
Chatbots performed worse on complex topics such as nerve injuries and on questions rated as hard.
AAOS members outperformed chatbots with 79.4% accuracy.
Abstract
Background/Objectives: The utility of artificial intelligence (AI) in medical education has recently garnered significant interest, with several studies exploring its applications across various educational domains; however, its role in orthopedic education, particularly in shoulder and elbow surgery, remains scarcely studied. This study aims to evaluate the performance of multiple AI models in answering shoulder- and elbow-related questions from the AAOS ResStudy question bank. Methods: A total of 50 shoulder- and elbow-related questions from the AAOS ResStudy question bank were selected for the study. Questions were categorized according to anatomical location, topic, concept, and difficulty. Each question, along with the possible multiple-choice answers, was provided to each chatbot. The performance of each chatbot was recorded and analyzed to identify significant differences between…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · AI in Service Interactions · COVID-19 diagnosis using AI
