# Investigating the Accuracy and Consistency of ChatGPT in the Management of Achilles Tendon Ruptures

**Authors:** Christopha J Knee, Ryan J Campbell, Brahman S Sivakumar, Andrew Wines, Michael J Symes

PMC · DOI: 10.7759/cureus.78433 · Cureus · 2025-02-03

## TL;DR

This study found that ChatGPT's answers about managing Achilles tendon ruptures were inconsistent, mixed correct with incorrect information, and were supported by references that were mostly incorrect or fabricated.

## Contribution

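The study evaluates ChatGPT's accuracy and consistency in delivering patient information and answering orthopaedic clinical questions on the management of Achilles tendon ruptures, including a check of the references it supplies.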

## Key findings

- All 16 responses (eight questions, each asked twice) were graded III, meaning they contained both correct and incorrect information.
- The two responses were consistent with each other for six of the eight questions (75%).
- Of the 47 references ChatGPT provided, 16 (34%) were correct, 19 (40%) were incorrect, and 12 (26%) were fabricated.

## Abstract

Background

The emergence of generative artificial intelligence, such as ChatGPT (OpenAI, San Francisco, CA, USA), offers significant potential for improving the delivery of patient information and aiding in clinical decision-making. The aim of this study was to investigate the accuracy and consistency of ChatGPT in providing patient information and answering orthopaedic clinical questions regarding Achilles tendon ruptures.

Methods

Eight questions regarding Achilles tendon rupture management were presented to ChatGPT twice, resulting in 16 responses. References were requested for all responses. Each response was evaluated for accuracy and consistency, utilising a grading scale ranging from I (comprehensive) to IV (completely incorrect). Final grading was determined through consensus discussions among two orthopaedic registrars and two senior orthopaedic surgeons. Descriptive statistics were performed.

Results

All of the responses produced by ChatGPT were graded as containing both correct and incorrect information (grade III). Consistency was observed in six out of eight (75%) questions when comparing the two responses for each question. ChatGPT provided 47 references, with 16 out of 47 (34%) correct, 19 out of 47 (40%) incorrect, and 12 out of 47 (26%) fabricated.
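The percentages in the Results paragraph follow directly from the reported counts. The short Python sketch below reproduces that arithmetic. The helper `percent` and the variable names are illustrative and not taken from the paper. Rounding to the nearest whole percent is an assumption about how the figures were presented.

```python
def percent(part: int, total: int) -> int:
    """Return part/total as a percentage rounded to the nearest whole number."""
    return round(100 * part / total)


# Counts as reported in the Results paragraph.
consistent_questions, total_questions = 6, 8
references = {"correct": 16, "incorrect": 19, "fabricated": 12}

# The three reference categories should add up to the 47 references reported.
total_references = sum(references.values())

consistency_pct = percent(consistent_questions, total_questions)
reference_pcts = {
    label: percent(count, total_references)
    for label, count in references.items()
}

print(f"Consistent questions: {consistency_pct}%")
for label, pct in reference_pcts.items():
    print(f"References {label}: {pct}%")
```

Under these assumptions the computed values match the reported 75%, 34%, 40%, and 26%, and the three reference categories account for all 47 references.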

Conclusion

ChatGPT lacks accuracy and consistency in providing information on the management of Achilles tendon ruptures. All patient information and orthopaedic clinical decision-making recommendations contained inaccurate or fabricated information.

## Full-text entities

- **Diseases:** Achilles Tendon Ruptures (MESH:D012421)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11882158/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11882158/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC11882158/full.md

---
Source: https://tomesphere.com/paper/PMC11882158