# Can Artificial Intelligence Educate Patients? Comparative Analysis of ChatGPT and DeepSeek Models in Meniscus Injuries

**Authors:** Bahri Bozgeyik, Erman Öğümsöğütlü

PMC · DOI: 10.3390/healthcare13222980 · Healthcare · 2025-11-20

## TL;DR

This study compares ChatGPT-5 and DeepSeek R1 responses to twelve frequently asked patient questions about meniscus injuries. Both models show promise as supportive patient-education tools, with DeepSeek rated as more comprehensive.

## Contribution

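The study compares two LLMs, ChatGPT-5 and DeepSeek R1, as sources of patient education on meniscus injuries. It combines quality ratings from two orthopedic surgeons (a response rating system and a 4-point Likert scale) with two readability indices (FRES and FKGL).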

## Key findings

- DeepSeek scored significantly higher than ChatGPT in the response rating system (p = 0.017) and in comprehensiveness on the 4-point Likert scale (p = 0.005).
- Both models produced comparable readability scores (p > 0.05), corresponding to a high-school reading level.
- No significant differences were found in accuracy, clarity, or consistency between the models (p > 0.05).

## Abstract

Background: Meniscus injuries are among the most common traumatic and degenerative conditions of the knee joint. Patient education plays a critical role in treatment adherence, surgical preparation, and postoperative rehabilitation. The use of artificial intelligence (AI)-based large language models (LLMs) is rapidly increasing in healthcare. This study aimed to compare the quality and readability of responses to frequently asked patient questions about meniscus injuries generated by ChatGPT-5 and DeepSeek R1.

Materials and Methods: Twelve frequently asked questions regarding the etiology, symptoms, diagnosis, imaging, and treatment of meniscus injuries were presented to both AI models. The responses were independently evaluated by two experienced orthopedic surgeons using a response rating system and a 4-point Likert scale to assess accuracy, clarity, comprehensiveness, and consistency. Readability was analyzed using the Flesch–Kincaid Reading Ease Score (FRES) and the Flesch–Kincaid Grade Level (FKGL). Interrater reliability was determined using intraclass correlation coefficients (ICCs).

Results: DeepSeek performed significantly better than ChatGPT in the response rating system (p = 0.017) and achieved higher scores for comprehensiveness on the 4-point Likert scale (p = 0.005). No significant differences were observed between the two models in terms of accuracy, clarity, or consistency (p > 0.05). Both models produced comparable readability scores (p > 0.05), corresponding to a high-school reading level.

Conclusions: Both ChatGPT and DeepSeek show promise as supportive tools for educating patients about meniscus injuries. While DeepSeek demonstrated higher overall content quality, both models generated understandable information suitable for general patient education. Further refinement is needed to improve clarity and accessibility, ensuring that AI-based materials are appropriate for diverse patient populations.
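The two readability indices named in the abstract follow standard published formulas, which can be sketched in a few lines of Python. This is an illustration only, not the authors' tooling: the summary does not say which software they used, and the function names and the vowel-group syllable heuristic below are assumptions made for this example. Dedicated readability tools count syllables and sentences more carefully, so scores from this sketch will differ somewhat from theirs.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula (the FRES of the abstract)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (FKGL)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption of this sketch): count vowel groups,
    drop a likely silent trailing 'e', and never return less than one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a plain-text response."""
    # Naive tokenization: sentences end at ., ! or ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    n_words, n_sentences = len(words), len(sentences)
    return (
        flesch_reading_ease(n_words, n_sentences, syllables),
        flesch_kincaid_grade(n_words, n_sentences, syllables),
    )
```

A higher FRES means easier text, while FKGL approximates the US school grade needed to follow it. The two scores move in opposite directions because both depend on the same two ratios, words per sentence and syllables per word.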

## Full-text entities

- **Diseases:** traumatic (MESH:D014947), Meniscus Injuries (MESH:D000070600), degenerative conditions (MESH:D019636)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12652474/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12652474/full.md

---
Source: https://tomesphere.com/paper/PMC12652474