# Tests of large language models' medical competence and application for clinical decision support of musculoskeletal rehabilitation

**Authors:** Ruikang Liu, Qiaoling Liu, Qiang Hu, Ruixing Nan, Jian He, Jinru Yang, Jialin Zhang, Guang Yang, Zhaohui Yang, Xiling Xiao, Xiaoxuan Xia, Yongchao Wu

PMC · DOI: 10.3389/fdgth.2025.1719340 · Frontiers in Digital Health · 2026-02-10

## TL;DR

This study evaluates how well large language models perform in musculoskeletal rehabilitation tasks and finds that some models, like Doubao 1.5 pro, can support clinical decision-making.

## Contribution

The study identifies top-performing LLMs for musculoskeletal rehabilitation and shows that local-language models outperform English ones in localized contexts.

## Key findings

- Doubao 1.5 pro was one of only two models (with ERNIE Bot X1 Turbo) to exceed 90% accuracy in the first test.
- Chinese LLMs had significantly fewer incorrect answers than English LLMs in localized tests (9.6% vs. 14.8%, P < 0.001).
- LLMs improved therapists' accuracy in preparing for rehabilitation examinations.

## Abstract

Large language models (LLMs) are abundant and diverse, yet clinicians lack clarity on which perform best, and the expertise of general-purpose LLMs in musculoskeletal rehabilitation remains uncertain. This study investigates the potential and correctness of LLMs in clinical application and evaluates whether LLMs can help primary rehabilitation therapists prepare for rehabilitation examinations.

Eight primary doctors and therapists tested 10 LLMs in the first test, five senior doctors and therapists assessed the LLMs' answers in the second test, and five primary therapists acted as examinees in the third test. The quality of case analysis was assessed along six dimensions: Case Understanding, Clinical Reasoning, Primary Diagnosis, Differential Diagnosis, Treatment Plan Accuracy and Safety, and Guidelines & Consensus.

In the first test, only ERNIE Bot X1 Turbo and Doubao 1.5 pro had accuracy rates above 90%, and Chinese LLMs had significantly fewer incorrect questions than English LLMs (9.6% vs. 14.8%, P < 0.001). In the second test, Doubao 1.5 pro achieved relatively high scores in both cases, and LLMs scored highly in “Case Understanding”, “Clinical Reasoning” and “Diagnosis”. In the third test, primary therapists achieved a mean accuracy rate of 76.9%, which improved to 85.8% with the help of Doubao 1.5 pro.
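The abstract does not say which statistical test produced the P value for the Chinese-versus-English comparison. As a minimal sketch, assuming a pooled two-proportion z-test, the comparison of incorrect-answer rates could look like the following. The function name `two_proportion_z_test` and the counts (96 of 1000 and 148 of 1000) are hypothetical, chosen only to reproduce the reported 9.6% and 14.8%; the actual numbers of questions are not given in this summary.

```python
import math


def two_proportion_z_test(wrong_a, total_a, wrong_b, total_b):
    """Pooled two-proportion z-test (two-sided).

    Returns (rate_a, rate_b, z, p_value). This is an illustrative
    choice of test, not necessarily the one used in the study.
    """
    rate_a = wrong_a / total_a
    rate_b = wrong_b / total_b
    # Pooled error rate under the null hypothesis of equal proportions.
    pooled = (wrong_a + wrong_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# HYPOTHETICAL counts: only the percentages come from the abstract.
chinese_wrong, chinese_total = 96, 1000
english_wrong, english_total = 148, 1000

rate_cn, rate_en, z, p = two_proportion_z_test(
    chinese_wrong, chinese_total, english_wrong, english_total
)
print(f"Chinese LLMs: {rate_cn:.1%} incorrect, English LLMs: {rate_en:.1%} incorrect")
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

The resulting P value depends strongly on the real sample sizes, so the output here only illustrates the mechanics of the comparison.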

Doubao 1.5 pro showed competent ability and promising application prospects, and was assessed as the best LLM for answering musculoskeletal rehabilitation questions. We also demonstrated that the response quality of local-language LLMs was significantly better than that of English LLMs when answering localized-language questions.

## Full-text entities

- **Diseases:** cervical spondylotic radiculopathy (MESH:D011843), lumbar disc herniation (MESH:C535531), post-fracture (MESH:D000094025), musculoskeletal diseases (MESH:D009140), liver cancer (MESH:D006528)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12929487/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12929487/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12929487/full.md

---
Source: https://tomesphere.com/paper/PMC12929487