# Intelligence without intuition: a mixed-methods pilot study on reasoning models in musculoskeletal physiotherapy for low-back pain

**Authors:** Ricardo Knauer, Matthias Kalmring, Erik Rodner

PMC · DOI: 10.3389/fdgth.2026.1789412 · Frontiers in Digital Health · 2026-03-18

## TL;DR

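This mixed-methods pilot study evaluates how well reasoning models support clinical reasoning for low-back pain in musculoskeletal physiotherapy. It finds them reliable and competent on rated quality criteria but weaker in logical coherence, patient-centeredness, empathy, and intuition.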

## Contribution

The study presents a comprehensive human evaluation of reasoning models for clinical reasoning in musculoskeletal physiotherapy and argues for a multidimensional framework when evaluating language model outputs, highlighting the models' strengths and limitations.

## Key findings

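- State-of-the-art reasoning models show sufficient test–retest reliability and are competent or proficient in conceptual reasoning, completeness, correctness, relevance, and usefulness, with no statistically significant or clinically relevant differences between models.
- Qualitative analysis reveals weaknesses in logical coherence, patient-centeredness, empathy, and intuition, with most deviations from expert reasoning occurring in the domain of intuition.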
- The study provides guidance for model selection and prompting strategies to improve clinical reasoning performance.

## Abstract

Musculoskeletal pain, especially low-back pain, is highly prevalent and often challenging to manage due to its multifactorial nature. Effective diagnosis and therapy require clinicians to integrate biopsychosocial information within an evidence-based clinical reasoning framework. Large language models that “think” before responding, so-called reasoning models, show promise to support such complex decision-making, yet their validity and reliability in this setting remain unclear. In our work, we present a comprehensive human evaluation of reasoning models for clinical reasoning. Our results indicate that state-of-the-art reasoning models demonstrate sufficient test–retest reliability and are competent or proficient in terms of their conceptual reasoning, completeness, correctness, relevance, and usefulness, with no statistically significant or clinically relevant differences between them. However, our qualitative analysis reveals weaknesses in logical coherence, patient-centeredness, empathy, and intuition, with most deviations from expert reasoning in the domain of intuition. Our findings underscore the importance of adopting a multidimensional framework for evaluating language model outputs and allow us to provide guidance for model selection and prompting strategies to enhance clinical reasoning performance.

## Full-text entities

- **Diseases:** Musculoskeletal pain (MESH:D059352), low-back pain (MESH:D017116)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13038865/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13038865/full.md

## References

110 references — full list in the complete paper: https://tomesphere.com/paper/PMC13038865/full.md

---
Source: https://tomesphere.com/paper/PMC13038865