# The sensitivity of patient-reported outcome measures in surgical and non-surgical care: a systematic review and meta-epidemiological evaluation of randomised controlled trials

**Authors:** Mikko Uimonen, Matias Vaajala, Antti Saarinen, Rasmus Liukkonen, Oskari Pakarinen, Juho Laaksonen, Ville Ponkilainen, Ilari Kuitunen, Valtteri Panula

PMC · DOI: 10.1016/j.eclinm.2026.103776 · 2026-01-29

## TL;DR

This study finds that, in randomised controlled trials of musculoskeletal disorders, patient-reported outcome measures often fail to detect meaningful differences between surgical and non-surgical treatments, largely because scores cluster near the top of the scale (ceiling effects).

## Contribution

The study presents a meta-epidemiological evaluation of PROM sensitivity in RCTs, showing how score distributions, scale boundaries, and sampling variability limit the detection of clinically important differences.

## Key findings

- The mean likelihood of detecting a difference of at least 10 points between surgical and non-surgical groups was 19%.
- Detection likelihood peaked at 35% at a mean PROM score of 70 and declined toward the scale extremes.
- Comparisons with significant observed differences (>10 points, p < 0.05) had a 54% detection likelihood, compared with 17% for non-significant comparisons.

## Abstract

Accumulation of scores towards the high end of the measurement scale is an important source of bias in patient-reported outcome measures (PROMs). The aim was to evaluate how PROM score distributions, scale boundaries, and sampling variability influence the likelihood of detecting a minimal clinically important difference (MCID) of 10 points between surgical and non-surgical groups in randomised controlled trials (RCTs) of musculoskeletal disorders.

We did a systematic review and meta-epidemiological analysis of 129 RCTs, identified from PubMed and Scopus and published up to February 26, 2025, that compared surgical and non-surgical interventions in patients with musculoskeletal complaints and used a PROM as an outcome measure (1771 group-level PROM measurements). Simulations assessed each comparison's likelihood of detecting a difference of 10 points or more.
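To make the simulation step concrete, here is a minimal sketch of how a detection likelihood could be estimated for a single comparison. It is not the authors' method, which this summary does not specify. It assumes normally distributed scores clipped to a 0 to 100 scale and counts a "detection" whenever the simulated between-group difference in means reaches 10 points. The function names, the example means, SDs, and sample sizes, and the number of simulations are all hypothetical.

```python
import random
from statistics import fmean


def simulate_group_mean(mean, sd, n, rng, lo=0.0, hi=100.0):
    """Mean of n simulated scores, each clipped to the scale bounds.

    Assumption: scores are normal before clipping (illustrative only).
    """
    return fmean(min(hi, max(lo, rng.gauss(mean, sd))) for _ in range(n))


def detection_likelihood(mean_a, sd_a, n_a, mean_b, sd_b, n_b,
                         threshold=10.0, n_sim=2000, seed=0):
    """Share of simulated trials whose absolute difference in group
    means is at least `threshold` points."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_sim):
        diff = (simulate_group_mean(mean_a, sd_a, n_a, rng)
                - simulate_group_mean(mean_b, sd_b, n_b, rng))
        if abs(diff) >= threshold:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical comparison: surgical arm vs non-surgical arm.
    print(detection_likelihood(78, 18, 40, 73, 18, 40))
```

With inputs like these, where the underlying difference is smaller than the 10-point threshold, the estimated likelihood stays low; it approaches 1 only when the underlying difference is well above the threshold relative to the sampling error.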

The mean difference between groups was 4.6 (SD 7.1) points favouring surgery, with surgical arms scoring higher in 72% of comparisons. The mean likelihood of detecting at least a 10-point difference was 19%, meaning fewer than one in five such comparisons would detect a true difference. Detection likelihood peaked (35%) at a mean score of 70 and declined toward the scale extremes. Comparisons with significant observed differences (>10 points, p < 0.05) had a 54% detection likelihood versus 17% in non-significant comparisons, strongly linking detection likelihood to observed differences.
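The decline in detection likelihood near the top of the scale can be illustrated with a small worked calculation. The sketch below is an assumption-laden illustration, not the paper's model: it treats scores as a latent normal variable clipped to 0 to 100, applies a latent 10-point benefit, and computes the expected observed difference in closed form. The SD of 20 and the baseline means are hypothetical, and this simple model does not reproduce the reported peak at a mean score of 70.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its pdf and cdf


def clipped_mean(mu, sigma, lo=0.0, hi=100.0):
    """Expected value of a Normal(mu, sigma) score after clipping to [lo, hi]."""
    a = (lo - mu) / sigma
    b = (hi - mu) / sigma
    # Contribution of the unclipped part of the distribution.
    inside = (mu * (_STD.cdf(b) - _STD.cdf(a))
              - sigma * (_STD.pdf(b) - _STD.pdf(a)))
    # Mass below lo is recorded as lo; mass above hi is recorded as hi.
    return lo * _STD.cdf(a) + hi * (1.0 - _STD.cdf(b)) + inside


def observed_difference(latent_mean, sigma, latent_shift=10.0):
    """Expected observed difference when the treated group has a latent
    benefit of `latent_shift` points and both groups are clipped."""
    return (clipped_mean(latent_mean + latent_shift, sigma)
            - clipped_mean(latent_mean, sigma))


if __name__ == "__main__":
    for baseline in (50, 70, 85, 95):  # hypothetical comparator means
        print(baseline, round(observed_difference(baseline, 20.0), 1))
```

Under these assumptions the observed difference shrinks as the comparator mean approaches 100, so a latent 10-point benefit is recorded as something smaller than the MCID.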

Most of the PROM-based RCTs were unlikely to detect differences because of ceiling effects, which led to a consistent underestimation of surgical benefit. PROMs with adequate content coverage, better discrimination, and reduced ceiling susceptibility should be selected for clinical practice. Future research should align outcome selection and follow-up timing with expected treatment effects and ensure that measurement properties do not mask meaningful clinical differences.

Funding: None.

## Full-text entities

- **Diseases:** musculoskeletal complaints (MESH:D009140)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12874346/full.md

---
Source: https://tomesphere.com/paper/PMC12874346