# Comparative performance of PROMIS Sleep Disturbance computerized adaptive testing algorithms and static short form in postmenopausal women

**Authors:** Andrew Trigg, Claudia Haberland, Huda Shalhoub, Christoph Gerlinger, Christian Seitz

PMC · DOI: 10.1186/s41687-025-00849-6 · 2025-02-17

## TL;DR

This study compares an 8-item static short form with two computerized adaptive testing (CAT) algorithms for measuring sleep disturbance in postmenopausal women, finding that all three approaches perform similarly in reliability and validity.

## Contribution

The study evaluates the performance of two adaptive testing algorithms and a short form of the PROMIS Sleep Disturbance measure in postmenopausal women.

## Key findings

- The CAT1 algorithm (stop once standard error <0.3 or 12 items administered) administered 4.18 items on average, with a slightly higher RMSE than CAT2 or the short form.
- The 8-item short form and the 8-item adaptive test (CAT2) performed similarly in accuracy (RMSE difference = 0.001).
- Women with a self-reported sleep disorder had greater sleep disturbance than those without (p < 0.001 for all).

## Abstract

The Patient-Reported Outcomes Measurement Information System (PROMIS) Sleep Disturbance v1.0 item bank (27 items) measures sleep disturbances. Rather than the full item bank, an 8-item short form (PROMIS SD SF 8b) or computerized adaptive testing (CAT) can be used. This study compares the performance of the PROMIS SD SF 8b with two CAT algorithms in postmenopausal women.

This is a secondary analysis of data collected for the original psychometric testing of the PROMIS Sleep Disturbance item bank, in a sub-sample of women aged ≥55. A graded response model (GRM) was fitted for the item bank, then simulations evaluated the performance of CAT algorithms and the short form, in terms of root mean square error (RMSE) versus the latent trait estimate derived from the full bank. Two CAT algorithms were tested: CAT1 (stop once standard error <0.3 or 12 items administered) and CAT2 (stop once 8 items administered). Convergent and divergent hypotheses for validity were tested through correlations with the Pittsburgh Sleep Quality Index (PSQI) and Epworth Sleepiness Scale (ESS). Known-groups comparisons were made between those with and without self-reported sleep disorder.
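The two stopping rules and the RMSE criterion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' simulation code: the function names and argument names are hypothetical, and only the thresholds (standard error <0.3, 12 items, 8 items) come from the abstract.

```python
import math


def cat1_should_stop(standard_error, items_administered,
                     se_threshold=0.3, max_items=12):
    """CAT1 rule: stop once the standard error drops below 0.3
    or 12 items have been administered, whichever comes first."""
    return standard_error < se_threshold or items_administered >= max_items


def cat2_should_stop(items_administered, fixed_length=8):
    """CAT2 rule: stop once 8 items have been administered,
    regardless of the current standard error."""
    return items_administered >= fixed_length


def rmse(estimates, reference):
    """Root mean square error between trait estimates from a shortened
    administration and reference estimates from the full item bank."""
    if len(estimates) != len(reference):
        raise ValueError("estimates and reference must have equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Under these rules, CAT1 trades test length for precision (it can stop early for respondents who are measured well), while CAT2 matches the length of the short form but chooses its 8 items adaptively.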

A sample of 337 women was analyzed. Unidimensionality and item-level fit to the GRM were supported; however, the local independence assumption was violated. The CAT1 algorithm administered 4.18 items on average, with a minor decrease in performance (higher RMSE value) compared to CAT2 or the PROMIS SD SF 8b. Administering 8 items adaptively (CAT2) compared to fixed (PROMIS SD SF 8b) performed similarly (RMSE difference = 0.001). Reliability exceeded 0.90 across most of the latent trait for all approaches. Correlations with the PSQI and ESS were largely as hypothesized, with minor differences in coefficient values between the approaches (all within 0.05). Women reporting a sleep disorder had greater sleep disturbance than those who did not (p < 0.001 for all).
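The CAT1 threshold (standard error <0.3) and the reported reliability of about 0.90 are related by a common item response theory convention. The sketch below assumes the latent trait is scaled to unit variance, so that reliability is approximately one minus the squared standard error; the abstract does not state that the authors computed reliability this way, and the function names are hypothetical.

```python
import math


def reliability_from_se(standard_error, trait_variance=1.0):
    """Approximate reliability at a trait level from its standard error,
    assuming the latent trait has the given variance (1.0 by default)."""
    return 1.0 - standard_error ** 2 / trait_variance


def se_for_reliability(reliability, trait_variance=1.0):
    """Standard error needed to reach a target reliability
    under the same unit-variance assumption."""
    return math.sqrt((1.0 - reliability) * trait_variance)


# Under this assumption, the CAT1 cut-off of 0.3 corresponds to
# a reliability of roughly 0.91.
cat1_reliability = reliability_from_se(0.3)
```

On this reading, stopping at a standard error just under 0.3 targets a reliability slightly above 0.90, which is consistent with the reliability reported for all three approaches.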

The results of this study support using the PROMIS Sleep Disturbance item bank in postmenopausal women. The choice of PROMIS SD SF 8b versus CAT can largely be driven by practical reasons (respondent burden and operational complexity) rather than concerns of differential reliability and validity.

The online version contains supplementary material available at 10.1186/s41687-025-00849-6.

## Linked entities

- **Diseases:** sleep disorder (MONDO:0003406)

## Full-text entities

- **Diseases:** Sleep Disturbance (MESH:D012893)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11832987/full.md

---
Source: https://tomesphere.com/paper/PMC11832987