# Sample size and power analysis for ROC AUC differences in diagnostic tests: a methodological evaluation of the Obuchowski-McClish and Hanley-McNeil methods

**Authors:** Busra Emir, Fatma Ezgi Can, Elif Kaymaz, Zeynep Ozel, Mehmet Goktug Efgan, Mustafa Agah Tekindal, Ferhan Elmali

PMC · DOI: 10.1186/s12874-026-02768-6 · 2026-01-28

## TL;DR

This paper compares the Obuchowski–McClish (discrete data) and Hanley–McNeil (continuous data) methods for calculating sample sizes in diagnostic test studies, showing how the AUC difference, data type, and inter-test correlation change the required number of participants.

## Contribution

The study reports sample sizes and power curves across AUC differences, data types, and inter-test correlation levels, giving evidence-based guidance for efficient sample size planning in diagnostic accuracy studies.

## Key findings

- Required sample sizes varied from 36 to 3,709 participants per group depending on AUC difference, data type, and correlation.
- Continuous data models outperformed discrete models, requiring 24–53% fewer participants.
- Strong inter-test correlation reduced sample sizes by up to 68% in continuous models.

## Abstract

Sample size determination for area under the curve (AUC) comparisons in diagnostic accuracy studies requires the consideration of multiple methodological parameters. The type of diagnostic test, the nature of the data (discrete or continuous), the correlation structure between tests, and the degree of AUC differences all influence optimal study design and planning. To address these factors, this study provides comprehensive sample size and power calculations for comparing AUCs between diagnostic tests across clinically relevant scenarios.

We conducted a comprehensive evaluation of sample size and power analysis for AUC comparisons under varying correlation levels (ρ = 0.30, 0.50, 0.80), data types (discrete vs. continuous), and AUC differences (ΔAUC = 0.02–0.10). The Obuchowski–McClish method was applied for discrete data, and the Hanley–McNeil approach was applied for continuous data, assuming balanced case-control allocation and two-sided testing (α = 0.05). Sample sizes were calculated to achieve 80% statistical power based on established methodological approaches and adjusted for a 20% anticipated attrition rate. Power curves were generated to illustrate the relationship between sample size and statistical power across the evaluated scenarios.
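To make the calculation above concrete, here is a minimal Python sketch of a Hanley–McNeil-style sample size and power computation for two correlated AUCs with equal numbers of cases and controls. It is not the authors' code and is not expected to reproduce their tables. The function names, the baseline AUCs of 0.80 and 0.90, and the use of the large-sample variance term Q1 + Q2 − 2·AUC² are assumptions made for illustration. The sketch also treats `r` directly as the correlation between the two AUC estimates, whereas Hanley and McNeil derive that value from the correlation between test results.

```python
from math import ceil, sqrt
from statistics import NormalDist


def hm_variance_term(auc):
    """Large-sample Hanley-McNeil variance term for one AUC (equal group sizes)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    return q1 + q2 - 2 * auc**2


def _sds(auc1, auc2, r):
    """Per-subject SDs of the AUC difference under H0 and under H1."""
    v1 = hm_variance_term(auc1)
    v2 = hm_variance_term(auc2)
    sd_null = sqrt(2 * v1 * (1 - r))  # H0: both tests share auc1
    sd_alt = sqrt(v1 + v2 - 2 * r * sqrt(v1 * v2))
    return sd_null, sd_alt


def n_per_group(auc1, auc2, r, alpha=0.05, power=0.80):
    """Cases (= controls) needed for a two-sided test of auc1 vs auc2."""
    if auc1 == auc2:
        raise ValueError("AUCs must differ")
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    sd_null, sd_alt = _sds(auc1, auc2, r)
    return ceil(((z_a * sd_null + z_b * sd_alt) / abs(auc2 - auc1)) ** 2)


def power_at_n(n, auc1, auc2, r, alpha=0.05):
    """Approximate power for n cases and n controls (one point on a power curve)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    sd_null, sd_alt = _sds(auc1, auc2, r)
    z = (abs(auc2 - auc1) * sqrt(n) - z_a * sd_null) / sd_alt
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical baseline AUCs; the abstract does not state the ones used.
    for r in (0.30, 0.50, 0.80):
        n = n_per_group(0.80, 0.90, r)
        print(r, n, round(power_at_n(n, 0.80, 0.90, r), 3))
```

Evaluating `power_at_n` over a grid of `n` values gives a power curve of the kind the abstract describes. Under these assumptions the required `n` falls as `r` rises and as the AUC difference grows.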

The required sample sizes varied substantially across scenarios, ranging from 36 to 3,709 participants per group to achieve 80% power. The magnitude of the difference in the area under the curve (AUC) was the primary determinant: ΔAUC = 0.02 required 909–3,709 participants, whereas ΔAUC = 0.10 required only 36–142 participants. Inter-test correlation also had a marked impact on efficiency, with a strong correlation (ρ = 0.8) reducing sample sizes by 49% in discrete models and 68% in continuous models compared with a weak correlation (ρ = 0.3). Continuous data models consistently outperformed discrete models, requiring 24–53% fewer participants across all scenarios. The most demanding scenario (ΔAUC = 0.02, discrete data, ρ = 0.3) required 3,709 participants per group, whereas the most efficient scenario (ΔAUC = 0.10, continuous data, ρ = 0.8) required only 36 participants, an approximately 103-fold difference.
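The arithmetic behind two of these figures can be checked with a short Python sketch. The helper names are hypothetical. The attrition convention n / (1 − rate) is also an assumption: the abstract does not say which convention was used, and some planners use n · (1 + rate) instead. It does not say either whether the reported sample sizes already include the 20% adjustment.

```python
from fractions import Fraction
from math import ceil


def inflate_for_attrition(n, rate):
    """Enrolment needed so that about n participants remain after dropout.

    Assumes the n / (1 - rate) convention; exact fractions avoid
    floating-point rounding before the ceiling is taken.
    """
    rate = Fraction(rate).limit_denominator(1000)
    return ceil(n / (1 - rate))


def fold_difference(largest, smallest):
    """Ratio between the largest and smallest required sample sizes."""
    return largest / smallest


if __name__ == "__main__":
    # 3,709 and 36 are the extremes reported in the abstract.
    print(round(fold_difference(3709, 36), 2))
    # Illustrative only: inflating 36 by a 20% attrition rate.
    print(inflate_for_attrition(36, 0.20))
```

The ratio 3,709 / 36 rounds to 103, which matches the fold difference stated above.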

Methodological choices lead to substantial variations in sample size requirements for diagnostic accuracy studies. Optimal parameter selection, particularly the use of continuous data models and accounting for strong inter-test correlations, can reduce the required sample sizes by up to 68% compared with suboptimal combinations. These results provide evidence-based guidance for efficient planning of diagnostic accuracy studies and underscore the critical importance of methodological considerations in study design optimization.

The online version contains supplementary material available at 10.1186/s12874-026-02768-6.

## Full-text entities

- **Genes:** MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}, GTF2E1 (general transcription factor IIE subunit 1) [NCBI Gene 2960] {aka FE, TF2E1, TFIIE-A}, MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}, MAT1A (methionine adenosyltransferase 1A) [NCBI Gene 4143] {aka MAT, MATA1, SAMS, SAMS1}, FPR1 (formyl peptide receptor 1) [NCBI Gene 2357] {aka FMLP, FPR}
- **Diseases:** Ovarian Cancer (MESH:D010051), DISS (MESH:D015875), ALS (MESH:D000690), rare diseases (MESH:D035583), pulmonary embolism (MESH:D011655), Alzheimer's disease (MESH:D000544), lung cancer (MESH:D008175), traumatic brain injury (MESH:D000070642)
- **Chemicals:** AP (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12924612/full.md

---
Source: https://tomesphere.com/paper/PMC12924612