# A statistical approach to automated analysis of the low‐contrast object detectability test for the large ACR MRI phantom

**Authors:** Ali M. Golestani, Julia M. Gee

PMC · DOI: 10.1002/acm2.70173 · 2025-07-14

## TL;DR

This paper introduces an automated method for the low-contrast object detectability test in MRI quality control. It agrees with human raters about as well as they agree with each other and eliminates intra-rater variability.

## Contribution

A novel Python-based statistical method that automates low-contrast object detectability testing in MRI and agrees closely with human raters.

## Key findings

- The automated method achieved perfect intra-rater agreement and high inter-rater agreement with human raters.
- The method showed consistent performance across T1- and T2-weighted images.
- Stress tests confirmed the method's reliability and suitability for clinical use.

## Abstract

Regular quality control checks are essential to ensure the quality of MRI systems. The American College of Radiology (ACR) has developed a standardized large phantom test protocol for this purpose. However, the ACR protocol recommends manual measurements, which are time‐consuming, labor‐intensive, and prone to variability, impacting accuracy and reproducibility. Although some aspects of the ACR evaluation have been automated or semi‐automated, tests such as low‐contrast object detectability (LCOD) remain challenging to automate. LCOD involves assessing the visibility of objects at various contrast levels.

The purpose of this research is to propose and evaluate an automated approach for LCOD testing in MRI.

The automated Python code generates one‐dimensional profiles of image intensities along radial paths from the center of the contrast disk. These profiles are compared to templates created from the disk's geometric information using general linear model statistical tests. A total of 80 image volumes (40 T1‐ and 40 T2‐weighted) were each assessed twice by two human evaluators and by the proposed Python code.
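The radial-profile and template comparison described above can be illustrated with a minimal sketch. This is not the authors' code. The function names `radial_profile` and `glm_t_statistic`, the nearest-neighbour sampling, the boxcar template, and the synthetic image are all assumptions chosen for illustration. The idea is to regress the sampled profile on a template that equals 1 where an object is expected, then read the t statistic of the template coefficient as evidence that the object is visible.

```python
import numpy as np


def radial_profile(image, center, angle_rad, n_samples):
    """Sample intensities along one radial path starting at center (row, col).

    Nearest-neighbour sampling at one-pixel steps (an assumption for brevity).
    """
    r = np.arange(n_samples)
    rows = np.rint(center[0] + r * np.sin(angle_rad)).astype(int)
    cols = np.rint(center[1] + r * np.cos(angle_rad)).astype(int)
    rows = np.clip(rows, 0, image.shape[0] - 1)
    cols = np.clip(cols, 0, image.shape[1] - 1)
    return image[rows, cols].astype(float)


def glm_t_statistic(profile, template):
    """Fit profile = b0 + b1 * template + noise; return the t statistic of b1."""
    X = np.column_stack([np.ones(len(template)), template])
    beta, *_ = np.linalg.lstsq(X, profile, rcond=None)
    resid = profile - X @ beta
    dof = len(profile) - X.shape[1]          # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof      # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    return float(beta[1] / np.sqrt(cov[1, 1]))


# Synthetic demo (hypothetical data): a faint bright object on the 0-radian path.
rng = np.random.default_rng(0)
image = rng.normal(100.0, 0.5, size=(64, 64))
image[32, 42:47] += 5.0                      # object 10 to 14 pixels from center

template = np.zeros(30)
template[10:15] = 1.0                        # boxcar built from the known geometry

t_hit = glm_t_statistic(radial_profile(image, (32, 32), 0.0, 30), template)
t_miss = glm_t_statistic(radial_profile(image, (32, 32), np.pi / 2, 30), template)
print(f"t along object path: {t_hit:.1f}, t along empty path: {t_miss:.1f}")
```

In a sketch like this, a path would count as containing a visible object when its t statistic exceeds a chosen critical value. The threshold and the way per-path decisions are combined into a final score are not specified in the abstract and are left open here.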

Human raters showed intra‐rater variability (Cohen's Kappa 0.941 and 0.962), while the Python code exhibited perfect intra‐rater agreement. Inter‐rater agreement between the code and the human raters (Cohen's Kappa 0.945 and 0.783) was comparable to agreement between the two human raters (Cohen's Kappa 0.878). A stress test revealed that both the human raters and the code assigned higher scores to lower‐bandwidth images and lower scores to higher‐bandwidth images.
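For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen's Kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance. The abstract does not say whether a weighted or unweighted variant was used, so the unweighted form is an assumption. The function name `cohens_kappa` and the example scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's Kappa for two equal-length sequences of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / n**2

    if p_e == 1.0:
        return 1.0  # both raters used a single, identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical LCOD scores from two raters on six image volumes.
rater_a = [10, 9, 10, 8, 9, 10]
rater_b = [10, 9, 9, 8, 9, 10]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 17/23, about 0.739
```

Under this definition, a deterministic program that scores the same image twice always returns identical labels, which gives a Kappa of 1. That is consistent with the perfect intra‐rater agreement reported for the code.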

The proposed automated method eliminates intra‐rater variability and achieves strong inter‐rater agreement with human raters. These findings suggest the method is reliable and suitable for clinical settings, showing high concordance with human assessments.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12257337/full.md

---
Source: https://tomesphere.com/paper/PMC12257337