# Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations

**Authors:** Parisa Kaviani, Matthias F. Froelich, Bernardo Bizzo, Andrew Primak, Giridhar Dasegowda, Emiliano Garza-Frias, Lina Karout, Anushree Burade, Seyedehelaheh Hosseini, Javier Eduardo Contreras Yametti, Keith Dreyer, Sanjay Saini, Mannudeep Kalra

PMC · DOI: 10.3390/jimaging12010013 · Journal of Imaging · 2025-12-27

## TL;DR

This retrospective study compares automated AI-based volumetric analysis with qualitative radiologist assessments and radiomics for evaluating changes in kidney stone burden between baseline and follow-up CT.

## Contribution

The study demonstrates that automated threshold-based volumetric quantification outperforms qualitative and radiomics-based methods for assessing kidney stone burden changes.

## Key findings

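- Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys.
- Qualitative assessments from clinical radiology reports showed weak diagnostic performance (AUC range, 0.55–0.62), similar to an independent radiologist (AUC range, 0.41–0.72).
- A model using higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased from decreased stone burden (p < 0.001) but did not outperform threshold-based volumetric assessment.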
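
To make the AUC values above easier to interpret, here is a minimal sketch of how an area under the ROC curve can be computed from scores and binary labels. It uses the pairwise (Mann-Whitney) formulation, which is mathematically equivalent to the area under the empirical ROC curve; it is not the authors' analysis code. The function name `roc_auc` and the toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = increased burden, label 0 = decreased burden.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking, which is why ranges such as 0.55–0.62 are described as weak.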

## Abstract

This retrospective diagnostic accuracy study compared radiologist-based qualitative assessments and radiomics-based analyses with an automated artificial intelligence (AI)–based volumetric approach for evaluating changes in kidney stone burden on follow-up CT examinations. With institutional review board approval, 157 patients (mean age, 61 ± 13 years; 99 men, 58 women) who underwent baseline and follow-up non-contrast abdomen–pelvis CT for kidney stone evaluation were included. The index test was an automated AI-based whole-kidney and stone segmentation radiomics prototype (Frontier, Siemens Healthineers), which segmented both kidneys and isolated stone volumes using a fixed threshold of 130 Hounsfield units, providing stone volume and maximum diameter per kidney. The reference standard was a threshold-defined volumetric assessment of stone burden change between baseline and follow-up CTs. The radiologist’s performance was assessed using (1) interpretations from clinical radiology reports and (2) an independent radiologist’s assessment of stone burden change (stable, increased, or decreased). Diagnostic accuracy was evaluated using multivariable logistic regression and receiver operating characteristic (ROC) analysis. Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys. Qualitative assessments from radiology reports demonstrated weak diagnostic performance (AUC range, 0.55–0.62), similar to the independent radiologist (AUC range, 0.41–0.72) for differentiating changes in stone burden. A model incorporating higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased versus decreased stone burdens compared with the baseline CT (p < 0.001), but did not outperform threshold-based volumetric assessment. 
The automated threshold-based volumetric quantification of kidney stone burdens provides higher diagnostic accuracy than qualitative radiologist assessments and radiomics-based analyses for identifying a stable, increased, or decreased stone burden on follow-up CT examinations.
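
The threshold-based quantification described in the abstract can be sketched roughly as follows. Only the 130 HU threshold comes from the source; everything else is an assumption made for illustration: the flat-list voxel representation, the inclusive comparison (`>=`), the function names, and especially the 10% relative tolerance used to call a change "stable", since the abstract does not state the cutoff the authors used. This is not the Siemens prototype's implementation.

```python
from typing import Sequence

STONE_HU_THRESHOLD = 130  # fixed threshold reported in the abstract


def stone_volume_mm3(
    hu_values: Sequence[float],
    kidney_mask: Sequence[bool],
    voxel_volume_mm3: float,
) -> float:
    """Sum the volume of voxels inside the kidney mask at or above the
    HU threshold. Inclusive comparison is an assumption."""
    if len(hu_values) != len(kidney_mask):
        raise ValueError("hu_values and kidney_mask must have the same length")
    stone_voxels = sum(
        1
        for hu, inside in zip(hu_values, kidney_mask)
        if inside and hu >= STONE_HU_THRESHOLD
    )
    return stone_voxels * voxel_volume_mm3


def classify_change(
    baseline_mm3: float,
    followup_mm3: float,
    rel_tolerance: float = 0.10,  # hypothetical cutoff, not from the paper
) -> str:
    """Label the change in stone volume as stable, increased, or decreased."""
    diff = followup_mm3 - baseline_mm3
    if abs(diff) <= rel_tolerance * baseline_mm3:
        return "stable"
    return "increased" if diff > 0 else "decreased"


# Hypothetical toy kidney: five voxels, three inside the kidney mask.
toy_hu = [40, 130, 500, 900, 200]
toy_mask = [True, True, True, False, False]
toy_volume = stone_volume_mm3(toy_hu, toy_mask, voxel_volume_mm3=0.5)
```

In practice the HU values and mask would come from a CT volume and a kidney segmentation, and the voxel volume from the scan's pixel spacing and slice thickness.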

## Linked entities

- **Diseases:** kidney stone (MONDO:0008171)

## Full-text entities

- **Diseases:** Stone Burden (MESH:D007669)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12842500/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842500/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12842500/full.md

---
Source: https://tomesphere.com/paper/PMC12842500