State of Abdominal CT Datasets: A Critical Review of Bias, Clinical Relevance, and Real-world Applicability

Saeide Danaei; Zahra Dehghanian; Elahe Meftah; Nariman Naderi; Seyed Amir Ahmad Safavi-Naini; Faeze Khorasanizade; and Hamid R. Rabiee

arXiv:2508.13626 · eess.IV · August 20, 2025

TL;DR

This review critically assesses publicly available abdominal CT datasets, highlighting issues of bias, redundancy, and geographic skew, and proposes strategies to improve dataset diversity and clinical relevance for AI applications.

Contribution

It provides a comprehensive analysis of existing datasets, identifies key biases and limitations, and suggests targeted strategies for enhancing dataset quality and diversity.

Findings

01

59.1% case reuse across datasets

02

75.3% of datasets from North America and Europe

03

63% of assessed datasets (the 19 with ≥100 cases) at high risk of domain shift bias

Abstract

This systematic review critically evaluates publicly available abdominal CT datasets and their suitability for artificial intelligence (AI) applications in clinical settings. We examined 46 publicly available abdominal CT datasets (50,256 studies). Across all 46 datasets, we found substantial redundancy (59.1% case reuse) and a geographic skew toward Western regions (75.3% from North America and Europe). A bias assessment was performed on the 19 datasets with ≥100 cases; within this subset, the most prevalent high-risk categories were domain shift (63%) and selection bias (57%), both of which may undermine model generalizability across diverse healthcare environments, particularly in resource-limited settings. To address these challenges, we propose targeted strategies for dataset improvement, including multi-institutional collaboration, adoption of standardized protocols, and deliberate…
