A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation
Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao

TL;DR
This paper introduces A-Eval, a comprehensive benchmark for assessing the cross-dataset generalization of abdominal multi-organ segmentation models, analyzing various training strategies and model sizes across multiple large-scale datasets.
Contribution
It presents the A-Eval benchmark, enabling systematic evaluation of model generalization across diverse datasets and training scenarios in abdominal multi-organ segmentation.
Findings
Models trained on large datasets show improved generalization.
Data usage strategies significantly impact model performance.
Larger models tend to generalize better across datasets.
Abstract
Although deep learning has revolutionized abdominal multi-organ segmentation, models often struggle with generalization due to training on small, specific datasets. With the recent emergence of large-scale datasets, some important questions arise: Can models trained on these datasets generalize well to different ones? Whether or not they can, how can their generalizability be further improved? To address these questions, we introduce A-Eval, a benchmark for the cross-dataset Evaluation ('Eval') of Abdominal ('A') multi-organ segmentation. We employ training sets from four large-scale public datasets: FLARE22, AMOS, WORD, and TotalSegmentator, each providing extensive labels for abdominal multi-organ segmentation. For evaluation, we incorporate the validation sets from these datasets along with the training set from the BTCV dataset, forming a robust benchmark comprising five distinct datasets.…
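The train-on-four, evaluate-on-five layout described in the abstract can be pictured as a score matrix with one row per training dataset and one column per evaluation dataset. The sketch below is only an illustration under stated assumptions: the Dice coefficient is assumed as the metric (the excerpt above does not name one), and `dice_score`, `cross_dataset_matrix`, `predict_fns`, and `eval_cases` are hypothetical names, not part of the A-Eval code.

```python
import numpy as np

# Dataset names taken from the abstract; everything else here is illustrative.
TRAIN_SETS = ["FLARE22", "AMOS", "WORD", "TotalSegmentator"]
EVAL_SETS = TRAIN_SETS + ["BTCV"]


def dice_score(pred, target, label):
    """Dice overlap for one organ label between two integer label maps.

    Assumption: Dice is used as the metric; the excerpt does not specify it.
    """
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # organ absent from both prediction and reference
    return 2.0 * np.logical_and(p, t).sum() / denom


def cross_dataset_matrix(predict_fns, eval_cases, labels):
    """Mean Dice per (training dataset, evaluation dataset) pair.

    predict_fns: hypothetical dict, training-set name -> function(image) -> label map
    eval_cases:  hypothetical dict, evaluation-set name -> list of (image, reference)
    labels:      organ label ids to score
    """
    matrix = np.full((len(TRAIN_SETS), len(EVAL_SETS)), np.nan)
    for i, train_name in enumerate(TRAIN_SETS):
        for j, eval_name in enumerate(EVAL_SETS):
            scores = [
                dice_score(predict_fns[train_name](image), reference, label)
                for image, reference in eval_cases[eval_name]
                for label in labels
            ]
            # Leave the cell as NaN when no organ could be scored.
            if scores and not np.all(np.isnan(scores)):
                matrix[i, j] = np.nanmean(scores)
    return matrix
```

In such a matrix, cells where the training and evaluation names match would reflect in-dataset performance, while the remaining cells, including the whole BTCV column, would reflect cross-dataset generalization.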
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Artificial Intelligence in Healthcare and Education · Autopsy Techniques and Outcomes
