CTSCAN: Evaluation Leakage in Chest CT Segmentation and a Reproducible Patient-Disjoint Benchmark
Anton Ivchenko

TL;DR
CTSCAN shows that chest CT segmentation scores are inflated when slices from the same patient appear in both training and test partitions, and provides a reproducible, patient-disjoint evaluation framework that removes this leakage.
Contribution
The paper introduces CTSCAN, a reproducible benchmark and research stack for patient-disjoint chest CT segmentation evaluation, highlighting the impact of data leakage.
Findings
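Foreground Dice falls from 0.6665 to 0.2066 and foreground IoU from 0.5031 to 0.1181 when the same FPN plus EfficientNet-B0 configuration moves from slice-mixed to case-disjoint evaluation (3 seeds, 12 epochs per seed).
The original slice-PNG workflow induces near-complete case reuse across train, validation, and test sets.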
CTSCAN offers a reproducible, controlled evaluation environment for chest CT segmentation.
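To make the difference between the two protocols concrete, the following is a minimal sketch, not the CTSCAN implementation. It assumes each slice is a hypothetical `(case_id, slice_index)` pair, uses 89 synthetic cases with an arbitrary 20 slices each, and assumes a 70/15/15 split ratio that the summary does not state. The function names `slice_mixed_split`, `case_disjoint_split`, and `shared_cases` are illustrative only.

```python
import random
from collections import defaultdict


def _cut(items, fractions):
    """Cut a list into train/val/test parts by the given fractions."""
    n_train = int(fractions[0] * len(items))
    n_val = int(fractions[1] * len(items))
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


def slice_mixed_split(slices, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle individual slices, ignoring which case they belong to (leaky)."""
    shuffled = list(slices)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled, fractions)


def case_disjoint_split(slices, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle case IDs first, so all slices of a case land in one partition."""
    by_case = defaultdict(list)
    for item in slices:
        by_case[item[0]].append(item)
    case_ids = sorted(by_case)
    random.Random(seed).shuffle(case_ids)
    groups = _cut(case_ids, fractions)
    return {
        name: [item for case in ids for item in by_case[case]]
        for name, ids in groups.items()
    }


def shared_cases(split):
    """Return the case IDs that appear in more than one partition."""
    cases = {name: {item[0] for item in items} for name, items in split.items()}
    return (
        (cases["train"] & cases["val"])
        | (cases["train"] & cases["test"])
        | (cases["val"] & cases["test"])
    )


# Synthetic stand-in data: 89 cases, 20 slices per case (slice count is assumed).
slices = [(f"case_{c:03d}", i) for c in range(89) for i in range(20)]

mixed = slice_mixed_split(slices)
disjoint = case_disjoint_split(slices)

print("cases shared across partitions (slice-mixed):  ", len(shared_cases(mixed)))
print("cases shared across partitions (case-disjoint):", len(shared_cases(disjoint)))
```

Under the slice-level shuffle, a case with many slices is very likely to contribute slices to more than one partition, which is the kind of reuse the findings describe. Grouping by case before shuffling makes the overlap empty by construction.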
Abstract
Reported chest CT segmentation performance can be strongly inflated when train and test partitions mix slices from the same study. We present CTSCAN, a reproducible multi-source chest CT benchmark and research stack designed to measure what survives under patient-disjoint evaluation. The current four-class artifact aggregates 89 cases from PleThora, MedSeg SIRM, and LongCIU, and we show that the original slice-PNG workflow induces near-complete case reuse across train, validation, and test. Using the playground environment, we run a multi-seed protocol sweep with the same FPN plus EfficientNet-B0 control configuration under slice-mixed and case-disjoint evaluation. Across 3 seeds and 12 epochs per seed, the slice-mixed protocol reaches 0.6665 foreground Dice and 0.5031 foreground IoU, whereas the case-disjoint protocol reaches 0.2066 Dice and 0.1181 IoU. Removing patient reuse therefore…
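The reported metrics can be illustrated with a short sketch. It assumes that "foreground" Dice and IoU are means over the non-background classes with class 0 as background, which this summary does not confirm. `foreground_dice_iou` and `relative_drop` are hypothetical helpers, and the label sequences are toy data. The relative drop is plain arithmetic on the four numbers quoted in the abstract.

```python
def foreground_dice_iou(pred, target, num_classes=4, background=0):
    """Mean Dice and IoU over non-background classes for flat label sequences.

    Classes absent from both prediction and target are skipped (an assumption).
    """
    dice_scores, iou_scores = [], []
    for cls in range(num_classes):
        if cls == background:
            continue
        p = [label == cls for label in pred]
        t = [label == cls for label in target]
        intersection = sum(a and b for a, b in zip(p, t))
        p_count, t_count = sum(p), sum(t)
        union = p_count + t_count - intersection
        if union == 0:
            continue
        dice_scores.append(2 * intersection / (p_count + t_count))
        iou_scores.append(intersection / union)
    if not dice_scores:
        return float("nan"), float("nan")
    return sum(dice_scores) / len(dice_scores), sum(iou_scores) / len(iou_scores)


def relative_drop(before, after):
    """Fraction of the original score lost when moving from `before` to `after`."""
    return (before - after) / before


# Toy four-class example (labels 0..3, 0 = background).
pred = [0, 1, 1, 2, 2, 3, 0, 0]
target = [0, 1, 2, 2, 2, 3, 3, 0]
dice, iou = foreground_dice_iou(pred, target)
print(f"toy foreground Dice = {dice:.4f}, IoU = {iou:.4f}")

# Scores quoted in the abstract: slice-mixed vs. case-disjoint.
print(f"relative Dice drop = {relative_drop(0.6665, 0.2066):.1%}")
print(f"relative IoU drop  = {relative_drop(0.5031, 0.1181):.1%}")
```

On the quoted figures, this works out to an absolute drop of 0.4599 Dice and 0.3850 IoU, or roughly 69% and 77% of the slice-mixed score respectively.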
