Robustness Evaluation of a Foundation Segmentation Model Under Simulated Domain Shifts in Abdominal CT: Implications for Health Digital Twin Deployment
Sanghati Basu

TL;DR
This study systematically evaluates the robustness of the Segment Anything Model (SAM) for spleen segmentation in abdominal CT scans under simulated domain shifts and finds that performance stays stable, with mean ΔDice below 0.01 across perturbations.
Contribution
It provides the first detailed robustness assessment of SAM in medical imaging under realistic CT domain shifts, highlighting its stability and reliability.
Findings
SAM achieved a mean Dice score of 0.9145 on clean data.
Performance remained stable with mean ΔDice below 0.01 under perturbations.
No significant increase in failure probability was observed under simulated domain shifts.
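The quantities reported in these findings (Dice, ΔDice, failure rate) can be illustrated with a minimal sketch. It is not the author's evaluation code. The sign convention for ΔDice (clean minus perturbed) and the failure threshold of 0.5 are assumptions made for illustration, since the summary does not state either; the function names are hypothetical.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return float(2.0 * intersection / denom)

def robustness_summary(clean_dice, perturbed_dice, failure_threshold=0.5):
    """Summarize per-slice Dice scores for one perturbation condition.

    failure_threshold is a hypothetical cutoff; the source does not
    say which Dice value counts as a failure.
    """
    clean = np.asarray(clean_dice, dtype=float)
    pert = np.asarray(perturbed_dice, dtype=float)
    return {
        # Assumed convention: positive means the perturbation hurt.
        "mean_delta_dice": float(np.mean(clean - pert)),
        "clean_failure_rate": float(np.mean(clean < failure_threshold)),
        "perturbed_failure_rate": float(np.mean(pert < failure_threshold)),
    }
```

Pairing clean and perturbed scores slice by slice, as above, keeps the comparison within the same slices rather than across different subsets.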
Abstract
Foundation segmentation models such as the Segment Anything Model (SAM) have demonstrated strong generalization across natural images; however, their robustness under clinically realistic medical imaging domain shifts remains insufficiently quantified. We present a systematic slice-level robustness audit of SAM (ViT-B) for spleen segmentation in abdominal CT using 1,051 nonempty slices from 41 volumes in the Medical Segmentation Decathlon. A standardized ground-truth-derived bounding-box protocol was used to isolate encoder robustness from prompt uncertainty. Controlled perturbations simulating inter-scanner variability, including Gaussian noise, blur, contrast scaling, gamma correction, and resolution mismatch, were applied across ten conditions. The clean baseline achieved a mean Dice score of 0.9145 (95% CI: [0.909, 0.919]) with a failure rate of 0.67%. Across all perturbations, the…
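As a rough illustration of the protocol described in the abstract, the sketch below derives a bounding-box prompt from a ground-truth mask and applies four of the named perturbation types to a slice scaled to [0, 1]. It is a sketch under stated assumptions, not the study's implementation: the function names, the (x_min, y_min, x_max, y_max) box order, the nearest-neighbour resampling, and all parameter values are hypothetical, and blur is omitted to keep the example NumPy-only.

```python
import numpy as np

def bbox_from_mask(mask, margin=0):
    """Tight box around a non-empty binary mask, as (x_min, y_min, x_max, y_max)."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty; no box can be derived")
    h, w = mask.shape
    return np.array([
        max(int(xs.min()) - margin, 0),
        max(int(ys.min()) - margin, 0),
        min(int(xs.max()) + margin, w - 1),
        min(int(ys.max()) + margin, h - 1),
    ])

def add_gaussian_noise(img, sigma, rng):
    """Additive zero-mean Gaussian noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def scale_contrast(img, factor):
    """Stretch or compress intensities around the slice mean."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

def gamma_correct(img, gamma):
    """Power-law intensity transform on a [0, 1] image."""
    return np.clip(img, 0.0, 1.0) ** gamma

def resolution_mismatch(img, factor):
    """Downsample by striding, then upsample by pixel repetition."""
    h, w = img.shape
    low = img[::factor, ::factor]
    up = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    return up[:h, :w]

# Illustrative usage on a synthetic slice (values are arbitrary).
rng = np.random.default_rng(0)
slice_img = rng.random((64, 64))
gt_mask = np.zeros((64, 64), dtype=bool)
gt_mask[20:30, 10:25] = True
box = bbox_from_mask(gt_mask)
noisy = add_gaussian_noise(slice_img, sigma=0.05, rng=rng)
```

Deriving the box from the ground truth, as here, fixes the prompt across conditions so that any change in Dice can be attributed to the image perturbation rather than to prompt variation.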
