Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference
Sudarshan Sreeram, Bernhard Kainz

TL;DR
This paper demonstrates how pruning can significantly compress medical imaging models, enabling efficient on-device inference with minimal quality loss, thus facilitating practical healthcare AI deployment.
Contribution
It introduces a pruning-based approach to optimize existing medical imaging models for on-device use, highlighting the importance of task-specific considerations.
Findings
1148x model compression with ~4% quality loss
At higher compression rates, CPU inference is faster than the GPU baseline
Task complexity and architectural details should be considered when deploying off-the-shelf models
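To put the headline figure in perspective, the short sketch below converts a compression rate into the fraction of the model that remains. It assumes the rate is defined as original size divided by compressed size. That definition is an assumption here, since this summary does not state how the authors measure compression, and `retained_fraction` is a hypothetical helper name.

```python
def retained_fraction(compression_rate: float) -> float:
    """Fraction of the original size left after compression.

    Assumes compression_rate = original_size / compressed_size,
    which is an illustrative definition, not one confirmed by the paper.
    """
    if compression_rate <= 0:
        raise ValueError("compression_rate must be positive")
    return 1.0 / compression_rate


# The 1148x rate reported in the findings above.
fraction = retained_fraction(1148)
print(f"{fraction:.4%} of the original size remains")
```

Under that assumed definition, a 1148x rate leaves less than a tenth of a percent of the original model.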
Abstract
Leveraging ML advancements to augment healthcare systems can improve patient outcomes. Yet, uninformed engineering decisions in early-stage research inadvertently hinder the feasibility of such solutions for high-throughput, on-device inference, particularly in settings involving legacy hardware and multi-modal gigapixel images. Through a preliminary case study concerning segmentation in cardiology, we highlight the excess operational complexity in a suboptimally configured ML model from prior work and demonstrate that it can be sculpted away using pruning to meet deployment criteria. Our results show a compression rate of 1148x with minimal loss in quality (~4%) and, at higher rates, achieve faster inference on a CPU than the GPU baseline, stressing the need to consider task complexity and architectural details when using off-the-shelf models. With this, we consider avenues for future…
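The abstract does not say which pruning criterion the authors use. As a minimal sketch of the general idea, the example below applies unstructured magnitude pruning (a common stand-in, not necessarily the paper's method) to a toy list of weights and reports the resulting compression rate as total weights divided by surviving weights. The function names, the toy weights, and the sparsity value are all hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `sparsity` is the fraction of weights to remove, in [0, 1).
    This is plain magnitude pruning, used only as an illustration.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def compression_rate(weights):
    """Total number of weights divided by the number of nonzero weights."""
    nonzero = sum(1 for w in weights if w != 0.0)
    if nonzero == 0:
        raise ValueError("all weights were pruned")
    return len(weights) / nonzero


# Toy weights standing in for one layer of a model.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.03]
pruned = magnitude_prune(weights, sparsity=0.75)
rate = compression_rate(pruned)
print(pruned, f"{rate:.0f}x")
```

Zeroing individual weights does not by itself make inference faster on standard hardware. Speedups of the kind the abstract reports usually depend on removing whole structures, such as filters or channels, so that the dense computation actually shrinks.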
Taxonomy
Topics: Scientific Computing and Data Management · Radiomics and Machine Learning in Medical Imaging · Medical Imaging Techniques and Applications
Methods: Pruning
