Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines
Chongyu Qu, Ritchie Zhao, Ye Yu, Bin Liu, Tianyuan Yao, Junchao Zhu, Bennett A. Landman, Yucheng Tang, and Yuankai Huo

TL;DR
This paper presents a practical post-training quantization framework that enables true 8-bit model deployment on GPUs for 3D medical image segmentation, significantly reducing model size and inference latency without performance loss.
Contribution
The study introduces a real 8-bit quantization method for 3D medical segmentation models using TensorRT, bridging the gap between fake and real quantization in practical GPU deployment.
Findings
Achieved true 8-bit quantization on multiple SOTA models
Reduced model size and inference latency significantly
Maintained model performance post-quantization
Abstract
Quantizing deep neural networks, reducing the precision (bit-width) of their computations, can remarkably decrease memory usage and accelerate processing, making these models more suitable for large-scale medical imaging applications with limited computational resources. However, many existing methods study "fake quantization", which simulates lower-precision operations during inference but does not actually reduce model size or improve real-world inference speed. Moreover, the potential of deploying real 3D low-bit quantization on modern GPUs is still unexplored. In this study, we introduce a real post-training quantization (PTQ) framework that successfully implements true 8-bit quantization on state-of-the-art (SOTA) 3D medical segmentation models, i.e., U-Net, SegResNet, SwinUNETR, nnU-Net, UNesT, TransUNet, ST-UNet, and VISTA3D. Our approach involves two main steps. First, we use…
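The distinction the abstract draws between "fake" and real quantization can be illustrated with a toy example. The sketch below is not the authors' TensorRT pipeline; it assumes simple symmetric per-tensor scaling, and the helper names (quantize_int8, dequantize, fake_quantize) are hypothetical. It only shows that a fake-quantized tensor carries the rounding error of 8-bit values while still occupying 32-bit storage, whereas a really quantized tensor is stored as 8-bit integers plus one scale.

```python
from array import array
import random

def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme): floats -> (int8 array, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to 32-bit floats."""
    return array("f", (v * scale for v in q))

def fake_quantize(values):
    """Quantize then immediately dequantize: 8-bit rounding error, 32-bit storage."""
    q, scale = quantize_int8(values)
    return dequantize(q, scale)

random.seed(0)
# Stand-in for a layer's weights, stored as 32-bit floats.
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(4096)))

fake = fake_quantize(weights)               # still 32-bit floats
real_q, real_scale = quantize_int8(weights)  # 8-bit integers + one float scale

fake_bytes = len(fake) * fake.itemsize
real_bytes = len(real_q) * real_q.itemsize
max_err = max(abs(w - f) for w, f in zip(weights, fake))

print("fake-quantized storage (bytes):", fake_bytes)
print("real int8 storage (bytes):     ", real_bytes)
print("max rounding error:", max_err, "half a step:", real_scale / 2)
```

On platforms where a C float is 4 bytes, the int8 copy takes a quarter of the storage, while the fake-quantized copy is no smaller than the original. A speedup additionally requires an engine that executes integer kernels, which is the role TensorRT plays in the paper.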
Taxonomy
Topics: Medical Image Segmentation Techniques · Radiomics and Machine Learning in Medical Imaging · Medical Imaging and Analysis
Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net
