Weight Group-wise Post-Training Quantization for Medical Foundation Model
Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li, Tzu-Jen Kao, Balakrishnan Prabhakaran, MingChing Chang, Xin Wang

TL;DR
This paper introduces Permutation-COMQ, a post-training quantization method for medical foundation models that simplifies the process and improves accuracy at low bit-widths.
Contribution
It proposes a hyperparameter-free, weight-aware quantization algorithm that enhances model compression for medical image analysis.
Findings
Achieves state-of-the-art results in 2-bit, 4-bit, and 8-bit quantization.
Eliminates the need for backpropagation and hyperparameter tuning.
Addresses accuracy loss from channel-wise scaling by reordering weights within each layer while preserving channel structure.
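A toy example can illustrate why reordering weights might reduce group-wise quantization error. This is not the paper's algorithm. The helper names (`groupwise_quantize`, `magnitude_permutation`, `quantize_with_permutation`), the L2-norm sort, and the symmetric round-to-nearest quantizer are assumptions chosen for illustration. The sketch permutes the input columns of a weight matrix so that columns of similar magnitude share a quantization group. Because the same permutation can be applied to the layer's input features, the output channels and the layer output are unchanged.

```python
import numpy as np


def groupwise_quantize(W: np.ndarray, group_size: int, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest fake quantization of W (out x in).

    Each output row is split into contiguous groups of `group_size` input
    columns; every (row, group) pair gets its own scale max|w| / qmax.
    Returns dequantized weights so the error can be measured directly.
    """
    qmax = 2 ** (bits - 1) - 1
    out, n_in = W.shape
    assert n_in % group_size == 0
    G = W.reshape(out, n_in // group_size, group_size)
    scale = np.abs(G).max(axis=2, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero groups
    Q = np.clip(np.round(G / scale), -qmax, qmax)
    return (Q * scale).reshape(out, n_in)


def magnitude_permutation(W: np.ndarray) -> np.ndarray:
    """Hypothetical ordering: sort input columns by L2 norm (ascending)."""
    return np.argsort(np.linalg.norm(W, axis=0), kind="stable")


def quantize_with_permutation(W: np.ndarray, group_size: int, bits: int):
    """Quantize in permuted column order, then map columns back."""
    perm = magnitude_permutation(W)
    W_hat_perm = groupwise_quantize(W[:, perm], group_size, bits)
    W_hat = np.empty_like(W)
    W_hat[:, perm] = W_hat_perm  # undo the column permutation
    return W_hat, perm


# Hand-picked toy layer: large and small input columns alternate,
# so every contiguous group of 4 mixes the two magnitudes.
W = np.array([[7.0, 0.3, 6.0, 0.1, 5.0, 0.2, 4.0, 0.3],
              [6.0, 0.2, 7.0, 0.3, 4.0, 0.1, 5.0, 0.2]])

err_plain = np.sum((W - groupwise_quantize(W, group_size=4, bits=4)) ** 2)
W_hat, perm = quantize_with_permutation(W, group_size=4, bits=4)
err_perm = np.sum((W - W_hat) ** 2)
print(f"plain: {err_plain:.4f}  permuted: {err_perm:.4f}")  # plain: 0.5733  permuted: 0.0010
```

On this hand-picked matrix the reordering helps because small columns no longer share a scale with large ones, which would round them to zero. On other weights a plain magnitude sort can increase error, for instance when several outliers of different sizes land in one group. Treat it as an illustration of the grouping effect, not a guarantee.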
Abstract
Foundation models have achieved remarkable results in medical image analysis. However, their large network architectures and high computational complexity significantly slow inference, limiting their deployment on terminal medical devices. Quantization, a technique that compresses models into low-bit versions, offers a solution to this challenge. In this paper, we propose a post-training quantization algorithm, Permutation-COMQ. It eliminates the need for backpropagation by using simple dot products and rounding operations, thereby removing hyperparameter tuning and simplifying the process. Additionally, we introduce a weight-aware strategy that reorders the weights within each layer to address the accuracy degradation induced by channel-wise scaling during quantization, while preserving channel structure. Experiments demonstrate that our method achieves the best results in 2-bit, 4-bit,…
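A coordinate-descent sketch of layer-wise reconstruction shows what quantizing with only dot products and rounding can look like. This is a generic illustration under stated assumptions, not the authors' Permutation-COMQ code. The function names, the fixed per-channel scale, the symmetric integer grid, the natural coordinate order, the sweep count, and the random stand-in data are all assumptions. For one output channel with weights `w` and calibration activations `X`, each coordinate update computes one dot product with the output residual and rounds the result. No gradients or learning rate are involved.

```python
import numpy as np


def rtn_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest with one scale for the whole channel."""
    qmax = 2 ** (bits - 1) - 1
    delta = np.abs(w).max() / qmax
    if delta == 0:
        return np.zeros_like(w)
    return delta * np.clip(np.round(w / delta), -qmax, qmax)


def coordinate_quantize(X: np.ndarray, w: np.ndarray, bits: int, n_sweeps: int = 3) -> np.ndarray:
    """Coordinate-descent refinement of RTN for one output channel.

    Minimizes ||X w - delta * X q||^2 over integer q in [-qmax, qmax],
    with the scale delta held fixed (an assumption of this sketch).
    """
    qmax = 2 ** (bits - 1) - 1
    delta = np.abs(w).max() / qmax
    if delta == 0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / delta), -qmax, qmax)  # start from RTN
    col_sq = (X ** 2).sum(axis=0)                  # ||x_i||^2 per input feature
    r = X @ w - delta * (X @ q)                    # current output residual
    for _ in range(n_sweeps):
        for i in range(w.size):
            if col_sq[i] == 0:
                continue
            # Unconstrained minimizer of the 1-D quadratic in q_i (one dot product) ...
            target = q[i] + (X[:, i] @ r) / (delta * col_sq[i])
            # ... and the best grid point is its clipped rounding.
            new_qi = np.clip(np.round(target), -qmax, qmax)
            if new_qi != q[i]:
                r -= delta * (new_qi - q[i]) * X[:, i]  # keep residual in sync
                q[i] = new_qi
    return delta * q


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))  # stand-in calibration activations
w = rng.normal(size=16)        # weights of one output channel
w_rtn = rtn_quantize(w, bits=4)
w_cd = coordinate_quantize(X, w, bits=4)
err_rtn = np.sum((X @ w - X @ w_rtn) ** 2)
err_cd = np.sum((X @ w - X @ w_cd) ** 2)
print(f"RTN output error: {err_rtn:.3f}, coordinate descent: {err_cd:.3f}")
```

Each update picks the best grid value for one coordinate with the others fixed. The output reconstruction error therefore can only stay the same or decrease relative to the round-to-nearest starting point. The order in which coordinates are visited is a free choice in this sketch.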
