MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
Akshat Ramachandran, Souvik Kundu, Tushar Krishna

TL;DR
MicroScopiQ introduces a novel outlier-aware quantization method combined with pruning and specialized hardware to efficiently accelerate foundational models with high accuracy and reduced energy consumption.
Contribution
The paper proposes a new co-design technique that pairs pruning with outlier-aware quantization, so that outliers can be kept at high precision without sacrificing hardware efficiency.
Findings
Achieves state-of-the-art quantization accuracy.
Up to 3x faster inference compared to existing methods.
Consumes 2x less energy during inference compared to existing methods.
Abstract
Quantization of foundational models (FMs) is significantly more challenging than that of traditional DNNs due to the emergence of large-magnitude values called outliers. Existing outlier-aware algorithm-architecture co-design techniques either use mixed precision, retaining outliers at high precision but compromising hardware efficiency, or quantize inliers and outliers at the same precision, improving hardware efficiency at the cost of accuracy. To address this mutual exclusivity, we propose MicroScopiQ, a novel co-design technique that leverages pruning to complement outlier-aware quantization. MicroScopiQ retains outliers at higher precision while pruning a certain fraction of least important weights to distribute the additional outlier bits, ensuring high accuracy, aligned memory, and hardware efficiency. We design a high-throughput, low overhead accelerator architecture composed of…
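The bit-redistribution idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the median-based outlier rule, the use of weight magnitude as the importance score, the 4-bit/8-bit widths, and the choice to ignore metadata (scales, outlier positions) are all assumptions made for illustration. It only shows how pruning one low-importance weight per outlier can keep a group's payload at the same total bit count while the outlier gets twice the bits.

```python
import numpy as np


def quantize_symmetric(values, bits, scale_max):
    """Uniform symmetric quantize-then-dequantize to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    if scale_max == 0:
        return np.zeros_like(values)
    scale = scale_max / qmax
    return np.clip(np.round(values / scale), -qmax, qmax) * scale


def prune_and_requantize_group(group, inlier_bits=4, ratio=8.0):
    """Hypothetical sketch: outliers get 2x bits, paid for by pruned inliers.

    Assumptions (not from the paper): an outlier is any weight whose
    magnitude exceeds `ratio` times the group's median magnitude, and
    importance is approximated by magnitude.
    """
    group = np.asarray(group, dtype=np.float64)
    mag = np.abs(group)
    outlier_bits = 2 * inlier_bits

    is_outlier = mag > ratio * np.median(mag)
    # Each outlier needs one pruned slot, so at most half the group qualifies.
    max_out = group.size // 2
    if is_outlier.sum() > max_out:
        is_outlier = np.zeros(group.size, dtype=bool)
        if max_out > 0:
            is_outlier[np.argsort(mag)[-max_out:]] = True
    n_out = int(is_outlier.sum())

    # Prune the n_out smallest-magnitude inliers to free their bits.
    inlier_idx = np.flatnonzero(~is_outlier)
    pruned_idx = inlier_idx[np.argsort(mag[inlier_idx])[:n_out]]
    is_pruned = np.zeros(group.size, dtype=bool)
    is_pruned[pruned_idx] = True
    is_kept = ~is_outlier & ~is_pruned

    out = np.zeros_like(group)
    if is_kept.any():
        out[is_kept] = quantize_symmetric(
            group[is_kept], inlier_bits, mag[is_kept].max())
    if n_out:
        out[is_outlier] = quantize_symmetric(
            group[is_outlier], outlier_bits, mag[is_outlier].max())

    # Payload bits only; per-group scales and position metadata are ignored.
    total_bits = int(is_kept.sum()) * inlier_bits + n_out * outlier_bits
    return out, is_outlier, is_pruned, total_bits


weights = [0.1, -0.2, 0.05, 0.3, -0.15, 10.0, 0.02, -0.25]
deq, outliers, pruned, bits = prune_and_requantize_group(weights)
print(deq, outliers, pruned, bits)
```

In this toy group the single large weight is kept at 8 bits, the smallest-magnitude inlier is zeroed, and the payload stays at 8 × 4 bits, which is the sense in which memory remains aligned in the sketch.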
Taxonomy
Topics: Cell Image Analysis Techniques · Medical Image Segmentation Techniques · Radiomics and Machine Learning in Medical Imaging
Methods: Pruning
