StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign
Michael Wu, Arnab Raha, Deepak A. Mathaikutty, Martin Langhammer, Engin Tunali, and Daksha Sharma

TL;DR
StruM introduces a structured mixed-precision inference method co-designed with hardware, reducing computational demands and power consumption in deep learning accelerators without retraining.
Contribution
The paper presents a mixed-precision quantization approach that exploits the variance in weight magnitudes within each layer, requires no retraining or fine-tuning, and is co-designed with its hardware accelerator (DPU) for efficiency.
Findings
Up to 50% reduction in weight precision (8-bit integer weights quantized to 4-bit values) across various CNNs, with negligible loss in inference accuracy.
31-34% reduction in processing element (PE) power consumption.
23-26% reduction in area at the PE level.
Abstract
In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep learning workloads in data centers and edge applications. Diverging from traditional approaches, our method avoids time-consuming re-training/fine-tuning and specialized hardware access. By leveraging the variance in weight magnitudes within layers, we quantize values within blocks to two different levels, achieving up to a 50% reduction in precision for 8-bit integer weights to 4-bit values across various Convolutional Neural Networks (CNNs) with negligible loss in inference accuracy. To demonstrate efficiency gains by utilizing mixed precision, we implement StruM on top of our in-house FlexNN DNN accelerator [1] that supports low and mixed-precision…
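The block-wise, two-level quantization described in the abstract can be illustrated with a minimal sketch. The truncated abstract does not state the block size, how elements are assigned to each level, or how the 4-bit values are encoded, so everything below is an assumption made for illustration: the names quantize_block and average_bits are hypothetical, the largest-magnitude weights in a block are kept at 8 bits, and the remaining weights are rounded to a signed 4-bit value with an implicit scale of 16. It is not the authors' algorithm.

```python
def quantize_block(block, n_high):
    """Keep the n_high largest-magnitude int8 weights at 8 bits; round the rest to 4 bits.

    Assumed scheme, not taken from the paper.
    Returns (dequantized values, mask), where mask[i] is True for 8-bit entries.
    """
    # Rank positions by magnitude, largest first (ties keep their original order).
    order = sorted(range(len(block)), key=lambda i: -abs(block[i]))
    high = set(order[:n_high])

    values, mask = [], []
    for i, w in enumerate(block):
        if i in high:
            values.append(w)          # stored unchanged as an 8-bit integer
            mask.append(True)
        else:
            # Assumed 4-bit format: signed value in [-8, 7], implicit scale of 16.
            q4 = max(-8, min(7, round(w / 16)))
            values.append(q4 * 16)    # dequantized value seen by the multiplier
            mask.append(False)
    return values, mask


def average_bits(block_size, n_high):
    """Average payload bits per weight, ignoring any mask or metadata overhead."""
    return (8 * n_high + 4 * (block_size - n_high)) / block_size


block = [90, -3, 12, -120, 7, 44, -25, 1]
values, mask = quantize_block(block, n_high=2)
print(values)               # [90, 0, 16, -120, 0, 48, -32, 0]
print(average_bits(8, 2))   # 5.0
print(average_bits(8, 0))   # 4.0, i.e. the 50% upper bound relative to 8 bits
```

Because every block holds the same number of 8-bit and 4-bit entries, the layout is regular, which is what makes a scheme of this kind "structured". The 50% figure quoted in the abstract corresponds to the limiting case in which every weight in a block is stored at 4 bits.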
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Image and Object Detection Techniques
