Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity

Dongyun Kam; Myeongji Yun; Sunwoo Yoo; Seungwoo Hong; Zhengya Zhang; Youngjoo Lee

arXiv:2412.10059 · cs.AR · December 16, 2024

TL;DR

This paper introduces Panacea, a DNN accelerator that combines accuracy-preserving asymmetric quantization with energy-efficient bit-slice sparsity, achieving both high accuracy and hardware efficiency for large-scale DNN inference.

Contribution

It proposes AQS-GEMM, a novel method that compresses and skips the nonzero slices produced by asymmetric quantization, together with hardware optimizations in the Panacea accelerator.
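Why asymmetric quantization produces nonzero slices can be seen with a small experiment. The sketch below is illustrative only and is not Panacea's AQS-GEMM: it quantizes a hypothetical, skewed set of activations to 8 bits both symmetrically and asymmetrically, splits each code into two 4-bit slices, and counts how often the high-order slice is zero (and so could be skipped). The function names, the data, and the slicing formats (sign-magnitude for symmetric codes, plain unsigned for asymmetric codes) are all assumptions chosen for the demonstration.

```python
"""Illustration (not the paper's method): zero high-order 4-bit slices
under symmetric vs. asymmetric 8-bit quantization. Slicing formats and
data are assumptions made for this sketch."""


def quantize_symmetric(xs, bits=8):
    # Signed range [-(2^(b-1)-1), 2^(b-1)-1]; scale is set by the largest magnitude.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax
    return [max(-qmax, min(qmax, round(x / scale))) for x in xs]


def quantize_asymmetric(xs, bits=8):
    # Unsigned range [0, 2^b - 1]; a zero point shifts the grid to cover [min, max].
    qmax = 2 ** bits - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    codes = [max(0, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return codes, zero_point


def high_slice_sym(q):
    # Assumed sign-magnitude slicing: high 4 bits of |q|, sign kept separately.
    return abs(q) >> 4


def high_slice_asym(q):
    # Plain unsigned slicing: high 4 bits of the 8-bit code.
    return q >> 4


def zero_high_slice_ratio(slices):
    # Fraction of high-order slices that are zero (i.e. skippable).
    return sum(1 for s in slices if s == 0) / len(slices)


if __name__ == "__main__":
    # Hypothetical activations: mostly near zero, with an asymmetric range [-1, 3].
    xs = [-1.0, 3.0] + [k / 100 for k in range(-20, 21)]
    q_sym = quantize_symmetric(xs)
    q_asym, zp = quantize_asymmetric(xs)
    print("zero point:", zp)  # 64
    sym = zero_high_slice_ratio([high_slice_sym(q) for q in q_sym])
    asym = zero_high_slice_ratio([high_slice_asym(q) for q in q_asym])
    print(f"symmetric  zero high slices: {sym:.2f}")   # 0.95
    print(f"asymmetric zero high slices: {asym:.2f}")  # 0.02
```

With these assumptions, near-zero activations map to small symmetric codes (zero high slice) but to codes around the zero point of 64 under asymmetric quantization, whose high slice is 4 or 3 and therefore nonzero. This is the kind of slice density the paper says conventional bit-slice accelerators cannot compress or skip.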

Findings

1. Panacea outperforms existing DNN accelerators in benchmarks.
2. AQS-GEMM effectively compresses nonzero slices, reducing energy consumption.
3. Hardware optimizations improve data reuse and utilization.

Abstract

Low bit-precisions and their bit-slice sparsity have recently been studied to accelerate general matrix multiplications (GEMM) during large-scale deep neural network (DNN) inference. While conventional symmetric quantization facilitates low-resolution processing with bit-slice sparsity for both weights and activations, its accuracy loss caused by asymmetric activation distributions is unacceptable, especially for large-scale DNNs. To mitigate this accuracy loss, recent studies have actively adopted asymmetric quantization for activations without requiring additional operations. However, cutting-edge asymmetric quantization produces numerous nonzero slices that cannot be compressed and skipped by recent bit-slice GEMM accelerators, which therefore consume more processing energy to handle the quantized DNN models. To simultaneously achieve high accuracy and…
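The claim that asymmetric activation quantization needs no additional operations rests on a standard identity, sketched below. If an activation is represented as x ≈ s_x (q - z) with zero point z, and a weight as w ≈ s_w p, then Σ w·x ≈ s_w s_x (Σ p·q - z Σ p). The term z Σ p depends only on the weights, so it can be precomputed offline as a per-output bias. This is a generic illustration (the same idea used in common integer-inference libraries), not a description of Panacea's datapath; the function names and the small integer matrices are hypothetical.

```python
"""Zero-point folding for asymmetric activations (generic sketch).
Computes an integer GEMV two ways and checks they agree; the final
real-valued output would be s_w * s_x * acc (scales omitted here)."""


def gemv_direct(P, q, z):
    # Reference: subtract the zero point from every activation code first.
    return [sum(p * (qi - z) for p, qi in zip(row, q)) for row in P]


def precompute_bias(P, z):
    # Offline: z * (sum of each weight row); independent of the input.
    return [z * sum(row) for row in P]


def gemv_folded(P, q, bias):
    # Runtime: plain integer GEMV on the unsigned codes,
    # then a single subtraction per output channel.
    return [sum(p * qi for p, qi in zip(row, q)) - b for row, b in zip(P, bias)]


if __name__ == "__main__":
    P = [[3, -2, 1], [-1, 4, 0]]  # hypothetical signed weight codes
    q = [70, 64, 51]              # hypothetical unsigned activation codes
    z = 64                        # hypothetical activation zero point
    bias = precompute_bias(P, z)
    print(gemv_direct(P, q, z))      # [5, -6]
    print(gemv_folded(P, q, bias))   # [5, -6]
```

The folded form keeps the inner loop identical to a symmetric integer GEMV, which is why the accuracy benefit comes essentially for free in arithmetic. The energy cost the abstract points to instead comes from the unsigned codes themselves: values near zero sit around z and so carry nonzero high-order bit slices.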