APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference

Alberto Delmas Lascorz (1), Mostafa Mahmoud (1), Andreas Moshovos (1, 2) ((1) University of Toronto, (2) Vector Institute)

arXiv:2201.08830 · cs.AR · January 24, 2022 · 1 cites

TL;DR

APack is a lossless off-chip memory compression technique that exploits value distribution in deep learning models to reduce data size, improve energy efficiency, and accelerate inference without altering on-chip data streams.

Contribution

APack introduces a novel, hardware-friendly compression method using arithmetic coding and value grouping, enhancing deep learning inference performance and energy efficiency.

Findings

01

Reduces weight and activation data footprints to 60% and 48% of their original size, respectively.

02

Achieves a 1.44X speedup and a 1.37X improvement in energy efficiency.

03

Compatible with any machine learning accelerator.

Abstract

Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present APack, a simple and effective lossless off-chip memory compression technique for fixed-point quantized models. APack reduces data widths by exploiting the non-uniform value distribution in deep learning applications. APack can be used to increase the effective memory capacity, to reduce off-chip traffic, and/or to achieve the desired performance/energy targets while using smaller off-chip memories. APack builds upon arithmetic coding, encoding each value as an arithmetically coded, variable-length prefix plus an offset. To maximize compression ratio, a heuristic software algorithm partitions the value space into groups, each sharing a common prefix. APack exploits memory access parallelism by using several, pipelined…
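The prefix-plus-offset idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' encoder or their partitioning heuristic: the group boundaries in `GROUP_STARTS`, the helper names, and the sample values are all hypothetical, and the cost of each prefix is only estimated as the ideal arithmetic-coding cost of -log2(p) bits rather than produced by a real coder.

```python
from bisect import bisect_right
from collections import Counter
import math

# Hypothetical partition of the 8-bit value space; each entry is the first
# value of a group. The abstract says a software heuristic picks the groups;
# these boundaries are made up for illustration only.
GROUP_STARTS = [0, 1, 2, 4, 8, 16, 32, 64, 128]
VALUE_RANGE = 256  # assumes 8-bit fixed-point values


def split_value(value, starts=GROUP_STARTS):
    """Map a value to (group index, offset inside that group)."""
    group = bisect_right(starts, value) - 1
    return group, value - starts[group]


def join_value(group, offset, starts=GROUP_STARTS):
    """Inverse of split_value; the mapping loses no information."""
    return starts[group] + offset


def offset_widths(starts=GROUP_STARTS, value_range=VALUE_RANGE):
    """Bits needed to store the raw offset of each group."""
    ends = starts[1:] + [value_range]
    return [(end - start - 1).bit_length() for start, end in zip(starts, ends)]


def estimated_bits_per_value(values, starts=GROUP_STARTS):
    """Average cost: ideal prefix cost (-log2 p) plus the offset bits."""
    counts = Counter(split_value(v, starts)[0] for v in values)
    widths = offset_widths(starts)
    total = len(values)
    bits = sum(n * (-math.log2(n / total) + widths[g]) for g, n in counts.items())
    return bits / total


# Made-up, skewed sample: small values dominate, large values are rare.
sample = [0] * 50 + [1] * 20 + [3] * 10 + [7] * 8 + [20] * 6 + [100] * 4 + [200] * 2
print(f"estimated bits per 8-bit value: {estimated_bits_per_value(sample):.2f}")
```

Because frequent groups get short prefixes and narrow offsets, a skewed distribution like the sample above costs well under 8 bits per value on average, while rare large values cost more than 8 bits. A uniform distribution would see no benefit from this scheme.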

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques · Advanced Neural Network Applications