APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference
Alberto Delmas Lascorz (1), Mostafa Mahmoud (1), Andreas Moshovos (1 and 2) ((1) University of Toronto, (2) Vector Institute)

TL;DR
APack is a lossless off-chip memory compression technique that exploits value distribution in deep learning models to reduce data size, improve energy efficiency, and accelerate inference without altering on-chip data streams.
Contribution
APack introduces a novel, hardware-friendly compression method using arithmetic coding and value grouping, enhancing deep learning inference performance and energy efficiency.
Findings
Reduces the weight and activation data footprints to 60% and 48% of their original size, respectively.
Achieves 1.44X speedup and 1.37X energy efficiency improvement.
Compatible with any machine learning accelerator.
Abstract
Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We present APack, a simple and effective, lossless, off-chip memory compression technique for fixed-point quantized models. APack reduces data widths by exploiting the non-uniform value distribution in deep learning applications. APack can be used to increase the effective memory capacity, to reduce off-chip traffic, and/or to achieve the desired performance/energy targets while using smaller off-chip memories. APack builds upon arithmetic coding, encoding each value as an arithmetically coded variable length prefix, plus an offset. To maximize compression ratio a heuristic software algorithm partitions the value space into groups each sharing a common prefix. APack exploits memory access parallelism by using several, pipelined…
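The prefix-plus-offset encoding described in the abstract can be illustrated with a minimal sketch. It assumes unsigned 8-bit values and a hand-picked partition of the value space. The names `split_value` and `estimated_bits`, the group boundaries, and the sample data are all hypothetical and not taken from the paper. The prefix cost is estimated with the ideal entropy bound that an arithmetic coder approaches, not with APack's actual hardware coder or its group-selection heuristic.

```python
import math
from bisect import bisect_right
from collections import Counter


def split_value(value, starts, limit):
    """Map a value to (group index, offset within group, offset width in bits).

    `starts` holds the sorted first value of each group; the last group ends
    at `limit` (exclusive). Both are illustrative assumptions.
    """
    g = bisect_right(starts, value) - 1
    end = starts[g + 1] if g + 1 < len(starts) else limit
    size = end - starts[g]
    offset_bits = (size - 1).bit_length()  # bits needed to index `size` values
    return g, value - starts[g], offset_bits


def estimated_bits(values, starts, limit):
    """Estimate the encoded size: entropy-bound prefix cost plus raw offset bits."""
    parts = [split_value(v, starts, limit) for v in values]
    counts = Counter(g for g, _, _ in parts)
    total = len(values)
    bits = 0.0
    for g, _, offset_bits in parts:
        bits += -math.log2(counts[g] / total) + offset_bits
    return bits


# Hypothetical skewed data: most values are small, as is common after quantization.
values = [0] * 60 + [1] * 20 + [3] * 10 + [6] * 6 + [200] * 4
starts = [0, 1, 2, 4, 8, 16]  # narrow groups near zero, one wide group for the tail
limit = 256                   # 8-bit value space

compressed = estimated_bits(values, starts, limit)
baseline = 8 * len(values)
print(f"estimated {compressed:.1f} bits vs. {baseline} bits uncompressed")
```

Frequent values land in narrow groups with short prefixes and few or no offset bits, while rare values pay for a longer prefix and a wider offset. That trade-off is what a partitioning heuristic would try to optimize for a given value distribution.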
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques · Advanced Neural Network Applications
