Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression
Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, and Zhongfeng Wang

TL;DR
This paper introduces a hardware accelerator for CNNs that uses frequency domain transformation and sparse compression to significantly reduce memory requirements and bandwidth, enabling efficient real-time processing on embedded devices.
Contribution
It presents a novel interlayer feature map compression technique integrated into a CNN accelerator, combining DCT, quantization, and sparse matrix compression for the first time.
Findings
Achieves 403 GOPS peak throughput on an FPGA
Reduces interlayer feature map size by up to 3.3x
Maintains minimal delay with integrated compression and processing
Abstract
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique to significantly reduce the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization. Sparse matrix compression is utilized to further compress the interlayer feature maps. The on-chip memory allocation scheme is designed to support dynamic configuration of the feature map buffer size…
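The three compression stages named in the abstract (8x8 DCT, quantization, sparse matrix compression) can be illustrated with a small software sketch. This is not the paper's hardware design: the quantization table (`quant_table`, with a hypothetical `step` parameter) and the sparse format (a nonzero bitmap plus a list of nonzero values) are assumptions chosen for illustration, since the abstract does not specify either.

```python
import numpy as np

N = 8  # block size taken from the abstract (8x8 DCT)

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return c

C = dct_matrix()

def quant_table(step=4.0):
    """Hypothetical table: the step size grows with frequency (u + v),
    so high-frequency coefficients are more likely to round to zero."""
    u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return 1.0 + step * (u + v)

def compress_block(block, q):
    """2D DCT, then quantization, then sparse encoding of one 8x8 block."""
    coeffs = C @ block @ C.T                          # frequency domain
    quantized = np.rint(coeffs / q).astype(np.int32)  # lossy step
    mask = quantized != 0                             # 64-bit nonzero bitmap
    return mask, quantized[mask]                      # store nonzeros only

def decompress_block(mask, values, q):
    """Inverse of compress_block, up to the quantization error."""
    quantized = np.zeros((N, N), dtype=np.int32)
    quantized[mask] = values
    return C.T @ (quantized * q) @ C                  # inverse 2D DCT

def demo():
    # A smooth ramp stands in for a feature map tile (assumed test data).
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    block = 10.0 + 2.0 * x + 1.0 * y
    q = quant_table()
    mask, values = compress_block(block, q)
    restored = decompress_block(mask, values, q)
    return block, q, mask, values, restored
```

For a smooth block such as the ramp in `demo()`, most quantized coefficients are zero, so only the bitmap and a few values need to be stored. The achievable ratio depends on the data and on the quantization table, which is why this sketch makes no claim about matching the reported figures.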
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Neural Networks and Applications
Methods: Discrete Cosine Transform
