Improving Memory Utilization in Convolutional Neural Network Accelerators
Petar Jokic, Stephane Emery, Luca Benini

TL;DR
This paper introduces a novel memory mapping technique for CNN accelerators that overlaps activation memory regions, significantly reducing memory usage and enabling larger networks to run efficiently on limited hardware.
Contribution
It proposes a mathematical model for maximizing activation memory overlap, improving memory utilization beyond traditional methods, and validates the approach with real-world network experiments and FPGA implementation.
Findings
Memory reduction of up to 32.9% for activations.
Overall network memory savings of up to 23.9%.
Activation memory savings of 48.8% for high-resolution networks.
Abstract
While the accuracy of convolutional neural networks has improved vastly with the introduction of larger and deeper network architectures, the memory footprint for storing their parameters and activations has also increased. This trend especially challenges power- and resource-limited accelerator designs, which are often restricted to storing all network data in on-chip memory to avoid interfacing with energy-hungry external memories. Maximizing the network size that fits on a given accelerator thus requires maximizing its memory utilization. While the traditionally used ping-pong buffering technique maps subsequent activation layers to disjoint memory regions, we propose a mapping method that allows these regions to overlap and thus utilize the memory more efficiently. This work presents the mathematical model to compute the maximum activations memory overlap and thus the lower…
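To make the contrast between ping-pong buffering and overlapped mapping concrete, the following is a minimal sketch, not the paper's model. It assumes that under ping-pong buffering a layer's input and output activations occupy disjoint regions, so the peak footprint is the largest input-plus-output sum over consecutive layers. The permitted overlap per layer is passed in as a given number; how the maximum overlap is derived is the subject of the paper and is not reproduced here. The function names and the example sizes are hypothetical.

```python
from typing import Sequence


def ping_pong_footprint(sizes: Sequence[int]) -> int:
    """Peak activation memory when each layer's input and output
    occupy disjoint regions (assumed ping-pong layout)."""
    if len(sizes) < 2:
        raise ValueError("need at least two activation maps")
    return max(a + b for a, b in zip(sizes, sizes[1:]))


def overlapped_footprint(sizes: Sequence[int], overlaps: Sequence[int]) -> int:
    """Peak activation memory when the output region of layer i may
    overlap its input region by overlaps[i] units.

    The overlap values are inputs here (hypothetical); this sketch does
    not compute them.
    """
    if len(overlaps) != len(sizes) - 1:
        raise ValueError("need one overlap value per consecutive layer pair")
    peak = 0
    for a, b, ov in zip(sizes, sizes[1:], overlaps):
        if not 0 <= ov <= min(a, b):
            raise ValueError("overlap must lie between 0 and the smaller region")
        peak = max(peak, a + b - ov)
    return peak


# Made-up activation sizes (in arbitrary units) and made-up overlaps.
sizes = [1000, 800, 400]
overlaps = [200, 100]

baseline = ping_pong_footprint(sizes)               # max(1800, 1200)
overlapped = overlapped_footprint(sizes, overlaps)  # max(1600, 1100)
saving = 1 - overlapped / baseline
print(baseline, overlapped, f"{saving:.1%}")
```

With these illustrative numbers the baseline is 1800 units and the overlapped layout needs 1600, a saving of about 11%. The figures are for illustration only and are unrelated to the results reported in the Findings.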
