OCCAM: Optimal Data Reuse for Convolutional Neural Networks
Ashish Gondimalla, Jianqiao Liu, T.N. Vijaykumar, Mithuna Thottethodi

TL;DR
Occam introduces an optimal partitioning and pipelining approach for CNNs that significantly reduces off-chip data transfers, improving performance and energy efficiency in image recognition tasks.
Contribution
Occam presents a novel method for capturing full data reuse in CNNs by combining optimal partitioning, dependence closure analysis, and asynchronous pipelining.
Findings
Reduces off-chip data transfer by 21x.
Achieves over 2x performance improvement.
Improves energy efficiency by up to 33%.
Abstract
Convolutional neural networks (CNNs) are emerging as powerful tools for image processing in important commercial applications. We focus on the important problem of improving the latency of image recognition. CNNs' large data at each layer's input, filters, and output poses a memory bandwidth problem. While previous work captures only some of the enormous data reuse, full reuse implies that the initial input image and filters are read once from off chip and the final output is written once off chip without spilling the intermediate layers' data to off-chip. We propose Occam to capture full reuse via four contributions. (1) We identify the necessary condition for full reuse. (2) We identify the dependence closure as the sufficient condition to capture full reuse using the least on-chip memory. (3) Because the dependence closure is often too large to fit in on-chip memory, we propose a…
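To make the "full reuse" idea in the abstract concrete, the sketch below counts off-chip element transfers for a toy CNN under two schemes: a layer-by-layer baseline that spills every intermediate feature map off chip, and full reuse, where only the initial input, the filters, and the final output cross the chip boundary. The `ConvLayer` class, the helper functions, and the three-layer `toy` network are hypothetical illustrations, not taken from the paper, and the model ignores on-chip capacity limits entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Hypothetical description of one convolutional layer (square filters)."""
    in_h: int
    in_w: int
    in_c: int
    k: int          # filter height and width
    out_c: int      # number of output channels
    stride: int = 1
    pad: int = 0

    def input_elems(self) -> int:
        return self.in_h * self.in_w * self.in_c

    def filter_elems(self) -> int:
        return self.k * self.k * self.in_c * self.out_c

    def output_elems(self) -> int:
        out_h = (self.in_h + 2 * self.pad - self.k) // self.stride + 1
        out_w = (self.in_w + 2 * self.pad - self.k) // self.stride + 1
        return out_h * out_w * self.out_c


def layer_by_layer_traffic(layers: List[ConvLayer]) -> int:
    """Assumed baseline: every layer reads its input and filters from
    off chip and writes its output back off chip."""
    return sum(l.input_elems() + l.filter_elems() + l.output_elems()
               for l in layers)


def full_reuse_traffic(layers: List[ConvLayer]) -> int:
    """Full reuse as described in the abstract: the initial input and all
    filters are read once, and only the final output is written."""
    filters = sum(l.filter_elems() for l in layers)
    return layers[0].input_elems() + filters + layers[-1].output_elems()


# Toy three-layer network (made-up sizes, padding keeps spatial dims at 32x32).
toy = [
    ConvLayer(32, 32, 3, k=3, out_c=16, pad=1),
    ConvLayer(32, 32, 16, k=3, out_c=16, pad=1),
    ConvLayer(32, 32, 16, k=3, out_c=8, pad=1),
]

baseline = layer_by_layer_traffic(toy)
full = full_reuse_traffic(toy)
print(f"layer-by-layer: {baseline} elements, full reuse: {full} elements")
```

In this simplified model the gap between the two schemes is exactly twice the total size of the intermediate feature maps, since each one is written once and read once in the baseline. The figures say nothing about the reductions reported for Occam itself, which depend on the actual networks and on-chip memory sizes.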
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing
