Full-Stack Optimization for CAM-Only DNN Inference
João Paulo C. de Lima, Asif Ali Khan, Luigi Carro, Jeronimo Castrillon

TL;DR
This paper introduces a compilation method for ternary-weight neural networks running on associative processors built from racetrack memory, improving the energy efficiency of DNN inference while maintaining accuracy.
Contribution
It presents a compilation flow that optimizes convolutions for associative processors (APs) built with racetrack memory (RTM) by reducing their arithmetic intensity, which cuts data transfers and improves energy efficiency for DNN inference.
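With ternary weights, every multiply in a convolution collapses into an addition, a subtraction, or nothing at all. The minimal sketch below illustrates that general property in plain Python; it is not the paper's compilation flow, and the function name `ternary_conv2d` is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def ternary_conv2d(x: Matrix, w: Matrix) -> Matrix:
    """Valid 2-D cross-correlation whose weights are all -1, 0 or +1.

    No multiplications are performed: +1 weights add the activation,
    -1 weights subtract it, and 0 weights are skipped entirely.
    """
    if any(v not in (-1, 0, 1) for row in w for v in row):
        raise ValueError("weights must be ternary (-1, 0, +1)")
    kh, kw = len(w), len(w[0])
    # Group kernel offsets by sign once per kernel, not once per output.
    pos = [(i, j) for i in range(kh) for j in range(kw) if w[i][j] == 1]
    neg = [(i, j) for i in range(kh) for j in range(kw) if w[i][j] == -1]
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for i, j in pos:  # +1 weights: accumulate by addition
                acc += x[r + i][c + j]
            for i, j in neg:  # -1 weights: accumulate by subtraction
                acc -= x[r + i][c + j]
            out[r][c] = acc
    return out
```

For example, with x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and w = [[1, -1], [0, 1]] the function returns [[4, 5], [7, 8]], the same result an ordinary multiply-accumulate convolution gives.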
Findings
7.5x energy efficiency improvement for ResNet-18 on ImageNet
Retains software accuracy with RTM-based APs
Reduces data transfers in memory during inference
Abstract
The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. Additionally, for some CIM designs, the activation movement still requires considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data…
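An associative processor stores operands in the rows of a content-addressable memory (CAM) and computes bit-serially but word-parallel: for each bit position it repeats compare (search) and write passes driven by a truth table. The sketch below is a simplified software model under that assumption, using in-place addition B ← A + B as the operation. The names `ADD_PASSES` and `ap_add`, the LSB-first layout and the pass ordering are illustrative choices, not the paper's implementation, and racetrack-memory shifting is not modeled.

```python
from typing import List

# Hypothetical truth-table passes for in-place addition B <- A + B.
# Each entry: (a, b, carry) pattern to search for, and the (b, carry)
# values written into every matching row. Only the four input patterns
# whose output differs from the input need a pass. Order matters: a row
# rewritten by one pass must never match a later pass.
ADD_PASSES = [
    ((0, 0, 1), (1, 0)),
    ((0, 1, 1), (0, 1)),
    ((1, 1, 0), (0, 1)),
    ((1, 0, 0), (1, 0)),
]


def ap_add(a_words: List[int], b_words: List[int], width: int) -> List[int]:
    """Model of word-parallel, bit-serial addition on a CAM.

    Returns (a + b) mod 2**width for every row; the final carry is dropped.
    """
    n = len(a_words)
    # CAM contents: one row per word, bits stored LSB-first.
    a = [[(v >> i) & 1 for i in range(width)] for v in a_words]
    b = [[(v >> i) & 1 for i in range(width)] for v in b_words]
    carry = [0] * n
    for i in range(width):  # bit-serial: one bit position at a time
        for (pa, pb, pc), (wb, wc) in ADD_PASSES:
            # Compare: tag every row whose masked bits match the pattern.
            tags = [a[r][i] == pa and b[r][i] == pb and carry[r] == pc
                    for r in range(n)]
            # Write: update the B bit and carry of all tagged rows at once.
            for r in range(n):
                if tags[r]:
                    b[r][i], carry[r] = wb, wc
    return [sum(bit << i for i, bit in enumerate(row)) for row in b]
```

Because each pass acts on all rows simultaneously, the number of compare/write passes depends only on the bit width (four per bit here), not on how many words are stored, which is where the word-level parallelism of an AP comes from.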
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing
