Full-Stack Optimization for CAM-Only DNN Inference

João Paulo C. de Lima; Asif Ali Khan; Luigi Carro; Jeronimo Castrillon

arXiv:2401.12630 · cs.AR · January 24, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a compilation flow for ternary weight neural networks running on associative processors built from racetrack memory, improving the energy efficiency of DNN inference while retaining software accuracy.

Contribution

The paper presents a compilation flow that optimizes convolutions on associative processors (APs) implemented with racetrack memory (RTM), reducing data transfers and improving energy efficiency for DNN inference.

Findings

01. 7.5x energy efficiency improvement for ResNet-18 on ImageNet

02. Retains software accuracy with RTM-based APs

03. Reduces data transfers in memory during inference

Abstract

The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. Additionally, for some CIM designs, the activation movement still requires considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data…
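As a rough illustration of why ternary weights suit add-oriented in-memory hardware, the sketch below computes a dot product and a 1D sliding-window convolution using only additions and subtractions. It is not the paper's compilation flow or its AP mapping; the function names `ternary_dot` and `ternary_conv1d` are hypothetical and chosen only for this example.

```python
def ternary_dot(activations, weights):
    """Dot product where every weight is -1, 0, or +1.

    No multiplications are needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        elif w != 0:
            raise ValueError("weights must be -1, 0, or +1")
    return acc


def ternary_conv1d(signal, kernel):
    """Valid-mode 1D sliding-window correlation with a ternary kernel."""
    k = len(kernel)
    return [ternary_dot(signal[i:i + k], kernel)
            for i in range(len(signal) - k + 1)]


if __name__ == "__main__":
    print(ternary_dot([3, 5, 2, 7], [1, -1, 0, 1]))
    print(ternary_conv1d([1, 2, 3, 4, 5], [1, 0, -1]))
```

Zero weights contribute no work at all in this sketch, which is one reason ternary networks are attractive when arithmetic is expensive relative to data movement.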

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Memory and Neural Computing