End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

Gianmarco Ottavi; Geethan Karunaratne; Francesco Conti; Irem Boybat; Luca Benini; Davide Rossi

arXiv:2109.01404 · cs.AR · September 6, 2021

TL;DR

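This paper evaluates the integration of an analog in-memory accelerator (IMA) with RISC-V cores in a heterogeneous shared-memory cluster for DNN inference. On a MobileNetV2 bottleneck layer, it shows that a hybrid mapping, with pointwise convolutions on the IMA and depthwise convolutions on the cores, balances speed, area, and energy efficiency.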

Contribution

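The paper introduces a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzes the performance, area, and energy-efficiency trade-offs of several IMA integration strategies, and proposes a hybrid mapping strategy for efficient DNN inference.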

Findings

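01. Pointwise layers achieve significant speed-ups over a software implementation when executed on the IMA.

02. Depthwise layers cannot be mapped efficiently onto the IMA, which forces a significant trade-off between throughput and area.

03. Hybrid execution (pointwise on the IMA, depthwise on the cluster cores) achieves a 3x speed-up over software execution and reduces area compared to an all-in IMA solution.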

Abstract

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over a software implementation, on depthwise layers the inability to efficiently map parameters onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution where pointwise convolutions are executed on the IMA while depthwise convolutions are executed on the cluster cores, achieving a speed-up of 3x over SW…
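
To make the depthwise mapping problem concrete, the sketch below counts how many crossbar cells hold useful weights under one naive mapping. It is only an illustration under stated assumptions, not the mapping or the numbers used in the paper: the function names, the block-diagonal layout for depthwise kernels, and the 64-channel example are all hypothetical. A pointwise (1x1) convolution is a dense matrix of size C_in x C_out, so every cell is used. A depthwise k x k convolution connects each output channel to only the k*k weights of its own input channel, so a direct mapping is block-diagonal and mostly empty.

```python
def pointwise_crossbar(c_in, c_out):
    """Rows, columns, and used cells for a 1x1 convolution mapped as a dense matrix."""
    rows, cols = c_in, c_out
    used = c_in * c_out          # every cell stores a weight
    return rows, cols, used


def depthwise_crossbar(channels, k=3):
    """Rows, columns, and used cells for a k x k depthwise convolution.

    Assumed (hypothetical) mapping: one column per output channel, one row
    per input element of the unrolled k*k window of every channel. Each
    column only uses the k*k rows of its own channel (block-diagonal).
    """
    rows = channels * k * k
    cols = channels
    used = channels * k * k      # only the diagonal blocks store weights
    return rows, cols, used


def utilization(rows, cols, used):
    """Fraction of crossbar cells that store a useful weight."""
    return used / (rows * cols)


if __name__ == "__main__":
    channels = 64                # hypothetical channel count
    pw = pointwise_crossbar(channels, channels)
    dw = depthwise_crossbar(channels, k=3)
    print("pointwise utilization:", utilization(*pw))
    print("depthwise utilization:", utilization(*dw))
```

Under this assumed layout the depthwise utilization is 1/C for C channels, independent of the kernel size, which gives an intuition for why spending more array area on depthwise layers buys little throughput.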
