Mapping of CNNs on multi-core RRAM-based CIM architectures
Rebecca Pelke, Nils Bosbach, Jose Cubero, Felix Staudigl, Rainer Leupers, Jan Moritz Joseph

TL;DR
This paper introduces synchronization techniques and architecture optimizations for CNN inference on RRAM-based CIM multi-core systems, achieving near-theoretical speedup with minimal data transmission overhead.
Contribution
It presents novel synchronization methods and compiler algorithms tailored for RRAM-based CIM architectures, enhancing CNN inference performance.
Findings
Achieved over 99% of the theoretical acceleration limit.
Reduced data transmission overhead to less than 4%.
Optimized architecture setup improves data exchange efficiency.
Abstract
RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. However, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for parallel inference of convolutional layers on RRAM-based CIM architectures. We propose an architecture optimization that enables efficient data exchange and discuss the impact of different architecture setups on performance. The corresponding compiler algorithms are optimized for high speedup and low memory consumption during CNN inference. We achieve more than 99% of the theoretical acceleration limit with a marginal data transmission overhead of less than 4% for state-of-the-art CNN benchmarks.
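The data dependency between consecutive convolutional layers mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain 2D convolution with square kernel size `kernel`, stride `stride`, and no padding, and computes how many rows of a producer layer's output must be available before a consumer layer can compute a given output row. The function names and parameters are hypothetical illustrations and do not reproduce the paper's synchronization techniques or compiler algorithms.

```python
def rows_needed(out_row: int, kernel: int, stride: int = 1) -> int:
    """Return how many producer rows (counted from row 0) must be ready
    before the consumer can compute its 0-indexed output row `out_row`.

    Assumption: unpadded convolution, so output row r reads input rows
    r*stride .. r*stride + kernel - 1.
    """
    if out_row < 0 or kernel < 1 or stride < 1:
        raise ValueError("out_row must be >= 0, kernel and stride must be >= 1")
    return out_row * stride + kernel


def earliest_start(num_out_rows: int, kernel: int, stride: int = 1) -> list[int]:
    """For each consumer output row, list the number of producer rows required."""
    return [rows_needed(r, kernel, stride) for r in range(num_out_rows)]


if __name__ == "__main__":
    # Hypothetical example: 3x3 kernel, stride 1, first three output rows.
    print(earliest_start(3, kernel=3))
```

Under these assumptions, a consumer layer with a 3x3 kernel and stride 1 could begin its first output row once three producer rows exist, instead of waiting for the complete feature map. Coordinating this kind of overlap across cores is what a synchronization scheme has to handle.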
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
