Efficient Deployment of CNN Models on Multiple In-Memory Computing Units
Eleni Bougioukou, Theodore Antonakopoulos

TL;DR
This paper presents LBLP, a novel task allocation algorithm for deploying CNNs on In-Memory Computing (IMC) hardware with multiple Processing Units (PUs), significantly increasing processing rate and reducing latency.
Contribution
We introduce the LBLP algorithm, which dynamically assigns CNN nodes to the PUs of a multi-PU IMC system to improve resource utilization, processing rate, and latency.
Findings
LBLP outperforms alternative scheduling strategies
Significant reduction in latency achieved
Higher processing rates demonstrated
Abstract
In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration: it mitigates data-movement bottlenecks and exploits the inherent parallelism of memory-based computation. Deploying Convolutional Neural Networks (CNNs) efficiently on IMC-based hardware requires advanced task allocation strategies to achieve maximum computational efficiency. In this work, we use an IMC Emulator (IMCE) with multiple Processing Units (PUs) to investigate how the deployment of a CNN model on a multi-processing system affects its performance in terms of processing rate and latency. For that purpose, we introduce the Load-Balance-Longest-Path (LBLP) algorithm, which dynamically assigns all CNN nodes to the available IMCE PUs in order to maximize the processing rate and minimize latency through efficient resource utilization. We benchmark LBLP against…
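The abstract names LBLP's two ingredients (load balancing and longest-path ordering) without giving its steps. The sketch below is a hypothetical illustration of how such a list scheduler could combine them, not the authors' algorithm. It assumes a DAG of CNN nodes with positive per-node execution costs, ignores inter-PU transfer time, treats the processing rate as 1 / (load of the busiest PU) under pipelined execution, and treats latency as the finish time of a single input. The function names (`lblp_like_assign`, `evaluate`), the toy graph, and its costs are all made up for illustration.

```python
from collections import defaultdict


def longest_path_to_sink(nodes, edges, cost):
    """Cost-weighted longest path from each node to any sink of the DAG."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    memo = {}

    def lp(n):
        if n not in memo:
            memo[n] = cost[n] + max((lp(s) for s in succ[n]), default=0.0)
        return memo[n]

    return {n: lp(n) for n in nodes}


def lblp_like_assign(nodes, edges, cost, num_pus):
    """Hypothetical load-balance + longest-path list scheduler (not the paper's LBLP)."""
    if any(cost[n] <= 0 for n in nodes):
        raise ValueError("node costs must be positive")
    prio = longest_path_to_sink(nodes, edges, cost)
    # With positive costs, every node has a larger priority than its successors,
    # so sorting by descending priority yields a valid topological order.
    order = sorted(nodes, key=lambda n: -prio[n])
    load = [0.0] * num_pus
    assignment = {}
    for n in order:
        # Assumed balancing rule: least-loaded PU, lowest index on ties.
        pu = min(range(num_pus), key=lambda p: load[p])
        assignment[n] = pu
        load[pu] += cost[n]
    return order, assignment, load


def evaluate(order, assignment, load, edges, cost):
    """Assumed metrics: pipelined rate = 1 / busiest PU load; latency = one-input makespan."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    pu_free = defaultdict(float)  # time at which each PU becomes idle
    finish = {}
    for n in order:
        ready = max((finish[p] for p in preds[n]), default=0.0)
        start = max(ready, pu_free[assignment[n]])
        finish[n] = start + cost[n]
        pu_free[assignment[n]] = finish[n]
    return 1.0 / max(load), max(finish.values())


# Toy CNN graph with a two-branch block; costs are made-up time units.
nodes = ["conv1", "conv2a", "conv2b", "add", "fc"]
edges = [("conv1", "conv2a"), ("conv1", "conv2b"),
         ("conv2a", "add"), ("conv2b", "add"), ("add", "fc")]
cost = {"conv1": 4, "conv2a": 3, "conv2b": 3, "add": 1, "fc": 2}

order, assignment, load = lblp_like_assign(nodes, edges, cost, num_pus=2)
rate, latency = evaluate(order, assignment, load, edges, cost)
print(assignment)     # {'conv1': 0, 'conv2a': 1, 'conv2b': 1, 'add': 0, 'fc': 0}
print(rate, latency)  # 0.14285714285714285 13.0
```

In this toy run, pure least-loaded placement puts both parallel branches on PU 1, so they execute one after the other. This shows that balancing load alone does not minimize latency, which is why a scheduler targeting both metrics has to weigh load against the critical path.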
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · IoT and Edge/Fog Computing
