Distributed Convolutional Neural Network Training on Mobile and Edge Clusters
Pranav Rama, Madison Threadgill, Andreas Gerstlauer

TL;DR
This paper presents a method for distributed CNN training exclusively on mobile and edge devices. By partitioning and grouping layers, it achieves a 2x-15x training speedup on Raspberry Pi clusters and up to 8x memory reduction per device, with no accuracy loss on object detection CNNs.
Contribution
The paper introduces a fully decentralized approach to CNN training on edge devices that needs no central edge or cloud server for coordination or offloading and is designed for resource-constrained hardware.
Findings
2x-15x training speedup on Raspberry Pi clusters
Up to 8x memory reduction per device
No accuracy loss in object detection CNNs
Abstract
The training of deep and/or convolutional neural networks (DNNs/CNNs) is traditionally done on servers with powerful CPUs and GPUs. Recent efforts have emerged to localize machine learning tasks fully on the edge. This brings advantages in reduced latency and increased privacy, but necessitates working with resource-constrained devices. Approaches for inference and training in mobile and edge devices based on pruning, quantization or incremental and transfer learning require trading off accuracy. Several works have explored distributing inference operations on mobile and edge clusters instead. However, there is limited literature on distributed training on the edge. Existing approaches all require a central, potentially powerful edge or cloud server for coordination or offloading. In this paper, we describe an approach for distributed CNN training exclusively on mobile and edge devices.…
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
