Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices
Jun-Liang Lin, Sheng-De Wang

TL;DR
This paper introduces a communication-efficient neural network approach for distributed inference on edge devices, leveraging model parallelism and neural architecture search to reduce data transmission without significant performance loss.
Contribution
It proposes a novel method combining model parallelism and NAS to optimize distributed neural network inference for edge devices, reducing communication costs.
Findings
Data transmission reduced by 86.6% compared to baseline.
Distributed inference accelerates large neural networks on edge clusters.
Method maintains high performance with lower communication overhead.
Abstract
The inference of neural networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) available on edge devices. In addition to improving hardware design and deploying efficient models, it is possible to aggregate the computing power of many devices to run machine learning models. In this paper, we propose a novel method that exploits model parallelism to separate a neural network for distributed inference. To achieve a better balance between communication latency, computation latency, and performance, we adopt neural architecture search (NAS) to search for the best transmission policy and reduce the amount of communication. The best model we found reduces the amount of data transmission by 86.6% compared to the baseline, with little impact on performance. Under proper specifications of devices and configurations of models, our experiments show…
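To make the idea of a "transmission policy" concrete, here is a minimal Python sketch of how communication volume could be counted when a network is split channel-wise across devices. It is an illustration under stated assumptions, not the authors' implementation: the `Layer` class, the layer shapes, the all-to-all exchange pattern, and the boolean per-layer policy are all hypothetical, and the resulting numbers are not the paper's.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical description of one layer's full (unsplit) output tensor."""
    name: str
    channels: int
    height: int
    width: int


def transmitted_elements(layers, policy, num_devices):
    """Count activation elements sent between devices for one inference.

    Assumption: each device holds channels // num_devices output channels.
    policy[i] is True when every device sends its slice of layer i's output
    to every other device, and False when the slice stays local.
    """
    total = 0
    for layer, exchange in zip(layers, policy):
        if not exchange:
            continue
        slice_size = (layer.channels // num_devices) * layer.height * layer.width
        # Each of the num_devices slices goes to the other (num_devices - 1) devices.
        total += slice_size * num_devices * (num_devices - 1)
    return total


def reduction(baseline, candidate):
    """Fraction of transmitted data saved relative to the baseline."""
    return 1.0 - candidate / baseline


# Illustrative shapes only; these are not taken from the paper.
layers = [
    Layer("conv1", 64, 32, 32),
    Layer("conv2", 128, 16, 16),
    Layer("conv3", 256, 8, 8),
    Layer("conv4", 512, 4, 4),
]

# Baseline: exchange after every layer. Candidate: exchange only after the last.
baseline_cost = transmitted_elements(layers, [True, True, True, True], num_devices=2)
candidate_cost = transmitted_elements(layers, [False, False, False, True], num_devices=2)
saving = reduction(baseline_cost, candidate_cost)
print(f"baseline={baseline_cost} candidate={candidate_cost} saving={saving:.1%}")
```

In this toy setting, a search procedure such as NAS would explore different policies and weigh the saved communication against any loss in accuracy; the sketch covers only the counting side of that trade-off.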
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
