Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Beibei Zhang, Hongwei Zhu, Feng Gao, Zhihui Yang, Sean Xiaoyang Wang

TL;DR
Moirai introduces an optimized device placement method for distributed DNN inference that leverages runtime inter-operator fusion and considers device heterogeneity, significantly reducing inference latency.
Contribution
The paper proposes Moirai, a novel placement algorithm that exploits inter-operator fusion and handles device heterogeneity, outperforming existing methods in distributed DNN inference.
Findings
Moirai reduces end-to-end inference latency by up to 4.28×.
It outperforms state-of-the-art placement methods like Placeto, m-SCT, and GETF.
Extensive experiments on 11 large DNNs validate its effectiveness.
Abstract
The escalating size of Deep Neural Networks (DNNs) has spurred a growing research interest in hosting and serving DNN models across multiple devices. A number of studies have proposed partitioning a DNN model across devices, providing device placement solutions. The methods that have appeared in the literature, however, either suffer from poor placement performance due to the exponential search space or miss an optimal placement as a consequence of the reduced search space with limited heuristics. Moreover, these methods have ignored the runtime inter-operator optimization of a computation graph when coarsening the graph, which degrades the end-to-end inference performance. This paper presents Moirai, which better exploits runtime inter-operator fusion in a model to render a coarsened computation graph, reducing the search space while maintaining the inter-operator optimization provided by…
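To make the idea of graph coarsening concrete, here is a minimal sketch in Python. It is not Moirai's actual fusion procedure, which the truncated abstract does not specify. It assumes a deliberately simple, hypothetical fusion rule: an edge u -> v is fused when u has exactly one successor and v has exactly one predecessor, so linear chains collapse into a single node. The operator names and the two-device count are illustrative only.

```python
from collections import defaultdict


def coarsen_by_chain_fusion(nodes, edges):
    """Fuse linear chains of operators into single coarse nodes.

    Hypothetical rule (an assumption, not taken from the paper): merge
    edge u -> v when u has one successor and v has one predecessor.
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    # Union-find: each operator starts as its own group.
    group = {n: n for n in nodes}

    def find(n):
        while group[n] != n:
            group[n] = group[group[n]]  # path halving
            n = group[n]
        return n

    for u, v in edges:
        if len(succ[u]) == 1 and len(pred[v]) == 1:
            group[find(v)] = find(u)

    # Collect the members of each fused group, in original node order.
    fused = defaultdict(list)
    for n in nodes:
        fused[find(n)].append(n)

    # Keep only edges that cross group boundaries.
    coarse_edges = {(find(u), find(v)) for u, v in edges if find(u) != find(v)}
    return dict(fused), coarse_edges


# Illustrative graph: a conv-bn-relu chain feeding two parallel branches.
nodes = ["conv1", "bn1", "relu1", "conv2a", "conv2b", "add"]
edges = [
    ("conv1", "bn1"), ("bn1", "relu1"),
    ("relu1", "conv2a"), ("relu1", "conv2b"),
    ("conv2a", "add"), ("conv2b", "add"),
]

fused, coarse_edges = coarsen_by_chain_fusion(nodes, edges)

num_devices = 2  # assumed device count, for illustration
placements_before = num_devices ** len(nodes)  # one device choice per operator
placements_after = num_devices ** len(fused)   # one device choice per fused group
print(fused)
print(placements_before, placements_after)
```

In this toy graph the conv1, bn1, relu1 chain collapses into one node, so the number of candidate placements on two devices drops from 2^6 to 2^4. A real system would also need per-device cost estimates and communication costs to choose among the remaining placements; those are outside the scope of this sketch.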
Taxonomy
Topics: Age of Information Optimization · IoT and Edge/Fog Computing · Stochastic Gradient Optimization Techniques
