Coded Computation across Shared Heterogeneous Workers with Communication Delay
Yuxuan Sun, Fan Zhang, Junlin Zhao, Sheng Zhou, Zhisheng Niu, Deniz Gündüz

TL;DR
This paper develops optimized coded computation strategies for heterogeneous distributed systems to minimize total delay, accounting for communication and computation delays, with algorithms tailored for different worker assignment policies.
Contribution
It introduces worker assignment, resource allocation, and load allocation algorithms for multi-master heterogeneous-worker systems, minimizing delay through approximation and convex optimization methods.
Findings
Proposed algorithms outperform benchmarks in reducing task delay.
Dedicated and fractional assignment policies have distinct advantages.
Extensive simulations validate the effectiveness of the algorithms.
Abstract
Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and their assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers for parallel computation. The goal is to minimize the communication plus computation delay of the slowest task. We propose worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks of…
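To make the idea of coded computation concrete, the following is a minimal sketch of the textbook (3, 2) MDS-coded matrix-vector multiplication, in which the product can be recovered from any two of three workers, so one straggler is tolerated. It is an illustration only and not the paper's scheme: the function names (`encode`, `decode`, `matvec`), the single-master setting, and the fixed code rate are all assumptions made for this example, and no delay model or load allocation is included.

```python
from itertools import combinations


def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add a parity block A1 + A2.

    Assumes A has an even number of rows (hypothetical simplification).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from the outputs of any two workers.

    `results` maps a worker index (0: A1, 1: A2, 2: parity) to its output.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Have A1 x and (A1 + A2) x: subtract to obtain A2 x.
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Have A2 x and (A1 + A2) x: subtract to obtain A1 x.
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
expected = matvec(A, x)

# Each of the three workers multiplies one encoded block by x.
worker_outputs = {i: matvec(block, x) for i, block in enumerate(encode(A))}

# Any two finished workers are enough; the third may straggle.
for finished in combinations(range(3), 2):
    partial = {i: worker_outputs[i] for i in finished}
    assert decode(partial) == expected
```

The redundancy here costs 50% extra computation in exchange for tolerating one slow worker; choosing how much redundant load to add and where to place it is the kind of trade-off the abstract says must be optimized.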
