
TL;DR
This paper introduces a hierarchical coded computation approach that leverages partial work from all nodes in distributed systems, significantly reducing expected computation time by optimizing layered erasure codes.
Contribution
It proposes a novel hierarchical coding scheme that exploits partial node work and designs layer-specific codes to improve distributed computation efficiency.
Findings
Achieves a 1.5x reduction in expected finishing time.
Provides design guidelines for optimizing layered erasure codes.
Extends coded computation to use the partial work completed by stragglers.
Abstract
Coded computation, a method that mitigates "stragglers" in distributed computing systems through error-correction coding, has lately received significant attention. First used in vector-matrix multiplication, its range of application was later extended to include matrix-matrix multiplication, heterogeneous networks, convolution, and approximate computing. A drawback of previous results is that they completely ignore work completed by stragglers. While stragglers are slower compute nodes, in many settings the amount of work they complete can be non-negligible. Thus, in this work, we propose a hierarchical coded computation method that exploits the work completed by all compute nodes. We partition each node's computation into layers of sub-computations such that each layer can be treated as a (distinct) erasure channel. We then design different erasure codes for each…
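The layering idea in the abstract can be pictured with a small sketch. It is illustrative only and is not the authors' construction: it assumes a polynomial (Vandermonde) code over exact rationals, assumes every worker processes layers in the same order, and uses hypothetical names (`encode_layer`, `solve_vandermonde`, `hierarchical_matvec`) and arbitrary layer sizes. Deeper layers are reached by fewer workers, so they are given a smaller recovery threshold `k`.

```python
from fractions import Fraction


def encode_layer(rows, alphas):
    """Encode k rows into len(alphas) coded rows: coded_i = sum_j rows[j] * alpha_i**j."""
    width = len(rows[0])
    return [
        [sum(Fraction(row[c]) * a**j for j, row in enumerate(rows)) for c in range(width)]
        for a in alphas
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_vandermonde(points, values):
    """Recover coefficients c_j from values of sum_j c_j * p**j (Gauss-Jordan, exact)."""
    k = len(points)
    m = [[Fraction(p) ** j for j in range(k)] + [Fraction(v)] for p, v in zip(points, values)]
    for col in range(k):
        # Distinct points make the Vandermonde matrix invertible, so a pivot exists.
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [e / pv for e in m[col]]
        for r in range(k):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [e - f * ec for e, ec in zip(m[r], m[col])]
    return [m[r][k] for r in range(k)]


def hierarchical_matvec(layers, x, alphas, layers_done):
    """Compute A @ x, where A is the concatenation of the rows in `layers`.

    layers[l]      : the k_l rows of A assigned to layer l
    alphas[i]      : distinct evaluation point of worker i (assumption of this sketch)
    layers_done[i] : number of layers worker i finished before the deadline
    """
    result = []
    for depth, rows in enumerate(layers):
        coded = encode_layer(rows, alphas)
        # Worker i contributes to this layer only if it got at least this deep.
        finished = [i for i, done in enumerate(layers_done) if done > depth]
        k = len(rows)
        if len(finished) < k:
            raise ValueError(f"layer {depth} needs {k} results, got {len(finished)}")
        use = finished[:k]  # any k finished workers suffice for this code
        values = [dot(coded[i], x) for i in use]
        result.extend(solve_vandermonde([alphas[i] for i in use], values))
    return result


# Four workers; layer thresholds shrink with depth (k = 3, 2, 1).
layers = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10]],
    [[11, 12]],
]
x = [1, 1]
alphas = [1, 2, 3, 4]
layers_done = [3, 2, 1, 0]  # workers 1 and 2 finish only part of their work

y = hierarchical_matvec(layers, x, alphas, layers_done)
print([int(v) for v in y])  # [3, 7, 11, 15, 19, 23]
```

In this toy run, workers 1 and 2 never finish all of their layers, yet the layers they did complete still count toward decoding. A scheme that waits for whole-node results would discard that partial work.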
