TL;DR
This paper introduces new coded computation strategies, MatDot and PolyDot, that reduce the number of successful workers needed for distributed matrix multiplication, improving recovery thresholds over existing methods.
Contribution
The paper presents MatDot and PolyDot codes that improve recovery thresholds in distributed matrix multiplication, along with systematic construction and a technique for multiplying multiple matrices.
Findings
MatDot codes require only 2m-1 successful workers when each worker stores a 1/m fraction of each matrix, compared with m^2 for Polynomial codes.
PolyDot codes offer a trade-off between communication cost and recovery threshold.
A new technique for multiplying multiple matrices using these codes is demonstrated.
Abstract
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When a 1/m fraction of each matrix can be stored in each worker node, Polynomial codes require m^2 successful workers, while our MatDot codes only require 2m-1 successful workers, albeit at a higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Further, we propose "PolyDot" coding that interpolates between Polynomial codes and MatDot codes to trade off communication cost and recovery threshold. Finally, we demonstrate a coding technique for multiplying n matrices (n ≥ 3) by applying MatDot and PolyDot coding ideas.
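To make the 2m-1 threshold concrete, here is a minimal sketch of the MatDot idea in pure Python, using exact rational arithmetic. It assumes the usual description of the construction: A is split into m column blocks and B into m row blocks, so that AB is the sum of the products A_i B_i. Each worker evaluates two encoding polynomials at its own point and multiplies the results, and the fusion node interpolates a degree-(2m-2) polynomial whose x^(m-1) coefficient is AB. The function names, the evaluation points 1, 2, ..., P, and the small test matrices are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def encode(blocks, x, exponents):
    """Evaluate sum_i blocks[i] * x**exponents[i] (a matrix polynomial) at x."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    out = [[Fraction(0)] * cols for _ in range(rows)]
    for blk, e in zip(blocks, exponents):
        weight = Fraction(x) ** e
        for r in range(rows):
            for c in range(cols):
                out[r][c] += weight * blk[r][c]
    return out

def lagrange_coeff(xs, k, power):
    """Coefficient of x**power in the k-th Lagrange basis polynomial over xs."""
    poly = [Fraction(1)]          # coefficients, lowest degree first
    denom = Fraction(1)
    for j, xj in enumerate(xs):
        if j == k:
            continue
        nxt = [Fraction(0)] * (len(poly) + 1)
        for d, c in enumerate(poly):   # multiply poly by (x - xj)
            nxt[d] -= c * xj
            nxt[d + 1] += c
        poly = nxt
        denom *= xs[k] - xj
    return poly[power] / denom

def matdot_multiply(A, B, m, num_workers, survivors):
    """Hypothetical helper: simulate MatDot encoding, workers, and decoding."""
    if len(survivors) < 2 * m - 1:
        raise ValueError("need at least 2m-1 successful workers")
    n = len(A)
    w = n // m                     # assumes m divides n
    A_blocks = [[row[i * w:(i + 1) * w] for row in A] for i in range(m)]  # column blocks
    B_blocks = [B[i * w:(i + 1) * w] for i in range(m)]                   # row blocks
    xs = list(range(1, num_workers + 1))   # assumed distinct evaluation points

    # Worker k computes p_A(x_k) * p_B(x_k), with
    # p_A(x) = sum_i A_i x^i and p_B(x) = sum_j B_j x^(m-1-j).
    results = {
        k: matmul(encode(A_blocks, xs[k], range(m)),
                  encode(B_blocks, xs[k], range(m - 1, -1, -1)))
        for k in survivors
    }

    # Fusion node: any 2m-1 results determine the degree-(2m-2) product
    # polynomial; its x^(m-1) coefficient equals sum_i A_i B_i = AB.
    used = list(survivors)[:2 * m - 1]
    pts = [xs[k] for k in used]
    C = [[Fraction(0)] * n for _ in range(n)]
    for idx, k in enumerate(used):
        coef = lagrange_coeff(pts, idx, m - 1)
        for r in range(n):
            for c in range(n):
                C[r][c] += coef * results[k][r][c]
    return C

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
# m = 2, five workers, two of which straggle; three survivors suffice.
C = matdot_multiply(A, B, m=2, num_workers=5, survivors=[0, 2, 4])
print(C == matmul(A, B))
```

Note that each worker returns a full n x n matrix here, which is the higher communication cost the abstract mentions. Exact fractions are used only to keep the demonstration free of rounding error; a practical implementation would likely work over floating point or a finite field.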
