On the Optimal Recovery Threshold of Coded Matrix Multiplication

Sanghamitra Dutta; Mohammad Fahim; Farzin Haddadpour; Haewon Jeong; Viveck Cadambe; Pulkit Grover

arXiv:1801.10292·cs.IT·May 17, 2018

TL;DR

This paper introduces new coded computation strategies, MatDot and PolyDot, that reduce the number of successful workers needed for distributed matrix multiplication, improving recovery thresholds over existing methods.

Contribution

The paper presents MatDot and PolyDot codes, which improve the recovery threshold of distributed matrix multiplication, along with a systematic construction of MatDot codes and a technique for multiplying more than two matrices.

Findings

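01. MatDot codes require only $2 m - 1$ successful workers, compared with $m^{2}$ for Polynomial codes, at a higher communication cost from each worker to the fusion node.

02. PolyDot codes interpolate between Polynomial codes and MatDot codes, trading off communication cost against recovery threshold.

03. A coding technique for multiplying $n$ matrices ($n \geq 3$) is demonstrated using MatDot and PolyDot coding ideas.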

Abstract

We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers. When a $1/m$ fraction of each matrix can be stored at each worker node, Polynomial codes require $m^{2}$ successful workers, while our MatDot codes require only $2 m - 1$ successful workers, albeit at a higher communication cost from each worker to the fusion node. We also provide a systematic construction of MatDot codes. Further, we propose "PolyDot" coding, which interpolates between Polynomial codes and MatDot codes to trade off communication cost and recovery threshold. Finally, we demonstrate a coding technique for multiplying $n$ matrices ($n \geq 3$) by applying MatDot and PolyDot coding ideas.
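The $2 m - 1$ threshold can be illustrated with a small sketch. The abstract does not spell out the encoding, so the code below assumes the polynomial-evaluation form of MatDot: $A$ is split into $m$ column blocks and $B$ into $m$ row blocks, each worker multiplies one evaluation of $p_A(x) = \sum_i A_i x^i$ by one evaluation of $p_B(x) = \sum_j B_j x^{m-1-j}$, and the fusion node reads $AB$ off the coefficient of $x^{m-1}$ in the degree-$(2m-2)$ product. All function names are hypothetical, and exact rational arithmetic is used only to keep the illustration free of rounding error.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def split_cols(A, m):
    """Split A into m equal column blocks A_0, ..., A_{m-1}."""
    w = len(A[0]) // m
    return [[row[i * w:(i + 1) * w] for row in A] for i in range(m)]


def split_rows(B, m):
    """Split B into m equal row blocks B_0, ..., B_{m-1}."""
    h = len(B) // m
    return [B[i * h:(i + 1) * h] for i in range(m)]


def encode(blocks, powers, x):
    """Evaluate sum_i blocks[i] * x**powers[i] entry by entry."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(blk[r][c] * x ** p for blk, p in zip(blocks, powers))
             for c in range(cols)] for r in range(rows)]


def worker(A_blocks, B_blocks, x):
    """One worker's task: p_A(x) times p_B(x), a full-size output matrix."""
    m = len(A_blocks)
    p_a = encode(A_blocks, list(range(m)), x)                 # powers 0..m-1
    p_b = encode(B_blocks, [m - 1 - j for j in range(m)], x)  # powers m-1..0
    return matmul(p_a, p_b)


def lagrange_weight(xs, k, degree):
    """Coefficient of x**degree in the k-th Lagrange basis polynomial."""
    coeffs, denom = [Fraction(1)], Fraction(1)
    for j, xj in enumerate(xs):
        if j == k:
            continue
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):  # multiply running product by (x - xj)
            nxt[d] -= c * xj
            nxt[d + 1] += c
        coeffs, denom = nxt, denom * (xs[k] - xj)
    return coeffs[degree] / denom


def decode(results, m):
    """Recover AB from exactly 2m-1 (point, worker output) pairs."""
    if len(results) != 2 * m - 1:
        raise ValueError("this sketch expects exactly 2m-1 worker results")
    xs = [x for x, _ in results]
    rows, cols = len(results[0][1]), len(results[0][1][0])
    out = [[Fraction(0)] * cols for _ in range(rows)]
    for k, (_, Y) in enumerate(results):
        w = lagrange_weight(xs, k, m - 1)  # AB is the x**(m-1) coefficient
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * Y[r][c]
    return out


# Toy run: m = 2, five workers, two of them treated as stragglers.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
m = 2
A_blocks, B_blocks = split_cols(A, m), split_rows(B, m)
outputs = [(x, worker(A_blocks, B_blocks, x)) for x in [1, 2, 3, 4, 5]]
survivors = [outputs[0], outputs[2], outputs[4]]  # any 2m-1 = 3 would do
recovered = decode(survivors, m)
assert recovered == matmul(A, B)
```

In this sketch each worker stores only a $1/m$ fraction of each input but returns a matrix the size of the full product, which is one way to see the higher per-worker communication cost that the abstract mentions.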
