Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication
Asit Kumar Pradhan, Anoosheh Heidarzadeh, Krishna R. Narayanan

TL;DR
This paper introduces two coding schemes, factored LT (FLT) codes and factored Raptor (FR) codes, for distributed matrix multiplication in the presence of stragglers. Both have better recovery thresholds than Product codes and are expected to be more numerically stable than Polynomial codes.
Contribution
The paper adapts LT and Raptor codes into factored versions for distributed matrix multiplication, achieving better recovery thresholds than Product codes while still permitting low-complexity decoding.
Findings
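Factored LT (FLT) codes empirically achieve near-optimal recovery thresholds when the number of worker nodes is very large.
Factored Raptor (FR) codes achieve excellent recovery thresholds when the number of worker nodes is moderately large.
Both codes have better recovery thresholds than Product codes and are expected to have better numerical stability than Polynomial codes.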
Abstract
We propose two coding schemes for distributed matrix multiplication in the presence of stragglers. These coding schemes are adaptations of LT codes and Raptor codes to distributed matrix multiplication and are termed factored LT (FLT) codes and factored Raptor (FR) codes. Empirically, we show that FLT codes have near-optimal recovery thresholds when the number of worker nodes is very large, and that FR codes have excellent recovery thresholds when the number of worker nodes is moderately large. FLT and FR codes have better recovery thresholds than Product codes and are expected to have better numerical stability than Polynomial codes, and they can be decoded with a low-complexity decoding algorithm.
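To make the idea of low-complexity decoding concrete, here is a minimal sketch of a generic LT-style peeling decoder applied to block products. It is not the paper's construction: it assumes, for illustration only, that each worker multiplies a sum of row blocks of A by a sum of column blocks of B, so that its output is a sum of unknown products A_i B_j. The names `worker_task` and `peel` are hypothetical, the task list is hand-picked rather than drawn from a degree distribution, and scalars stand in for matrix blocks.

```python
from itertools import product


def worker_task(a_blocks, b_blocks, rows, cols):
    """One worker: multiply a sum of A blocks by a sum of B blocks.

    The result equals the sum of a_blocks[i] * b_blocks[j] over all
    (i, j) in rows x cols. Scalars stand in for matrix blocks here.
    """
    return sum(a_blocks[i] for i in rows) * sum(b_blocks[j] for j in cols)


def peel(tasks, results):
    """Generic peeling decoder (illustrative, not the paper's algorithm).

    tasks[k] = (rows, cols) used by worker k; results[k] = its output.
    Returns {(i, j): a_i * b_j} for every product that could be recovered.
    """
    # Each result is a sum over the unknown products indexed by rows x cols.
    pending = [set(product(rows, cols)) for rows, cols in tasks]
    values = list(results)
    known = {}
    progress = True
    while progress:
        progress = False
        for k, unknowns in enumerate(pending):
            if len(unknowns) == 1:
                # A result with one remaining unknown reveals that product.
                idx = unknowns.pop()
                known[idx] = values[k]
                # Subtract the recovered product from every other result.
                for m, other in enumerate(pending):
                    if idx in other:
                        other.remove(idx)
                        values[m] -= known[idx]
                progress = True
    return known


# Hand-picked example: A has row blocks [2, 3], B has column blocks [5, 7].
a_blocks = [2, 3]
b_blocks = [5, 7]
tasks = [([0], [0]), ([0], [0, 1]), ([0, 1], [1]), ([0, 1], [0, 1])]
results = [worker_task(a_blocks, b_blocks, r, c) for r, c in tasks]
decoded = peel(tasks, results)
```

Decoding here uses only subtractions, which is why peeling is cheap compared with solving a dense linear system. If no result with a single remaining unknown is available, the loop stops and `decoded` contains only the products recovered so far.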
