Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding
Songze Li, Seyed Mohammadreza Mousavi Kalan, Qian Yu, Mahdi Soltanolkotabi, A. Salman Avestimehr

TL;DR
This paper introduces Polynomially Coded Regression (PCR), a novel data encoding method that significantly mitigates stragglers and reduces communication in distributed least-squares regression training, achieving near-optimal recovery thresholds.
Contribution
PCR is a new data encoding scheme for distributed regression whose recovery threshold scales inversely with the storage available at each worker and is proven to be near-optimal.
Findings
PCR reduces run-time by up to 4.29x in the presence of stragglers.
PCR's recovery threshold is within a factor of two of the theoretical minimum.
Experiments on Amazon EC2 validate PCR's efficiency over state-of-the-art schemes.
Abstract
We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed systems are significantly slowed down due to the presence of slow-running machines (stragglers) as well as various communication bottlenecks. We propose "polynomially coded regression" (PCR) that substantially reduces the effect of stragglers and lessens the communication burden in such systems. The key idea of PCR is to encode the partial data stored at each worker, such that the computations at the workers can be viewed as evaluating a polynomial at distinct points. This allows the master to compute the final gradient by interpolating this polynomial. PCR significantly reduces the recovery threshold, defined as the number of workers the master has to…
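The encode, evaluate, interpolate idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the data matrix X is split into k row blocks, that each worker stores a single coded block, and that only the X^T X w part of the least-squares gradient is distributed. The function names (`lagrange_basis`, `encode`, `worker`, `decode`), the evaluation points, and the problem sizes are all hypothetical choices made for illustration. In this sketch, each worker's result is an evaluation of a polynomial of degree 2k - 2, so the master can interpolate it from any 2k - 1 workers.

```python
import numpy as np

def lagrange_basis(points, z):
    """Values at z of the Lagrange basis polynomials defined on `points`."""
    points = np.asarray(points, dtype=float)
    vals = np.ones(len(points))
    for j, pj in enumerate(points):
        for m, pm in enumerate(points):
            if m != j:
                vals[j] *= (z - pm) / (pj - pm)
    return vals

def encode(blocks, betas, alpha):
    """Coded block u(alpha), where u is the polynomial with u(beta_j) = X_j."""
    coeffs = lagrange_basis(betas, alpha)
    return sum(c * block for c, block in zip(coeffs, blocks))

def worker(coded_block, w):
    """Worker computation: f(alpha) = u(alpha)^T u(alpha) w."""
    return coded_block.T @ (coded_block @ w)

def decode(alphas, results, betas):
    """Interpolate f from the received evaluations and sum f(beta_j) over j."""
    results = np.asarray(results)
    total = np.zeros(results.shape[1])
    for b in betas:
        total += lagrange_basis(alphas, b) @ results
    return total

# Illustrative setup (all sizes and points are assumptions of this sketch).
rng = np.random.default_rng(0)
k, num_workers = 3, 6
X = rng.standard_normal((12, 4))
w = rng.standard_normal(4)
blocks = np.split(X, k)                      # X_1, ..., X_k
betas = [0.0, 1.0, 2.0]                      # one point per data block
alphas = [-1.5, -0.5, 0.5, 1.5, 2.5, 3.5]    # one distinct point per worker

coded = [encode(blocks, betas, a) for a in alphas]
results = [worker(c, w) for c in coded]

# Pretend worker 2 is a straggler: the master uses the other 2k - 1 = 5 results.
fastest = [i for i in range(num_workers) if i != 2]
recovered = decode([alphas[i] for i in fastest],
                   [results[i] for i in fastest],
                   betas)

print(np.allclose(recovered, X.T @ (X @ w)))
```

The master recovers X^T X w without waiting for the straggling worker because f(beta_j) equals X_j^T X_j w at each block's point, and summing those values over all blocks gives the full product.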
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Error Correcting Code Techniques · Sparse and Compressive Sensing Techniques
