Near Optimal Coded Data Shuffling for Distributed Learning
Mohamed A. Attia, Ravi Tandon

TL;DR
This paper investigates the fundamental trade-off between storage capacity and communication overhead in distributed data shuffling for large-scale learning, proposing a near-optimal coded scheme that approaches theoretical limits.
Contribution
It introduces an information-theoretic framework and a novel coded shuffling scheme that nearly achieves the optimal storage-communication trade-off in distributed learning.
Findings
The proposed scheme is within a factor of $\frac{K}{K-1}$ of the lower bound.
Achieves optimal trade-off for K<5 with aligned coded shuffling.
Reduces the maximum multiplicative gap to $\frac{K-\frac{1}{3}}{K-1}$ for $K \geq 5$.
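
To get a feel for how tight these bounds are, the short sketch below evaluates both gap factors with exact fractions. The function names `baseline_gap` and `aligned_gap` are illustrative only and do not come from the paper. The sketch assumes the second bound applies to the values of K not covered by the optimality result (K of 5 or more), and it treats the K<5 case as a gap of exactly 1.

```python
from fractions import Fraction


def baseline_gap(K: int) -> Fraction:
    """Multiplicative gap K/(K-1) of the first coded scheme (K = number of workers)."""
    if K < 2:
        raise ValueError("need at least two workers")
    return Fraction(K, K - 1)


def aligned_gap(K: int) -> Fraction:
    """Gap of the aligned scheme: 1 (optimal) for K < 5, else (K - 1/3)/(K - 1).

    Assumption: the (K - 1/3)/(K - 1) bound is applied for K >= 5.
    """
    if K < 2:
        raise ValueError("need at least two workers")
    if K < 5:
        return Fraction(1)
    return (K - Fraction(1, 3)) / (K - 1)


if __name__ == "__main__":
    for K in (2, 4, 5, 10, 100):
        print(K, baseline_gap(K), aligned_gap(K))
```

Both factors approach 1 as K grows, so the practical difference between the two schemes is largest for small clusters.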
Abstract
Data shuffling between a distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the data-set among a cluster of workers allows different nodes to obtain fresh data assignments at each learning epoch. This process has been shown to provide improvements in the learning process. However, the statistical benefits of distributed data shuffling come at the cost of extra communication overhead from the master node to worker nodes, which can act as one of the major bottlenecks in the overall time for computation. There has been significant recent interest in devising approaches to minimize this communication overhead. One approach is to provision for extra storage at the computing nodes. The other emerging approach is to leverage coded communication to minimize the overall communication overhead. The focus of this work is to…
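
The idea of trading storage for coded communication can be illustrated with a two-worker toy case. This is a generic XOR broadcast sketch, not the scheme proposed in the paper, and the names `xor_bytes`, `point_a`, and `point_b` are hypothetical. It assumes each worker still holds the data point it processed in the previous epoch and needs the other worker's point next.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Toy setup (assumption): worker 1 stores point_a and needs point_b,
# worker 2 stores point_b and needs point_a.
point_a = b"sample-A"
point_b = b"sample-B"

# Uncoded shuffling would send both points (two transmissions).
# With coding, the master broadcasts a single XOR of the two points.
coded = xor_bytes(point_a, point_b)

# Each worker cancels the point it already stores to recover the one it needs.
decoded_at_w1 = xor_bytes(coded, point_a)  # worker 1 recovers point_b
decoded_at_w2 = xor_bytes(coded, point_b)  # worker 2 recovers point_a
```

In this toy case one broadcast of the size of a single data point serves both workers, which is only possible because each worker stores something the other needs.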
