Information Theoretic Limits of Data Shuffling for Distributed Learning

Mohamed Attia; Ravi Tandon

arXiv:1609.05181·cs.IT·September 19, 2016

TL;DR

This paper studies the fundamental limits of data shuffling in distributed learning. It shows how per-worker storage capacity affects the worst-case communication overhead and proposes a strategy for data delivery and storage updates.

Contribution

The paper fully characterizes the information-theoretic trade-off between per-worker storage and worst-case communication overhead for $K = 2$ and $K = 3$ workers, and introduces a novel data delivery and storage update strategy to reduce communication.

Findings

01. Increasing storage reduces communication overhead via coding.

02. Complete characterization of the trade-off for 2 and 3 workers.

03. Proposed systematic data delivery and storage update strategy.

Abstract

Data shuffling is one of the fundamental building blocks of distributed learning algorithms, and it increases the statistical gain of each step of the learning process. In each iteration, a central node assigns different shuffled data points to a distributed set of workers for local computation, which leads to communication bottlenecks. The focus of this paper is on formalizing and understanding the fundamental information-theoretic trade-off between storage (per worker) and the worst-case communication overhead for the data shuffling problem. We completely characterize the information-theoretic trade-off for $K = 2$ and $K = 3$ workers, for any value of storage capacity, and show that increasing the storage across workers can reduce the communication overhead by leveraging coding. We propose a novel and systematic data delivery and storage update strategy for each data…
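To make the idea of "reducing communication overhead by leveraging coding" concrete, here is a minimal toy sketch for two workers. It is an illustration under stated assumptions, not the delivery scheme proposed in the paper: the block contents, the variable names, and the choice of a full swap as the shuffle are all hypothetical. Each worker already stores one block and needs the other one, so a single XOR broadcast from the central node can serve both workers at once.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical toy setup (not from the paper): K = 2 workers,
# each currently storing one equal-sized block of data points.
block_w1 = b"points-for-w1"  # stored at worker 1
block_w2 = b"points-for-w2"  # stored at worker 2

# Assumed shuffle for this toy: the two workers must swap blocks.
# Uncoded delivery: the central node sends each worker its new block.
uncoded_cost = len(block_w1) + len(block_w2)

# Coded delivery: the central node broadcasts a single XOR of both blocks.
coded_msg = xor_bytes(block_w1, block_w2)
coded_cost = len(coded_msg)

# Each worker cancels the block it already stores to recover the other one.
recovered_at_w1 = xor_bytes(coded_msg, block_w1)  # worker 1 obtains block_w2
recovered_at_w2 = xor_bytes(coded_msg, block_w2)  # worker 2 obtains block_w1

print(f"uncoded bytes sent: {uncoded_cost}, coded bytes sent: {coded_cost}")
```

In this toy case the broadcast is half the size of the uncoded transmission, because what each worker already stores acts as side information for decoding. How this gain scales with storage for general shuffles is the trade-off the paper characterizes.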
