Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach

Naifu Zhang; Meixia Tao; Jia Wang; Fan Xu

arXiv:2206.13984 · cs.IT · June 29, 2022

TL;DR

This paper investigates the fundamental limits of communication efficiency for model aggregation in distributed learning using an information-theoretic rate-distortion framework, characterizing the minimum communication cost for a given gradient distortion and the benefit of exploiting correlation between worker nodes' gradients.

Contribution

The paper formulates model aggregation as a vector Gaussian CEO problem and derives a rate region bound and the sum-rate-distortion function, which give theoretical bounds on communication cost.

Findings

01. Exploiting correlation between worker nodes significantly reduces communication.

02. Tolerating higher distortion in the gradient estimator can lower the total communication cost.

03. The derived bounds inform optimal gradient compression strategies.
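
The first finding can be illustrated with a textbook identity rather than the paper's own bound. For two jointly Gaussian scalars with correlation coefficient rho, the mutual information is 0.5 * log2(1 / (1 - rho^2)) bits, which measures how much of one worker's description is redundant given the other's. The sketch below assumes a hypothetical two-worker, scalar-gradient setting; the function name and the sample rho values are illustrative and do not come from the paper.

```python
import math


def gaussian_mutual_information_bits(rho: float) -> float:
    """Mutual information I(X1; X2) in bits for two jointly Gaussian scalars.

    rho is the correlation coefficient between the two variables. This is a
    standard textbook identity, used here only as an illustration; it is NOT
    the rate region bound derived in the paper.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 0.5 * math.log2(1.0 / (1.0 - rho * rho))


if __name__ == "__main__":
    # Hypothetical correlation levels between two workers' gradient entries.
    for rho in (0.0, 0.5, 0.9, 0.99):
        bits = gaussian_mutual_information_bits(rho)
        print(f"rho = {rho:4.2f}: shared information = {bits:.3f} bits")
```

Under these assumptions the shared information is zero for uncorrelated workers and grows without bound as rho approaches 1, which is the intuition behind coding schemes that exploit inter-worker correlation.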

Abstract

One of the main focuses in distributed learning is communication efficiency, since model aggregation at each round of training can consist of millions to billions of parameters. Several model compression methods, such as gradient quantization and sparsification, have been proposed to improve the communication efficiency of model aggregation. However, the information-theoretic minimum communication cost for a given distortion of gradient estimators is still unknown. In this paper, we study the fundamental limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective. By formulating the model aggregation as a vector Gaussian CEO problem, we derive the rate region bound and sum-rate-distortion function for the model aggregation problem, which reveals the minimum communication rate at a particular gradient distortion upper bound. We also analyze…
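
To make the rate-distortion viewpoint concrete, here is a minimal sketch of the classical rate-distortion function of a single scalar Gaussian source under mean-squared error, R(D) = max(0, 0.5 * log2(variance / D)). This is the simplest textbook case, not the vector Gaussian CEO sum-rate-distortion function the paper derives; the function name and the numeric values are assumptions chosen for illustration.

```python
import math


def gaussian_rate_distortion_bits(variance: float, distortion: float) -> float:
    """Minimum bits per sample to describe a scalar Gaussian source.

    variance   : variance of the source (e.g. one gradient coordinate)
    distortion : allowed mean-squared error of the reconstruction

    Textbook single-source result, shown only to illustrate the trade-off;
    it is NOT the multi-worker bound from the paper.
    """
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    if distortion >= variance:
        # Sending nothing and reconstructing the mean already meets the target.
        return 0.0
    return 0.5 * math.log2(variance / distortion)


if __name__ == "__main__":
    variance = 1.0  # hypothetical unit-variance gradient coordinate
    for distortion in (1.0, 0.5, 0.25, 0.01):
        rate = gaussian_rate_distortion_bits(variance, distortion)
        print(f"D = {distortion:5.2f}: R(D) = {rate:.3f} bits per sample")
```

In this simplified model each halving of the allowed distortion costs an extra half bit per sample, so loosening the distortion target lowers the per-round rate.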

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms · Sparse and Compressive Sensing Techniques