Convert, compress, correct: Three steps toward communication-efficient DNN training

Zhong-Jing Chen; Eduin E. Hernandez; Yu-Chih Huang; Stefano Rini

arXiv:2203.09044·cs.LG·March 18, 2022

TL;DR

This paper presents $\text{CO}_3$, a three-step algorithm combining quantization, compression, and error correction to improve communication efficiency in distributed DNN training over constrained links.

Contribution

The paper introduces $\text{CO}_3$, a novel joint protocol that effectively balances gradient quantization, compression, and error correction for efficient distributed training.

Findings

01 · Demonstrates improved training efficiency over CIFAR-10

02 · Balances three gradient processing steps for robustness

03 · Achieves high performance with constrained communication links

Abstract

In this paper, we introduce a novel algorithm, $CO_{3}$, for communication-efficient distributed Deep Neural Network (DNN) training. $CO_{3}$ is a joint training/communication protocol that encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial in the implementation of distributed DNN training over rate-constrained links. The interplay of these three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations over CIFAR-10.
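
The three steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: half precision (the `struct` format `e`) stands in for the floating-point conversion, `zlib` stands in for the lossless compressor, and error correction is interpreted as error feedback, where the quantization error of one round is added to the gradient of the next. The function names `encode` and `decode` are hypothetical.

```python
import random
import struct
import zlib


def encode(grad, residual):
    """Worker side (illustrative): correct, convert, compress.

    Assumption: "error correction" is modelled as error feedback, i.e. the
    rounding error from the previous round is added before converting.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    # (i) Floating-point conversion: round each entry to IEEE half precision.
    packed = struct.pack(f"<{len(corrected)}e", *corrected)
    quantized = struct.unpack(f"<{len(corrected)}e", packed)
    # (ii) Lossless compression of the converted bytes (zlib as a stand-in).
    payload = zlib.compress(packed)
    # (iii) Keep the rounding error so the next round can compensate for it.
    new_residual = [c - q for c, q in zip(corrected, quantized)]
    return payload, new_residual


def decode(payload):
    """Server side (illustrative): decompress and read back the half-precision values."""
    packed = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(packed) // 2}e", packed))


# Toy gradient with small entries; the values are synthetic, not from the paper.
random.seed(0)
grad = [random.gauss(0.0, 1e-2) for _ in range(256)]
residual = [0.0] * len(grad)

payload, residual = encode(grad, residual)
received = decode(payload)
```

In this sketch the receiver sees only `received`, while the sender keeps `residual`; their sum reproduces the original gradient up to floating-point rounding, which is what allows the error to be carried into later rounds instead of being lost.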

Code & Models

Repositories

chen-zhong-jing/co3_algorithm
tf · Official

Taxonomy

Topics: Advanced Neural Network Applications · Wireless Signal Modulation Classification · Adversarial Robustness in Machine Learning