Low Precision Decentralized Distributed Training over IID and non-IID Data
Sai Aparna Aketi, Sangamesh Kodge, Kaushik Roy

TL;DR
This paper introduces a low precision decentralized training method that substantially reduces compute and communication costs while keeping accuracy close to full precision, even on non-IID data. It does so by proposing a new normalization layer and by combining low precision training with communication compression techniques.
Contribution
The paper presents a low precision training approach for decentralized learning. It includes a new normalization layer, works in synergy with communication compression methods, and addresses the challenges posed by non-IID data.
Findings
8-bit decentralized training maintains accuracy, with minimal loss relative to full precision.
Combining low precision training with sparsification causes a 1-2% accuracy drop.
Low precision training reduces computational complexity and memory usage by about 4x, and energy by about 20x.
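To make the findings above concrete, here is a minimal sketch of the two ingredients they mention: 8-bit quantization and sparsification. It is an illustration under stated assumptions, not the authors' implementation. The function names (quantize_8bit, dequantize, top_k_sparsify), the symmetric uniform quantizer, and the top-k selection rule are all hypothetical choices made for this example. The sketch also shows where a 4x memory figure can come from: storing a value in 8 bits instead of 32.

```python
import random


def quantize_8bit(values):
    """Symmetric uniform quantization of floats to signed 8-bit codes.

    Assumption for illustration: one scale per vector, chosen so the
    largest magnitude maps to 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to the signed 8-bit range used here, [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map 8-bit codes back to approximate float values."""
    return [c * scale for c in codes]


def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


# A synthetic stand-in for a gradient or weight vector (hypothetical data).
random.seed(0)
grad = [random.gauss(0.0, 1.0) for _ in range(1000)]

codes, scale = quantize_8bit(grad)
restored = dequantize(codes, scale)

# Rounding error of a uniform quantizer is at most half a step.
max_err = max(abs(a - b) for a, b in zip(grad, restored))

# Sparsify the quantized vector before it would be sent to neighbors.
sparse = top_k_sparsify(restored, k=100)
nonzero = sum(1 for v in sparse if v != 0.0)

# Storage ratio between 32-bit floats and 8-bit codes.
memory_ratio = (32 * len(grad)) / (8 * len(grad))

print("max quantization error:", max_err)
print("nonzero entries after sparsification:", nonzero)
print("memory ratio fp32 / int8:", memory_ratio)
```

In this sketch the quantization error is bounded by half a quantization step, while top-k sparsification discards most entries outright, which is one intuition for why stacking sparsification on top of low precision can cost additional accuracy.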
Abstract
Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, the practical realization of such on-device training is limited by the communication and compute bottleneck. In this paper, we propose and show the convergence of low precision decentralized training that aims to reduce the computational complexity and communication cost of decentralized training. Many feedback-based compression techniques have been proposed in the literature to reduce communication costs. To the best of our knowledge, there is no work that applies and shows compute efficient training techniques such as quantization, pruning, etc., for peer-to-peer decentralized learning setups. Since real-world applications have a significant skew in the data distribution, we design…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
