AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
Daniel Coquelin, Katherina Flügel, Marie Weiel, Nicholas Kiefer, Muhammed Öz, Charlotte Debus, Achim Streit, Markus Götz

TL;DR
AB-training is a communication-efficient distributed training method that uses low-rank representations and independent groups to significantly reduce network traffic, improve scalability, and enhance generalization in neural network training.
Contribution
This paper introduces AB-training, a novel low-rank, data-parallel approach that reduces communication overhead and improves training efficiency in distributed neural network training.
Findings
Reduced network traffic by approximately 70.31% on average across various scaling scenarios
Achieved a 44.14:1 compression ratio on VGG16 trained on CIFAR-10 with minimal accuracy loss
Outperformed traditional data-parallel training by 1.55% on ResNet-50 with ImageNet
Abstract
Communication bottlenecks severely hinder the scalability of distributed neural network training, particularly in high-performance computing (HPC) environments. We introduce AB-training, a novel data-parallel method that leverages low-rank representations and independent training groups to significantly reduce communication overhead. Our experiments demonstrate an average reduction in network traffic of approximately 70.31% across various scaling scenarios, increasing the training potential of communication-constrained systems and accelerating convergence at scale. AB-training also exhibits a pronounced regularization effect at smaller scales, leading to improved generalization while maintaining or even reducing training time. We achieve a remarkable 44.14:1 compression ratio on VGG16 trained on CIFAR-10 with minimal accuracy loss, and outperform traditional data-parallel training by…
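To make the link between low-rank representations and compression concrete, here is a minimal sketch of the arithmetic. It is not the authors' algorithm: it assumes a single dense weight matrix of shape m × n stored as two factors A (m × r) and B (r × n), and it uses a plain truncated SVD to obtain them. The function names `low_rank_factors` and `compression_ratio`, the matrix sizes, and the rank are all hypothetical choices for illustration, and the resulting ratio is unrelated to the 44.14:1 figure reported above.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Truncated SVD: return A (m x r) and B (r x n) with A @ B ~= weight.

    Illustrative helper only; not taken from the paper.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into A
    b = vt[:rank, :]
    return a, b


def compression_ratio(m, n, rank):
    """Dense parameter count divided by factored parameter count."""
    return (m * n) / (rank * (m + n))


# Hypothetical layer: a 512 x 256 weight matrix that is exactly rank 8.
rng = np.random.default_rng(0)
m, n, true_rank = 512, 256, 8
weight = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))

a, b = low_rank_factors(weight, rank=true_rank)
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
ratio = compression_ratio(m, n, true_rank)

print(f"factor shapes: {a.shape}, {b.shape}")
print(f"relative reconstruction error: {rel_error:.2e}")
print(f"compression ratio: {ratio:.2f}:1")
```

In this toy setting, only the two factors would need to be stored or exchanged between workers, so the parameter count drops from m·n to r·(m + n). Real networks are not exactly low rank, so the choice of r trades compression against accuracy.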
Taxonomy
Topics: Speech Recognition and Synthesis · Security in Wireless Sensor Networks · Machine Learning and Algorithms
