Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning
Tolga Dimlioglu, Anna Choromanska

TL;DR
This paper introduces DPPF (Distributed Pull-Push Force), a communication-efficient distributed training method that steers deep neural networks toward flatter minima, which the authors associate with better generalization and improved performance over existing approaches.
Contribution
The paper proposes the DPPF algorithm, which uses an efficient relaxation of a new sharpness measure, Inverse Mean Valley, as a regularizer, and supports it with theoretical analysis and empirical validation in distributed deep learning.
Findings
DPPF outperforms other communication-efficient methods in generalization.
DPPF effectively locates flatter minima in loss landscapes.
Theoretical analysis confirms convergence and stability of DPPF.
Abstract
We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show…
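The pull-push interaction described in the abstract can be illustrated with a toy consensus round. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the regularizer, so the push is assumed here to act along the unit direction away from the worker average, and local gradient steps are omitted entirely. The names `pull_push_step`, `mean_distance`, `alpha` (pull strength), and `lam` (push strength) are hypothetical.

```python
import math


def average(workers):
    """Coordinate-wise mean of the workers' parameter vectors."""
    dim = len(workers[0])
    return [sum(w[d] for w in workers) / len(workers) for d in range(dim)]


def mean_distance(workers):
    """Average Euclidean distance of the workers from their mean."""
    center = average(workers)
    return sum(math.dist(w, center) for w in workers) / len(workers)


def pull_push_step(workers, alpha, lam):
    """One illustrative consensus round (assumed form, not from the paper).

    Pull: move each worker toward the average by a fraction alpha.
    Push: move each worker away from the average by lam along the
    unit direction (worker - average).
    """
    center = average(workers)
    updated = []
    for w in workers:
        diff = [wi - ci for wi, ci in zip(w, center)]
        dist = math.sqrt(sum(d * d for d in diff))
        if dist == 0.0:
            # No direction to push along; leave the worker where it is.
            updated.append(list(w))
            continue
        # Net coefficient on (worker - average): -alpha from the pull,
        # +lam / dist from the unit-norm push.
        scale = -alpha + lam / dist
        updated.append([wi + scale * di for wi, di in zip(w, diff)])
    return updated


if __name__ == "__main__":
    workers = [[0.0, 0.0], [4.0, 0.0]]
    for _ in range(60):
        workers = pull_push_step(workers, alpha=0.5, lam=0.5)
    print(mean_distance(workers))
```

In this symmetric two-worker toy case the two forces balance when each worker sits at distance `lam / alpha` from the average, so the workers neither collapse onto a single point nor drift apart. Setting `lam = 0` recovers a plain pull toward the average, which shrinks every worker's distance by a factor of `1 - alpha` per round.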
