Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Tolga Dimlioglu; Anna Choromanska

arXiv:2507.20424 · cs.LG · October 13, 2025

TL;DR

This paper introduces Distributed Pull-Push Force (DPPF), a communication-efficient distributed training method that encourages workers to collaboratively seek flatter minima, with the aim of improving generalization over existing local gradient methods.

Contribution

The paper proposes the DPPF algorithm, which adds an efficient relaxation of a new sharpness measure, Inverse Mean Valley, to the distributed training objective as a lightweight regularizer, and supports it with theoretical analysis and empirical validation in distributed deep learning.

Findings

01. DPPF outperforms other communication-efficient methods in generalization.

02. DPPF effectively locates flatter minima in loss landscapes.

03. Theoretical analysis confirms convergence and stability of DPPF.

Abstract

We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show…
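The interplay between the pulling consensus step and the pushing regularizer can be illustrated with a toy sketch. The abstract does not state the update rule, so the form used here is an assumption made for illustration: a linear pull of each worker toward the worker average, followed by a fixed-length step away from that average. The function name `pull_push_step` and the parameters `pull_strength` and `push_strength` are hypothetical, and local gradient steps are omitted entirely.

```python
import numpy as np


def pull_push_step(workers, pull_strength=0.5, push_strength=0.1, eps=1e-12):
    """One illustrative consensus round (assumed form, not the paper's exact rule).

    workers: array of shape (num_workers, dim), one parameter vector per worker.
    """
    workers = np.asarray(workers, dtype=float)
    center = workers.mean(axis=0)        # average of all worker parameters
    offsets = workers - center           # vector from the average to each worker
    dists = np.linalg.norm(offsets, axis=1, keepdims=True)
    # Unit vectors pointing away from the average (eps guards division by zero).
    directions = offsets / np.maximum(dists, eps)
    pulled = workers - pull_strength * offsets   # pull: shrink each offset
    return pulled + push_strength * directions   # push: fixed-length step outward


# Two workers placed symmetrically around the origin.
w = np.array([[1.0, 0.0], [-1.0, 0.0]])
for _ in range(60):
    w = pull_push_step(w)
print(np.linalg.norm(w, axis=1))  # distance of each worker from the average
```

In this symmetric toy setting the distance d of each worker from the average follows d ← (1 − pull_strength)·d + push_strength, so the workers neither collapse onto the average nor drift apart: they settle at a distance of push_strength / pull_strength. That balance is specific to the assumed update above and should not be read as a result from the paper.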
