On the Trade-off between Flatness and Optimization in Distributed Learning

Ying Cao; Zhaoxian Wu; Kun Yuan; Ali H. Sayed

arXiv:2406.20006·cs.LG·July 3, 2025

Open Access

TL;DR

This paper develops a theoretical framework to analyze how decentralized learning strategies influence the trade-off between flatness of minima and optimization in nonconvex distributed learning, showing decentralized methods often outperform centralized ones in classification accuracy.

Contribution

It introduces a novel theoretical analysis of decentralized learning, revealing how strategies like diffusion better balance flatness and optimization, leading to improved accuracy.

Findings

01

Decentralized strategies escape local minima faster and favor flatter minima.

02

Consensus has worse excess-risk performance than diffusion, which gives consensus a better chance of escaping local minima and favoring flatter ones.

03

Classification accuracy depends on both flatness and optimization performance.

Abstract

This paper proposes a theoretical framework to evaluate and compare the performance of stochastic gradient algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work reports three results. First, it shows that decentralized learning strategies escape local minima faster and favor convergence toward flatter minima relative to the centralized solution. Second, among decentralized methods, the consensus strategy has worse excess-risk performance than diffusion, which gives consensus a better chance of escaping local minima and favoring flatter minima. Third, and importantly, the ultimate classification accuracy is not solely dependent on the flatness of the local…
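The three strategies compared in the abstract can be sketched with their commonly used textbook update rules. The listing gives no formulas, so everything below is an assumption for illustration: the toy quadratic risk, the helper names (`grad`, `centralized_step`, `consensus_step`, `diffusion_step`), the combination matrix `A`, and the step size `mu`. Gradients are exact here, whereas the paper studies stochastic gradients.

```python
# Illustrative sketch only: toy scalar risks J_k(w) = 0.5 * (w - c_k)**2.
# All names and values are assumptions, not taken from the paper.

def grad(w, c):
    # Exact gradient of the assumed toy risk 0.5 * (w - c)**2.
    return w - c

def centralized_step(w, mu, centers):
    # One model; step along the average of all agents' gradients.
    avg_grad = sum(grad(w, c) for c in centers) / len(centers)
    return w - mu * avg_grad

def consensus_step(ws, mu, A, centers):
    # Combine neighbors' current iterates, then subtract the gradient
    # evaluated at the agent's own (uncombined) iterate.
    K = len(ws)
    combined = [sum(A[l][k] * ws[l] for l in range(K)) for k in range(K)]
    return [combined[k] - mu * grad(ws[k], centers[k]) for k in range(K)]

def diffusion_step(ws, mu, A, centers):
    # Adapt-then-combine: each agent takes a local gradient step first,
    # then averages the intermediate iterates of its neighbors.
    K = len(ws)
    psi = [ws[k] - mu * grad(ws[k], centers[k]) for k in range(K)]
    return [sum(A[l][k] * psi[l] for l in range(K)) for k in range(K)]

# Assumed doubly stochastic combination matrix for 3 fully connected agents.
A = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
centers = [-1.0, 0.0, 1.0]   # each agent's local minimizer (toy data)
mu = 0.1                     # step size

w_central = centralized_step(0.0, mu, centers)
w_consensus = consensus_step([0.0, 0.0, 0.0], mu, A, centers)
w_diffusion = diffusion_step([0.0, 0.0, 0.0], mu, A, centers)
```

The only structural difference between the two decentralized rules is the order of operations: consensus mixes the old iterates and applies the gradient separately, while diffusion mixes the already-updated iterates. In this toy setup, after one step the agents' iterates are more spread out under consensus than under diffusion.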

Taxonomy

Topics: Neural Networks and Applications · Energy Efficient Wireless Sensor Networks · Distributed Sensor Networks and Detection Algorithms

Methods: Diffusion