Guaranteeing Both Consensus and Optimality in Decentralized Nonconvex Optimization with Multiple Local Updates
Jie Liu, Zuang Wang, and Yongqiang Wang

TL;DR
This paper introduces MILE, a decentralized algorithm that guarantees both consensus and optimality in nonconvex optimization with multiple local updates, reducing communication costs and making it applicable to large-scale machine learning problems.
Contribution
MILE is the first decentralized method to ensure both consensus and optimality under multiple local updates in nonconvex settings, with a novel analysis framework and minimal communication overhead.
Findings
Achieves $O(1/T)$ convergence rate with stochastic gradients
Requires only one variable exchange per agent pair
Demonstrates effectiveness on benchmark datasets
Abstract
Scalable decentralized optimization in large-scale systems hinges on efficient communication. A common way to reduce communication overhead is to perform multiple local updates between two communication rounds, as in federated learning. However, extending this strategy to fully decentralized settings poses fundamental challenges. Existing decentralized algorithms with multiple local updates guarantee accurate convergence only under strong convexity, limiting applicability to the nonconvex problems prevalent in machine learning. Moreover, many methods require exchanging and storing auxiliary variables, such as gradient-tracking vectors or correction terms, to ensure convergence under data heterogeneity, incurring high communication and memory costs. In this paper, we propose MILE, a fully decentralized algorithm that guarantees both consensus and optimality under multiple local updates…
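To make the communication pattern in the abstract concrete, the sketch below shows a generic decentralized baseline. It is not MILE. Each agent takes several local gradient steps and then performs one gossip-averaging exchange with its ring neighbors. Everything in it is an illustrative assumption: the scalar quadratic local objectives $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, full (deterministic) gradients, a ring of at least three agents with uniform weights of 1/3, and the hypothetical function names `ring_mix`, `local_update_gossip`, and `consensus_error`. With heterogeneous $b_i$ and no correction terms, the agents' average still reaches the global minimizer in this equal-curvature toy case. The individual agents, however, settle at different points, so consensus fails. This is the kind of drift under data heterogeneity that the abstract describes.

```python
from typing import List, Tuple


def ring_mix(x: List[float]) -> List[float]:
    """One gossip step on a ring (assumes n >= 3): each agent averages itself
    and its two neighbors with weight 1/3 each, which is a symmetric, doubly
    stochastic mixing matrix, so the network-wide average is preserved."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def local_update_gossip(
    b: List[float], lr: float, local_steps: int, rounds: int
) -> Tuple[List[float], int]:
    """Hypothetical baseline (not MILE): agent i minimizes
    f_i(x) = 0.5 * (x - b_i)**2. In every round, each agent takes
    `local_steps` gradient steps on its own f_i without communicating, then
    performs a single gossip exchange. Returns the final iterates and the
    number of communication rounds used."""
    x = [0.0] * len(b)
    comms = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            # Local gradient step: grad f_i(x_i) = x_i - b_i (no communication).
            x = [xi - lr * (xi - bi) for xi, bi in zip(x, b)]
        x = ring_mix(x)  # the only communication in this round
        comms += 1
    return x, comms


def consensus_error(x: List[float]) -> float:
    """Largest deviation of any agent from the network average."""
    m = sum(x) / len(x)
    return max(abs(xi - m) for xi in x)


if __name__ == "__main__":
    # Heterogeneous local optima; the global minimizer of sum_i f_i is mean(b) = 2.
    b = [0.0, 1.0, 2.0, 3.0, 4.0]
    x, comms = local_update_gossip(b, lr=0.1, local_steps=5, rounds=200)
    print("communication rounds:", comms)
    print("average iterate:", sum(x) / len(x))
    print("consensus error:", consensus_error(x))
```

With five local steps per round, the scheme uses five times fewer exchanges than one-step gossip for the same number of gradient evaluations. Because no correction or tracking variable is exchanged, though, the per-agent iterates never agree. Closing that gap without extra communicated variables is the problem the abstract targets.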
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
