Beyond Scaffold: A Unified Spatio-Temporal Gradient Tracking Method
Yan Huang, Jinming Xu, Jiming Chen, and Karl Henrik Johansson

TL;DR
This paper introduces ST-GT, a unified spatio-temporal gradient tracking method for distributed optimization that reduces communication costs and handles data heterogeneity, achieving linear convergence for strongly convex problems.
Contribution
ST-GT unifies gradient tracking with a novel spatio-temporal approach, improving convergence and communication efficiency in federated learning over time-varying graphs.
Findings
Achieves linear convergence for strongly convex problems.
Attains the first linear speed-up in communication complexity with respect to the number of local updates per round (τ).
Reduces the topology-dependent noise term from σ² to σ²/τ, where σ² is the gradient noise level.
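To give some intuition for the σ²/τ scaling, the sketch below checks a standard statistical fact: averaging τ independent noisy gradient samples divides the noise variance by τ. It is not the paper's analysis or algorithm. The scalar gradient, the Gaussian noise model, and the names `noisy_gradient`, `averaged_gradient`, and `empirical_variance` are all assumptions made for illustration, and τ is taken here to be the number of samples averaged.

```python
import random
import statistics


def noisy_gradient(rng, true_grad=1.0, sigma=2.0):
    """One stochastic gradient sample: true value plus zero-mean Gaussian noise."""
    return true_grad + rng.gauss(0.0, sigma)


def averaged_gradient(rng, tau, true_grad=1.0, sigma=2.0):
    """Average of tau independent noisy gradient samples."""
    total = sum(noisy_gradient(rng, true_grad, sigma) for _ in range(tau))
    return total / tau


def empirical_variance(tau, trials=20000, seed=0):
    """Empirical variance of the averaged gradient over many trials."""
    rng = random.Random(seed)
    samples = [averaged_gradient(rng, tau) for _ in range(trials)]
    return statistics.pvariance(samples)


# Illustrative values only: sigma = 2, so sigma^2 = 4.
var_single = empirical_variance(tau=1)  # close to sigma^2
var_avg = empirical_variance(tau=8)     # close to sigma^2 / 8
print(var_single, var_avg, var_single / var_avg)
```

The ratio of the two variances comes out close to τ, which is the kind of noise suppression a running average of local gradients can provide under an independence assumption.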
Abstract
In distributed and federated learning algorithms, communication overhead is often reduced by performing multiple local updates between communication rounds. However, due to data heterogeneity across nodes and the local gradient noise within each node, this strategy can lead to the drift of local models away from the global optimum. To address this issue, we revisit the well-known federated learning method Scaffold (Karimireddy et al., 2020) under a gradient tracking perspective, and propose a unified spatio-temporal gradient tracking algorithm, termed ST-GT, for distributed stochastic optimization over time-varying graphs. ST-GT tracks the global gradient across neighboring nodes to mitigate data heterogeneity, while maintaining a running average of local gradients to substantially suppress noise, with slightly more storage overhead. Without assuming bounded data heterogeneity, we prove…
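The gradient tracking idea the abstract builds on can be sketched on a toy problem. The code below is classical gradient tracking, not ST-GT: it uses a fixed graph, exact gradients, and one update per communication round, whereas the paper targets time-varying graphs, stochastic gradients, and multiple local updates. The quadratic local objectives, the 4-node ring mixing matrix `W`, the step size, and all function names are assumptions chosen for illustration. Plain decentralized gradient descent (`plain_dgd`) is included as a baseline to show the bias that heterogeneous local objectives introduce.

```python
def mix(W, v):
    """One communication round: each node averages neighbor values with weights W."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def local_grads(x, a, b):
    """Gradient of f_i(x) = 0.5 * a_i * (x - b_i)^2 at each node's own iterate."""
    return [a[i] * (x[i] - b[i]) for i in range(len(x))]


def gradient_tracking(W, a, b, alpha, iters):
    """Classical gradient tracking: y_i tracks the network-average gradient."""
    n = len(a)
    x = [0.0] * n
    g = local_grads(x, a, b)
    y = g[:]  # trackers start at the local gradients
    for _ in range(iters):
        Wx = mix(W, x)
        x_new = [Wx[i] - alpha * y[i] for i in range(n)]
        g_new = local_grads(x_new, a, b)
        Wy = mix(W, y)
        # Mix the trackers, then add the change in the local gradient.
        y = [Wy[i] + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x


def plain_dgd(W, a, b, alpha, iters):
    """Baseline without tracking: mix, then step along the local gradient only."""
    n = len(a)
    x = [0.0] * n
    for _ in range(iters):
        Wx = mix(W, x)
        g = local_grads(x, a, b)
        x = [Wx[i] - alpha * g[i] for i in range(n)]
    return x


# Doubly stochastic mixing matrix for a 4-node ring (illustrative).
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
a = [1.0, 2.0, 3.0, 4.0]   # heterogeneous curvatures
b = [-1.0, 2.0, 0.5, 3.0]  # heterogeneous local minimizers

# Minimizer of the sum of the local objectives.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)

x_tracking = gradient_tracking(W, a, b, alpha=0.02, iters=1000)
x_dgd = plain_dgd(W, a, b, alpha=0.02, iters=1000)

err_tracking = max(abs(xi - x_star) for xi in x_tracking)
err_dgd = max(abs(xi - x_star) for xi in x_dgd)
print(err_tracking, err_dgd)
```

With a constant step size, the tracked iterates agree on the global minimizer, while the baseline settles at a point where nodes disagree and are offset from it. This gap is the drift that heterogeneity causes and that tracking is meant to remove.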
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
