Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong

TL;DR
This paper investigates the limitations of median-based algorithms such as signSGD and medianSGD when data are heterogeneous across workers, and proposes a noise-based correction intended to ensure convergence and robustness in federated learning scenarios.
Contribution
The paper introduces a gradient correction mechanism, based on noise perturbation, that narrows the gap between median and mean gradients and thereby enables convergence under data heterogeneity.
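The idea of narrowing the median-mean gap with noise can be illustrated on a single gradient coordinate. The snippet below is a minimal Monte Carlo sketch, not the authors' implementation: the Gaussian noise, the five hypothetical worker gradients, and the helper name `expected_noisy_median` are all assumptions made for illustration.

```python
import random
import statistics


def expected_noisy_median(values, noise_std, trials=20000, seed=0):
    """Monte Carlo estimate of E[median(values + independent Gaussian noise)].

    Hypothetical helper for illustration; the noise distribution is an assumption.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, noise_std) for v in values]
        total += statistics.median(noisy)
    return total / trials


# Hypothetical local gradients for one coordinate on five heterogeneous workers.
local_grads = [-1.0, -1.0, -1.0, 4.0, 4.0]
mean_grad = statistics.mean(local_grads)      # 1.0
median_grad = statistics.median(local_grads)  # -1.0, opposite sign to the mean

# Distance between the expected noisy median and the true mean,
# for a small and a large noise scale.
gap_small_noise = abs(expected_noisy_median(local_grads, 0.1) - mean_grad)
gap_large_noise = abs(expected_noisy_median(local_grads, 10.0) - mean_grad)

print(f"gap with noise std 0.1:  {gap_small_noise:.3f}")
print(f"gap with noise std 10.0: {gap_large_noise:.3f}")
```

In this toy setting the larger noise scale pulls the expected median much closer to the mean, at the cost of a noisier individual median, which is the trade-off a noise-based correction has to manage.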
Findings
Without correction, signSGD and medianSGD can fail to converge when the data are heterogeneous.
The proposed noise perturbation method effectively aligns median and mean gradients.
The corrected algorithms maintain low communication costs and achieve convergence to stationary points.
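The first finding can be seen in a single update of signSGD with majority vote. The following is a minimal sketch under assumed values: the function `majority_vote_step`, the three worker gradients, and the learning rate are hypothetical and chosen only so that the vote and the mean gradient disagree on one coordinate.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def majority_vote_step(params, worker_grads, lr):
    """One signSGD-with-majority-vote update (illustrative sketch).

    Each worker contributes only the sign of each gradient coordinate,
    and the server moves against the sign of the summed votes.
    """
    updated = []
    for j, p in enumerate(params):
        votes = sum(sign(g[j]) for g in worker_grads)
        updated.append(p - lr * sign(votes))
    return updated


# Hypothetical heterogeneous gradients from three workers (two coordinates).
worker_grads = [[-1.0, 2.0], [-1.0, 2.0], [4.0, -1.0]]
mean_grad = [sum(g[j] for g in worker_grads) / len(worker_grads) for j in range(2)]

new_params = majority_vote_step([0.0, 0.0], worker_grads, lr=0.1)
print("mean gradient:", mean_grad)
print("updated parameters:", new_params)
```

On the first coordinate the mean gradient is positive, so a mean-based step would decrease the parameter, yet the majority vote increases it. On the second coordinate the two agree. Only one sign per coordinate per worker is sent, which is where the low communication cost comes from.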
Abstract
Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Affine Coupling · Normalizing Flows
