Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms
Ming Xiang, Lili Su

TL;DR
This paper introduces two communication-efficient, resilient algorithms for federated learning that use one-bit gradient compression, adapt to unbounded gradients, and outperform existing methods in convergence and robustness.
Contribution
The paper proposes the Ada-StoSign and β-StoSign algorithms, which enable communication-efficient, resilient federated learning with one-bit compressors and adaptive gradient norm estimation.
Findings
Ada-StoSign converges in expectation at rate O(log T/√T + 1/√M), where M is the number of clients
Ada-StoSign outperforms the state of the art when the number of clients M is large
β-StoSign provides Byzantine resilience and privacy guarantees
Abstract
Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named Adaptive Stochastic Sign SGD (Ada-StoSign) and β-Stochastic Sign SGD (β-StoSign), each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the norm of the local gradients, a key parameter used in gradient compression. We show that Ada-StoSign…
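The following is a minimal sketch of a generic stochastic sign compressor, showing how a norm estimate acts as the key compression parameter. It assumes a bound B on the gradient magnitude. Each client maps every coordinate g_i to +1 with probability (1 + g_i/B)/2 and to -1 otherwise, so the expected bit equals g_i/B. Several parts are assumptions and do not reproduce the paper's method. The function names `stochastic_sign` and `majority_vote` are hypothetical. Clipping coordinates to [-B, B] is an assumption added so the sketch stays well defined when the estimate is too small. The server-side majority vote is one common aggregation rule for sign methods. The paper's actual norm-tracking function and server rule are not given in this summary and are not shown here.

```python
import random
from typing import List, Sequence


def stochastic_sign(grad: Sequence[float], bound: float, rng: random.Random) -> List[int]:
    """Hypothetical one-bit compressor: E[bit_i] = clip(g_i, -B, B) / B."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    bits = []
    for g in grad:
        # Assumption: clip so the probability stays in [0, 1] if |g| exceeds the estimate.
        g_clipped = max(-bound, min(bound, g))
        p_plus = 0.5 * (1.0 + g_clipped / bound)
        bits.append(1 if rng.random() < p_plus else -1)
    return bits


def majority_vote(all_bits: Sequence[Sequence[int]]) -> List[int]:
    """Assumed server rule: sign of the coordinate-wise sum of client bits (ties give 0)."""
    dim = len(all_bits[0])
    sums = [sum(bits[i] for bits in all_bits) for i in range(dim)]
    return [(s > 0) - (s < 0) for s in sums]


if __name__ == "__main__":
    rng = random.Random(0)
    # 101 hypothetical clients sharing the same local gradient, bound B = 1.0.
    grads = [[0.8, -0.5, 0.1] for _ in range(101)]
    votes = majority_vote([stochastic_sign(g, bound=1.0, rng=rng) for g in grads])
    print(votes)  # aggregated update direction, one sign per coordinate
```

Each coordinate costs one bit of upload per client. The estimate B matters in two ways. If B is much larger than the true gradient magnitudes, the bits become nearly fair coin flips and carry little signal. If B is too small, clipping biases the compressor. This trade-off is why adaptively tracking the gradient norm matters when gradients are unbounded.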
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Wireless Communication Security Techniques
Methods: Stochastic Gradient Descent
