Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

Ming Xiang; Lili Su

arXiv:2210.00665·cs.LG·February 21, 2023

Open Access

TL;DR

This paper introduces two communication-efficient, resilient algorithms for federated learning that use one-bit gradient compression, adapt to unbounded gradients, and outperform existing methods in convergence and robustness.

Contribution

The paper proposes Ada-StoSign and $β$-StoSign, two algorithms that enable efficient, resilient federated learning with one-bit compressors and adaptive gradient norm estimation.

Findings

01

Ada-StoSign converges at rate O(log T/√T + 1/√M)

02

Ada-StoSign outperforms state-of-the-art sign-based methods when the number of clients M is large

03

β-StoSign provides Byzantine resilience and privacy guarantees

Abstract

Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named Adaptive Stochastic Sign SGD (Ada-StoSign) and $β$-Stochastic Sign SGD ($β$-StoSign), each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the $ℓ_{\infty}$ norm of the local gradients - a key parameter used in gradient compression. We show that Ada-StoSign…
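The abstract describes compressing each local gradient into a bit vector, with an estimate of the gradient's $ℓ_{\infty}$ norm as the key compression parameter. The sketch below illustrates that idea with a generic stochastic sign compressor; it is not taken from the paper, and the exact compressors used by Ada-StoSign and $β$-StoSign may differ. The names `stochastic_sign`, `aggregate`, and `bound` are hypothetical, `bound` is a fixed stand-in for the adaptively tracked norm estimate, and every simulated client holds the same gradient for simplicity.

```python
import random


def stochastic_sign(grad, bound, rng):
    """Compress each coordinate of grad to a single bit, encoded as +1 or -1.

    Assumption (not from the paper): a coordinate g is sent as +1 with
    probability (bound + g) / (2 * bound), so bound * E[bit] equals g
    whenever |g| <= bound.
    """
    bits = []
    for g in grad:
        # Clip so the probability stays in [0, 1] if the bound is too small.
        clipped = max(-bound, min(bound, g))
        p_plus = (bound + clipped) / (2.0 * bound)
        bits.append(1 if rng.random() < p_plus else -1)
    return bits


def aggregate(bit_vectors, bound):
    """Average the clients' bit vectors and rescale by the bound."""
    m = len(bit_vectors)
    dim = len(bit_vectors[0])
    return [bound * sum(v[i] for v in bit_vectors) / m for i in range(dim)]


rng = random.Random(0)
grad = [0.5, -0.25, 0.0]  # toy gradient shared by all simulated clients
bound = 1.0               # hypothetical l-infinity estimate
votes = [stochastic_sign(grad, bound, rng) for _ in range(20000)]
estimate = aggregate(votes, bound)
print(estimate)
```

With many clients the rescaled average concentrates around the true gradient, which gives some intuition for why a bound on the $ℓ_{\infty}$ norm matters: if `bound` underestimates the largest coordinate, clipping biases the estimate, and if it is far too large, each bit carries little signal.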

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Wireless Communication Security Techniques

Methods: Stochastic Gradient Descent