FeedSign: Robust Full-parameter Federated Fine-tuning of Large Models with Extremely Low Communication Overhead of One Bit

Zhijie Cai; Haolong Chen; Guangxu Zhu

arXiv:2501.17610·cs.DC·April 1, 2025

TL;DR

FeedSign is a federated fine-tuning method for large models that cuts the upload and download payload to 1 bit per aggregation step while maintaining convergence speed and robustness.

Contribution

FeedSign introduces a federated fine-tuning algorithm with 1-bit communication per step, built on zeroth-order optimization and pseudo-random number generators shared across devices.

Findings

01. Converges at an exponential rate of $O(e^{-t})$ under standard assumptions.

02. Performs better than or comparably to existing methods across models from 11M to 13B parameters.

03. Demonstrates robustness against data heterogeneity and Byzantine attacks.

Abstract

Federated fine-tuning (FFT) attempts to fine-tune a pre-trained model with private data from distributed clients by exchanging models rather than data under the orchestration of a parameter server (PS). To overcome the bottleneck forged by the growing communication and memory overhead for clients in such systems due to the growing model sizes, we propose FeedSign, an FFT algorithm in which the upload and download payload for an aggregation step is exactly 1 bit per step, while the memory overhead is squeezed to the amount needed for inference. This is realized by utilizing zeroth-order (ZO) optimizers on large models and shared pseudo-random number generators (PRNG) across devices to represent the gradient estimates as seed-sign pairs. We conduct theoretical analysis on FeedSign and show that it converges at an exponential rate $O(e^{-t})$, where $t$ is the number…
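The seed-sign idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the least-squares client losses, the use of the step index as the shared seed, the two-point finite-difference probe, the majority vote at the server, and every function name below are assumptions made for illustration. Each client uploads one sign bit, the server sends back one voted bit, and every participant regenerates the same random direction locally from the seed.

```python
import numpy as np

def direction(seed, dim):
    # Every participant regenerates the same perturbation from the shared seed,
    # so the direction itself is never transmitted.
    return np.random.default_rng(seed).standard_normal(dim)

def client_sign_bit(loss_fn, params, seed, eps=1e-3):
    # Two forward passes only (no backpropagation): probe the loss along +z and -z
    # and report just the sign of the difference. This is the 1-bit upload.
    z = direction(seed, params.size)
    diff = loss_fn(params + eps * z) - loss_fn(params - eps * z)
    return 1 if diff >= 0 else -1

def server_vote(bits):
    # Assumed aggregation rule: majority vote over client signs (1-bit download).
    return 1 if sum(bits) >= 0 else -1

def apply_update(params, seed, vote, lr):
    # Step against the voted sign along the regenerated direction.
    return params - lr * vote * direction(seed, params.size)

def make_client_loss(rng, dim, target, n=32):
    # Hypothetical private dataset: noiseless linear regression data per client.
    X = rng.standard_normal((n, dim))
    y = X @ target
    return lambda w: float(np.mean((X @ w - y) ** 2))

def run_demo(steps=2000, dim=10, num_clients=5, lr=1e-2):
    rng = np.random.default_rng(0)
    target = rng.standard_normal(dim)
    losses = [make_client_loss(rng, dim, target) for _ in range(num_clients)]

    def global_loss(w):
        return float(np.mean([f(w) for f in losses]))

    w = np.zeros(dim)
    initial = global_loss(w)
    for t in range(steps):
        # The step index doubles as the shared seed (an assumption of this sketch).
        bits = [client_sign_bit(f, w, seed=t) for f in losses]
        w = apply_update(w, seed=t, vote=server_vote(bits), lr=lr)
    return initial, global_loss(w)

if __name__ == "__main__":
    before, after = run_demo()
    print(f"global loss before: {before:.4f}, after: {after:.4f}")
```

For simplicity the sketch materializes the full perturbation vector; a memory-conscious variant would regenerate it from the seed piece by piece so that peak memory stays close to what inference needs.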

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques · Neural Networks and Applications