Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

Eric Zelikman; Qian Huang; Percy Liang; Nick Haber; Noah D. Goodman

arXiv:2306.10015·cs.LG·June 19, 2023·2 cites

TL;DR

This paper introduces a low-bandwidth method for decentralized language model fine-tuning that uses shared randomness so that machines exchange only single-byte projected gradients, reducing communication costs and potentially offering privacy benefits.
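
This summary does not spell out how a scalar projected gradient is squeezed into one byte. The sketch below assumes one simple option: clip the scalar to a fixed range and quantize it uniformly into a signed 8-bit integer. The clipping range `CLIP` and the helpers `encode_byte` and `decode_byte` are hypothetical names for illustration, not the paper's scheme.

```python
import struct

CLIP = 4.0  # hypothetical clipping range for projected gradients


def encode_byte(g: float, clip: float = CLIP) -> bytes:
    """Clip g to [-clip, clip] and map it to one of 255 levels in [-127, 127]."""
    g = max(-clip, min(clip, g))
    q = round(g / clip * 127)
    return struct.pack("b", q)  # "b" is a signed char: exactly one byte


def decode_byte(b: bytes, clip: float = CLIP) -> float:
    """Invert encode_byte up to quantization error."""
    (q,) = struct.unpack("b", b)
    return q / 127 * clip


msg = encode_byte(1.37)
print(len(msg), decode_byte(msg))  # one byte, and a value close to 1.37
```

With `CLIP = 4.0`, the round-trip error for an unclipped value is at most `CLIP / 254` (about 0.016), and values outside the range saturate at ±`CLIP`.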

Contribution

It extends memory-efficient SPSA-based fine-tuning (Malladi et al., 2023) to decentralized settings using shared randomness, enabling highly communication-efficient and potentially privacy-preserving model updates.
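
Shared randomness means a perturbation never has to be transmitted: any machine that knows the identifier can regenerate it. This is a minimal sketch that assumes the seed is derived by hashing a `(machine_id, sample_id)` pair with SHA-256 and that the perturbation is drawn with Python's `random.Random`. The helpers `perturbation_seed` and `perturbation` are hypothetical.

```python
import hashlib
import random


def perturbation_seed(machine_id: int, sample_id: int) -> int:
    """Hypothetical helper: deterministic 64-bit seed from a (machine, sample) id."""
    digest = hashlib.sha256(f"{machine_id}:{sample_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def perturbation(machine_id: int, sample_id: int, dim: int) -> list:
    """Regenerate the Gaussian perturbation z ~ N(0, I) for that identifier."""
    rng = random.Random(perturbation_seed(machine_id, sample_id))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


z_sender = perturbation(0, 42, dim=5)    # drawn by machine 0 for its own update
z_receiver = perturbation(0, 42, dim=5)  # rebuilt by any other machine from the id alone
print(z_sender == z_receiver)            # True: identical without sending z
```

A cryptographic hash is used instead of the built-in `hash()` because Python salts string hashing per process (`PYTHONHASHSEED`), so `hash()` would not yield the same seed on different machines. The sketch also assumes every machine runs the same Python version, so that `random.Random` produces the same stream.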

Findings

01

Significantly reduces communication bandwidth for distributed training: machines exchange single-byte scalar projected gradients instead of full gradient vectors.

02

Supports dynamic addition/removal of machines during training.

03

Retains the memory efficiency of SPSA, which needs only inference (forward passes) rather than backpropagation.
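
The memory efficiency in the third finding comes from never storing the perturbation. Below is a minimal single-machine sketch in the spirit of the in-place scheme of Malladi et al. (2023). It shifts the weights by `+eps*z`, then by `-2*eps*z`, then by `+eps*z` again, regenerating `z` from the seed each time, so only two loss evaluations are needed and no gradient buffers are kept. The toy quadratic loss, the list-of-floats parameters, and the names `perturb_` and `spsa_step_` are assumptions for illustration.

```python
import random


def perturb_(theta: list, seed: int, scale: float) -> None:
    """In place: theta += scale * z, where z ~ N(0, I) is regenerated from seed."""
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def spsa_step_(theta: list, loss_fn, seed: int, eps: float = 1e-3, lr: float = 1e-2) -> float:
    """One SPSA step using two loss evaluations and no stored perturbation."""
    perturb_(theta, seed, +eps)        # theta + eps * z
    loss_plus = loss_fn(theta)
    perturb_(theta, seed, -2 * eps)    # theta - eps * z
    loss_minus = loss_fn(theta)
    perturb_(theta, seed, +eps)        # back to theta (up to float round-off)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)  # estimates z . grad
    perturb_(theta, seed, -lr * projected_grad)            # theta -= lr * g * z
    return projected_grad


def loss(theta: list) -> float:
    """Toy quadratic standing in for a language-model loss (assumption)."""
    return sum((x - 1.0) ** 2 for x in theta)


theta = [0.0] * 4
initial = loss(theta)
for step in range(300):
    spsa_step_(theta, loss, seed=step)  # a fresh seed per step
print(f"loss {initial:.3f} -> {loss(theta):.6f}")
```

The weights are only ever modified in place, so peak memory stays close to that of inference. The price is regenerating `z` four times per step.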

Abstract

Language model training in distributed settings is limited by the communication cost of gradient exchanges. In this short note, we extend recent work by Malladi et al. (2023), using shared randomness to perform distributed fine-tuning with low bandwidth. The method is a natural decentralized extension of memory-efficient Simultaneous Perturbation Stochastic Approximation (SPSA). In each iteration, each machine seeds a random number generator (RNG) to perform local, reproducible perturbations on the model weights, then calculates and exchanges scalar projected gradients, which are used to update each model. By using a (machine, sample) identifier as the random seed, every machine can regenerate every other machine's perturbations. Because machines exchange only single-byte projected gradients, the method is highly communication-efficient. There are also potential privacy benefits, as projected gradients may be…
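
Combining the steps in the abstract, one decentralized round might look like the sketch below. Each machine computes a projected gradient for its own `(machine, sample)` seed, quantizes it to one byte, and broadcasts it. Every machine then regenerates all senders' perturbations and applies the same updates in the same order, so the replicas stay identical. The byte quantization, the toy loss, the use of the step index as the sample identifier, the averaging over machines, and all helper names are assumptions; this is not the authors' implementation, which lives in the ezelikman/justonebyte repository.

```python
import hashlib
import random
import struct

CLIP = 4.0  # hypothetical clipping range for the one-byte code


def seed_for(machine_id: int, sample_id: int) -> int:
    """Hypothetical: deterministic 64-bit seed from a (machine, sample) id."""
    digest = hashlib.sha256(f"{machine_id}:{sample_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def perturb_(theta: list, seed: int, scale: float) -> None:
    """In place: theta += scale * z, with z ~ N(0, I) regenerated from seed."""
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def encode_byte(g: float) -> bytes:
    g = max(-CLIP, min(CLIP, g))
    return struct.pack("b", round(g / CLIP * 127))


def decode_byte(b: bytes) -> float:
    return struct.unpack("b", b)[0] / 127 * CLIP


def local_message(theta: list, loss_fn, seed: int, eps: float = 1e-3) -> bytes:
    """Runs on one machine: two forward passes, one byte out."""
    work = list(theta)  # copy so perturb/restore round-off cannot desync replicas
    perturb_(work, seed, +eps)
    loss_plus = loss_fn(work)
    perturb_(work, seed, -2 * eps)
    loss_minus = loss_fn(work)
    return encode_byte((loss_plus - loss_minus) / (2 * eps))


def apply_messages_(theta: list, messages: list, lr: float = 0.01) -> None:
    """Runs on every machine: average the decoded SPSA updates from all senders."""
    for seed, msg in sorted(messages):  # identical order on every replica
        perturb_(theta, seed, -lr * decode_byte(msg) / len(messages))


def loss(theta: list) -> float:
    """Toy objective shared by all machines (assumption)."""
    return sum((x - 1.0) ** 2 for x in theta)


NUM_MACHINES, DIM, STEPS = 3, 4, 300
replicas = [[0.0] * DIM for _ in range(NUM_MACHINES)]
initial = loss(replicas[0])
for step in range(STEPS):
    # Each machine uses the step index as its sample id (assumption).
    messages = [
        (seed_for(m, step), local_message(replicas[m], loss, seed_for(m, step)))
        for m in range(NUM_MACHINES)
    ]
    for theta in replicas:  # every machine applies every message
        apply_messages_(theta, messages)
final = loss(replicas[0])
print(f"loss {initial:.3f} -> {final:.6f}")
```

In this sketch only `NUM_MACHINES` bytes of payload cross the network per round. The seed is bundled with each byte only for convenience, since receivers could recompute it from the identifier. Applying messages in a fixed sorted order matters because floating-point addition is not associative, and a different order could make replicas drift apart bit by bit.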

Code & Models

Repositories

ezelikman/justonebyte
JAX · Official

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Privacy-Preserving Technologies in Data