CELLM: An Efficient Communication in Large Language Models Training for Federated Learning

Raja Vavekanand; Kira Sam

arXiv:2407.20557 · cs.LG · November 11, 2024

TL;DR

This paper introduces CELLM, a method that combines low-rank adaptation (LoRA) with sparse communication to train large language models efficiently in federated learning, reducing communication costs by up to 10x relative to vanilla LoRA while maintaining model utility.
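A minimal sketch of how the two ingredients can be combined on a client. It assumes each client trains LoRA factors A (r x k) and B (d x r), so the effective weight update is BA, and uploads only the largest-magnitude entries of its factor updates as (index, value) pairs. The helper names, the pair encoding, and plain top-k selection are illustrative assumptions; this summary does not describe CELLM's exact sparsification rule.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical wire format: a sparse update is a list of (flat_index, value) pairs.
SparseUpdate = List[Tuple[int, float]]


def flatten(matrix: List[List[float]]) -> List[float]:
    """Row-major flatten of a 2-D list."""
    return [x for row in matrix for x in row]


def top_k_sparsify(values: List[float], density: float) -> SparseUpdate:
    """Keep the floor(density * n) largest-magnitude entries (at least one)."""
    k = max(1, int(len(values) * density))
    idx = heapq.nlargest(k, range(len(values)), key=lambda i: abs(values[i]))
    # Sort by index so the receiver can decode the pairs in order.
    return sorted((i, values[i]) for i in idx)


def client_payload(
    delta_a: List[List[float]], delta_b: List[List[float]], density: float
) -> Dict[str, SparseUpdate]:
    """Sparsify the updates to both LoRA factors before upload (assumed scheme)."""
    return {
        "A": top_k_sparsify(flatten(delta_a), density),
        "B": top_k_sparsify(flatten(delta_b), density),
    }


if __name__ == "__main__":
    # Toy factor updates with r = 2, d = 3, k = 4.
    delta_a = [[0.1, -0.9, 0.0, 0.3], [0.05, 0.0, 0.7, -0.2]]
    delta_b = [[0.2, 0.0], [-0.6, 0.1], [0.0, 0.4]]
    print(client_payload(delta_a, delta_b, density=0.25))
```

With `density=0.25`, the client sends 3 of the 14 factor entries: two from A and one from B.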

Contribution

CELLM is the first approach to effectively integrate LoRA and sparse updates for federated LLM training, addressing communication and computation bottlenecks.

Findings

1. Reduces communication costs by up to 10x compared to vanilla LoRA.
2. Achieves up to a 5x communication reduction over more complex sparse LoRA baselines.
3. Maintains or improves model utility with optimized sparsity and rank configurations.
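The savings above come from shrinking what each client uploads per round. A back-of-the-envelope sketch for a single d x k weight matrix, assuming a hypothetical 4096 x 4096 layer, rank r = 8, 10% density, and float32 values with int32 indices. These settings are illustrative only and are not the paper's experimental configuration.

```python
# Hypothetical settings for illustration; not taken from the CELLM paper.
BYTES_PER_WORD = 4  # assume float32 values and int32 indices


def full_update_words(d: int, k: int) -> int:
    """Dense update of a d x k weight matrix."""
    return d * k


def lora_update_words(d: int, k: int, r: int) -> int:
    """LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def sparse_lora_update_words(d: int, k: int, r: int, density: float) -> int:
    """Top-k sparse LoRA: each kept entry costs one value word plus one index word."""
    nnz = int(lora_update_words(d, k, r) * density)
    return 2 * nnz


if __name__ == "__main__":
    d, k, r, density = 4096, 4096, 8, 0.1
    full = full_update_words(d, k)
    lora = lora_update_words(d, k, r)
    sparse = sparse_lora_update_words(d, k, r, density)
    print(full * BYTES_PER_WORD, lora * BYTES_PER_WORD, sparse * BYTES_PER_WORD)
    print(f"LoRA vs full: {full / lora:.1f}x, sparse vs LoRA: {lora / sparse:.2f}x")
```

With 32-bit indices every kept entry costs two words, so 10% density yields only about a 5x saving over dense LoRA factors in this toy setting. More compact index encodings, such as bitmaps or delta-coded indices, reduce that overhead.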

Abstract

Federated Learning (FL) is a recent model training paradigm in which client devices collaboratively train a model without ever aggregating their data. Crucially, this scheme offers users potential privacy and security benefits by only ever communicating updates to the model weights to a central server, as opposed to traditional machine learning (ML) training, which directly communicates and aggregates data. However, FL training suffers from statistical heterogeneity, as clients may have differing local data distributions. Large language models (LLMs) offer a potential solution to this heterogeneity, given that they have consistently been shown to learn from vast amounts of noisy data. While LLMs are a promising development for resolving the persistent issue of non-I.I.D. clients in federated settings, they exacerbate two other bottlenecks in FL: limited local computing and…
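In this scheme, clients send weight updates rather than data to the server. A minimal sketch of the server side, assuming FedAvg-style weighting (each client's update weighted by its number of local examples) and a hypothetical (index, value) sparse encoding. The weighting rule and function names are illustrative assumptions, not CELLM's documented aggregation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseUpdate = List[Tuple[int, float]]  # hypothetical (flat_index, value) encoding


def aggregate_sparse(updates: List[Tuple[SparseUpdate, int]]) -> Dict[int, float]:
    """FedAvg-style weighted mean of sparse client updates.

    Each client contributes (sparse_update, num_local_examples). Entries a client
    did not send are treated as zero, so they pull the average toward zero.
    """
    total = sum(n for _, n in updates)
    acc: Dict[int, float] = defaultdict(float)
    for update, n in updates:
        weight = n / total
        for i, v in update:
            acc[i] += weight * v
    return dict(acc)


def apply_update(params: List[float], delta: Dict[int, float], lr: float = 1.0) -> List[float]:
    """Return a copy of the flat parameter vector with the aggregated delta added."""
    out = list(params)
    for i, v in delta.items():
        out[i] += lr * v
    return out


if __name__ == "__main__":
    client_1 = ([(0, 0.5), (2, -0.25)], 30)  # 30 local examples
    client_2 = ([(2, 1.0)], 10)              # 10 local examples
    delta = aggregate_sparse([client_1, client_2])
    print(apply_update([0.0, 0.0, 0.0], delta))
```

Treating unsent entries as zero keeps the aggregate an unbiased average of the sparse messages actually received; an alternative is to average each index only over the clients that sent it, which changes the effective step size per coordinate.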

Taxonomy

Topics: Privacy-Preserving Technologies in Data