Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Kunjal Panchal, Nisarg Parikh, Sunav Choudhary, Lijun Zhang, Yuriy Brun, Hui Guan

TL;DR
Spry is a novel federated learning algorithm that splits model weights among clients, enabling memory-efficient fine-tuning of large language models with high accuracy and fast convergence on resource-constrained devices.
Contribution
The paper introduces Spry, a federated learning method that uses weight splitting and forward-mode auto-differentiation to reduce memory usage while maintaining accuracy.
Findings
Reduces memory footprint by 1.4-7.1x compared to backpropagation.
Achieves 5.2-13.5% higher accuracy than zero-order methods.
Decreases convergence time by 1.2-20.3x compared to zero-order methods.
Abstract
Finetuning large language models (LLMs) in federated learning (FL) settings has become increasingly important as it allows resource-constrained devices to finetune a model using private data. However, finetuning LLMs using backpropagation requires excessive memory (especially from intermediate activations) for resource-constrained devices. While Forward-mode Auto-Differentiation (AD) can significantly reduce memory footprint from activations, we observe that directly applying it to LLM finetuning results in slow convergence and poor accuracy. In this paper, we introduce Spry, an FL algorithm that splits trainable weights of an LLM among participating clients, such that each client computes gradients using forward-mode AD that are closer estimations of the true gradients. Spry achieves a low memory footprint, high accuracy, and fast convergence. We formally prove that the global…
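The abstract's two ingredients, forward-mode AD and splitting trainable weights among clients, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Dual` class, the toy least-squares `loss`, the helpers `forward_grad` and `split_round`, and the round-robin assignment of coordinates to clients are all hypothetical names and choices made for illustration. The sketch assumes the common forward-gradient estimator, in which a random perturbation `v` is scaled by the directional derivative of the loss along `v`, and restricts each client's perturbation to its own slice of the weights so that each estimate covers fewer dimensions.

```python
import random


class Dual:
    """A value paired with its directional derivative (tangent)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule carried forward alongside the value.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Wrap a plain number as a Dual with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x))


def loss(w, data):
    """Toy mean squared error of a linear model (floats or Duals)."""
    total = 0.0
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        total = total + err * err
    return total * (1.0 / len(data))


def forward_grad(w, data, coords, rng):
    """One forward-gradient estimate, perturbing only `coords`."""
    v = [0.0] * len(w)
    for i in coords:
        v[i] = rng.gauss(0.0, 1.0)
    duals = [Dual(wi, vi) for wi, vi in zip(w, v)]
    # A single forward pass yields the directional derivative along v;
    # no backward pass and no stored activations are needed.
    jvp = loss(duals, data).tan
    return [jvp * vi for vi in v]


def split_round(w, client_data, rng):
    """Each client estimates its own slice; the slices are stitched together."""
    d, m = len(w), len(client_data)
    grad = [0.0] * d
    for k, data in enumerate(client_data):
        coords = range(k, d, m)  # illustrative round-robin assignment
        g = forward_grad(w, data, coords, rng)
        for i in coords:
            grad[i] = g[i]
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [[([1.0, 2.0], 3.0)], [([2.0, 1.0], 3.0)]]
    print(split_round([0.0, 0.0], clients, rng))
```

In this toy setup each client perturbs a single coordinate, so its estimate for that coordinate is the true partial derivative scaled by a squared Gaussian sample. Averaging such estimates over many draws recovers the partial derivative, which is one way to read the abstract's claim that splitting yields closer estimations of the true gradients.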
Taxonomy
Topics: Topic Modeling · Privacy-Preserving Technologies in Data · Data Quality and Management
