Memory-Efficient Federated Fine-Tuning of Large Language Models via Layer Pruning
Yebo Wu, Jingguang Li, Chunlin Tian, Zhijiang Guo, Li Li

TL;DR
FedPruner is a memory-efficient federated fine-tuning method for large language models. It prunes layers of the global model to fit each device's memory budget, so resource-constrained devices can participate while accuracy stays high.
Contribution
The paper presents FedPruner, a novel layer pruning framework for federated LLM fine-tuning that creates personalized submodels based on device memory constraints.
Findings
Achieves up to a 1.98% accuracy improvement over state-of-the-art approaches.
Reduces peak memory usage by 75%.
Enables resource-constrained devices to participate in federated fine-tuning.
Abstract
Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FedPruner, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FedPruner flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experimental results demonstrate that FedPruner…
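The two-level idea in the abstract (group layers first, then pick layers inside each group) can be illustrated with a small sketch. This is not the authors' implementation. It assumes, purely for illustration, that layers are grouped into contiguous blocks, that each device keeps exactly one layer per group, and that per-layer importance scores are already available as plain numbers. The names `group_layers`, `build_submodel`, `importance`, and `layer_budget` are hypothetical.

```python
from typing import List, Sequence


def group_layers(num_layers: int, num_groups: int) -> List[List[int]]:
    """Macro step (illustrative stand-in): split layer indices into
    contiguous, near-equal groups. The paper groups layers by
    functionality; contiguity is only an assumption made here."""
    if not 1 <= num_groups <= num_layers:
        raise ValueError("num_groups must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_groups)
    groups: List[List[int]] = []
    start = 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def build_submodel(importance: Sequence[float], layer_budget: int) -> List[int]:
    """Micro step (illustrative stand-in): keep the highest-scoring
    layer of each group, so the submodel has `layer_budget` layers."""
    groups = group_layers(len(importance), layer_budget)
    kept = [max(group, key=lambda i: importance[i]) for group in groups]
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical importance scores for an 8-layer model.
    scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7, 0.6, 0.4]
    # A device that can hold 4 layers gets one layer from each of 4 groups.
    print(build_submodel(scores, layer_budget=4))  # [0, 3, 5, 6]
```

Tying the number of groups to the device's layer budget is one simple way to make the submodel size follow the memory constraint; a smaller budget yields fewer, larger groups and therefore a shallower submodel.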
