Exploring Federated Pruning for Large Language Models
Pengxin Guo, Yinong Wang, Wei Li, Mengting Liu, Ming Li, Jinkai Zheng, Liangqiong Qu

TL;DR
This paper introduces FedPrLLM, a federated pruning framework that enables privacy-preserving compression of large language models by allowing clients to collaboratively prune models without sharing raw data.
Contribution
The paper presents a novel federated pruning method for LLMs that preserves data privacy and identifies optimal pruning strategies through extensive experiments.
Findings
One-shot pruning with layer comparison is most effective.
Omitting weight scaling yields the best results.
Federated pruning maintains model performance while protecting privacy.
Abstract
LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies typically require access to public calibration samples, which can be challenging to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to calculate a pruning mask matrix based on its local calibration data and share it with the server to prune the global model. This approach allows for collaborative pruning of the global model with the knowledge of each client while maintaining local data privacy. Additionally, we conduct extensive experiments to explore various possibilities within the FedPrLLM framework, including different comparison groups, pruning strategies,…
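The mask-sharing step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `client_mask` and `aggregate_and_prune` are hypothetical, the client score (weight magnitude times the norm of the matching input feature) is a stand-in for whatever calibration-based criterion each client runs, and the server is assumed to sum the masks into vote counts and prune the most-voted positions within the chosen comparison group (layer, row, or column). Retained weights are left untouched, matching the "no weight scaling" setting, and pruning happens in a single pass.

```python
from math import sqrt


def client_mask(weight, activations, sparsity):
    """Hypothetical client step: return a 0/1 mask where 1 means "prune".

    weight:      list of rows (out_features x in_features), shared global weights
    activations: local calibration inputs (samples x in_features), never shared
    Assumed score: |w_ij| * L2 norm of input feature j over the local samples.
    """
    rows, cols = len(weight), len(weight[0])
    col_norm = [sqrt(sum(x[j] ** 2 for x in activations)) for j in range(cols)]
    scores = [(abs(weight[i][j]) * col_norm[j], i, j)
              for i in range(rows) for j in range(cols)]
    k = int(rows * cols * sparsity)  # number of weights this client marks
    mask = [[0] * cols for _ in range(rows)]
    for _, i, j in sorted(scores)[:k]:  # lowest scores are marked for pruning
        mask[i][j] = 1
    return mask


def aggregate_and_prune(weight, client_masks, sparsity, group="layer"):
    """Hypothetical server step: sum client masks, prune the most-voted weights.

    group selects the comparison group in which votes are ranked:
    "layer" (whole matrix), "row", or "column".
    """
    rows, cols = len(weight), len(weight[0])
    votes = [[sum(m[i][j] for m in client_masks) for j in range(cols)]
             for i in range(rows)]
    if group == "layer":
        groups = [[(i, j) for i in range(rows) for j in range(cols)]]
    elif group == "row":
        groups = [[(i, j) for j in range(cols)] for i in range(rows)]
    elif group == "column":
        groups = [[(i, j) for i in range(rows)] for j in range(cols)]
    else:
        raise ValueError("group must be 'layer', 'row', or 'column'")

    pruned = [row[:] for row in weight]  # copy; the input is not modified
    for cells in groups:
        k = int(len(cells) * sparsity)
        ranked = sorted(cells, key=lambda ij: -votes[ij[0]][ij[1]])
        for i, j in ranked[:k]:
            pruned[i][j] = 0.0  # one-shot pruning; kept weights are not rescaled
    return pruned
```

Only the 0/1 masks leave each client in this sketch; the calibration activations stay local. Switching `group` between `"layer"`, `"row"`, and `"column"` changes only the set of positions whose vote counts are ranked against each other.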
Peer Reviews
Decision: Submitted to ICLR 2026
- The paper addresses a timely problem: pruning LLMs under privacy constraints in federated settings.
- Provides extensive empirical evaluation across multiple models, sparsity levels, and datasets.
- Clear takeaways (layer comparison, no scaling, one-shot pruning) that practitioners can adopt.
- Framework is simple and easy to implement, making it accessible for real-world experimentation.

- The approach is a straightforward mask aggregation; similar ideas exist (e.g., FedSpaLLM[1]).
- Weak privacy claim: no secure aggregation or differential privacy. Masks may leak sensitive information.
- No comparison to advanced pruning methods (OWL[2], BESA[3], SliceGPT[4]) or structured sparsity approaches. While the authors can argue that some of the methods are structured pruning methods, it is important to compare against them.
- Communication costs lack units; no runtime, mem

1. **High Practical Relevance**: Addresses a critical gap: how to compress LLMs in privacy-sensitive, decentralized settings (e.g., healthcare, finance) where public calibration data is unavailable.
2. **Rigorous and Systematic Evaluation**: The scale of experiments (6 LLMs, multiple sparsities, datasets, and methods) is exceptional for a systems/ML paper. The ablation studies are thorough and convincing.
3. **Clear, Counterintuitive Insights**: The findings, especially that weight scaling hurts p

1. **Limited Baseline Comparison**: While the paper compares against “Local-only” and “Centralized” baselines, it does not benchmark against concurrent or prior federated compression methods (e.g., FedSpaLLM [Bai et al., 2024], mentioned in Related Work). A direct comparison would strengthen impact claims.
2. **Assumption of Public Pre-training**: The framework assumes access to a public pre-trained LLM. While standard, the paper does not discuss implications if pre-training data were private, a

- The paper enables LLM pruning under federated settings with privacy preservation, which is interesting and underexplored.
- The paper runs large-scale experiments with 6 LLMs, 10 datasets, and multiple sparsity levels. Ablation studies support key claims.
- The three pruning dimensions are practical and grounded, covering critical decisions in collaborative model compression. The paper shows that simpler design choices (e.g., no scaling, one-shot pruning) often outperform complex alternatives, saving compu

- The core framework is a combination of known components; prior work (e.g., FedSpaLLM) has used mask voting for federated LLM pruning.
- The paper lacks formal analysis of why certain choices work or fail (e.g., why weight scaling degrades performance).
- All experiments assume IID or single-dataset settings; real-world FL often involves non-IID, skewed data, which may affect mask voting. Can the authors clarify this?
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Pruning
