Exploring Federated Pruning for Large Language Models

Pengxin Guo; Yinong Wang; Wei Li; Mengting Liu; Ming Li; Jinkai Zheng; Liangqiong Qu

arXiv:2505.13547·cs.LG·May 21, 2025

TL;DR

This paper introduces FedPrLLM, a federated pruning framework that enables privacy-preserving compression of large language models by allowing clients to collaboratively prune models without sharing raw data.

Contribution

The paper presents a novel federated pruning method for LLMs that preserves data privacy and identifies optimal pruning strategies through extensive experiments.

Findings

01

One-shot pruning with layer comparison is most effective.

02

No weight scaling yields the best results.

03

Federated pruning maintains model performance while protecting privacy.

Abstract

LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies typically require access to public calibration samples, which can be challenging to obtain in privacy-sensitive domains. To address this issue, we introduce FedPrLLM, a comprehensive federated pruning framework designed for the privacy-preserving compression of LLMs. In FedPrLLM, each client only needs to calculate a pruning mask matrix based on its local calibration data and share it with the server to prune the global model. This approach allows for collaborative pruning of the global model with the knowledge of each client while maintaining local data privacy. Additionally, we conduct extensive experiments to explore various possibilities within the FedPrLLM framework, including different comparison groups, pruning strategies,…
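The mask-sharing workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `local_mask` and `aggregate_and_prune` are hypothetical, the per-weight score (weight magnitude times input-activation norm) is a stand-in for whatever pruning criterion a client actually uses, and the server-side rule (zero the weights with the most client votes across the whole layer, with no rescaling of the survivors) is one reading of the "layer comparison", "no weight scaling", and "one-shot" findings listed above.

```python
import numpy as np


def local_mask(weight, activations, sparsity):
    """Client side: flag the lowest-scoring weights in each output row.

    Assumption: score = |W| * L2 norm of each input feature over the local
    calibration data. Returns an int mask where 1 means "prune this weight".
    """
    feat_norm = np.linalg.norm(activations, axis=0)  # shape: (in_features,)
    score = np.abs(weight) * feat_norm               # shape: (out, in)
    k = int(weight.shape[1] * sparsity)              # weights to prune per row
    mask = np.zeros(weight.shape, dtype=np.int64)
    if k > 0:
        idx = np.argpartition(score, k - 1, axis=1)[:, :k]
        np.put_along_axis(mask, idx, 1, axis=1)
    return mask


def aggregate_and_prune(weight, client_masks, sparsity):
    """Server side: sum client votes, then prune once over the whole layer.

    Assumption: "layer comparison" means ranking vote counts across the full
    layer. Kept weights are left unscaled.
    """
    votes = np.sum(client_masks, axis=0)             # shape: (out, in)
    k = int(weight.size * sparsity)                  # weights to prune in layer
    pruned = weight.copy()
    if k > 0:
        top = np.argpartition(-votes.ravel(), k - 1)[:k]
        pruned.flat[top] = 0.0
    return pruned


# Toy run: 3 clients, each with private calibration activations.
rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 8))
masks = [local_mask(weight, rng.normal(size=(16, 8)), 0.5) for _ in range(3)]
pruned = aggregate_and_prune(weight, masks, 0.5)
print("zeroed fraction:", np.mean(pruned == 0.0))
```

Only the binary masks leave each client in this sketch; the calibration activations stay local. As one reviewer notes below, that alone does not amount to a formal privacy guarantee.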

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper addresses a timely problem: pruning LLMs under privacy constraints in federated settings.
- Provides extensive empirical evaluation across multiple models, sparsity levels, and datasets.
- Clear takeaways (layer comparison, no scaling, one-shot pruning) that practitioners can adopt.
- Framework is simple and easy to implement, making it accessible for real-world experimentation.

Weaknesses

- The approach is a straightforward mask aggregation; similar ideas exist (e.g., FedSpaLLM [1]).
- Weak privacy claim: no secure aggregation or differential privacy. Masks may leak sensitive information.
- No comparison to advanced pruning methods (OWL [2], BESA [3], SliceGPT [4]) or structured sparsity approaches. While the authors can argue that some of these methods are structured pruning methods, it is important to compare against them.
- Communication costs lack units; no runtime, mem

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. **High Practical Relevance**: Addresses a critical gap: how to compress LLMs in privacy-sensitive, decentralized settings (e.g., healthcare, finance) where public calibration data is unavailable.
2. **Rigorous and Systematic Evaluation**: The scale of experiments (6 LLMs, multiple sparsities, datasets, and methods) is exceptional for a systems/ML paper. The ablation studies are thorough and convincing.
3. **Clear, Counterintuitive Insights**: The findings, especially that weight scaling hurts p

Weaknesses

1. **Limited Baseline Comparison**: While the paper compares against "Local-only" and "Centralized" baselines, it does not benchmark against concurrent or prior federated compression methods (e.g., FedSpaLLM [Bai et al., 2024], mentioned in Related Work). A direct comparison would strengthen impact claims.
2. **Assumption of Public Pre-training**: The framework assumes access to a public pre-trained LLM. While standard, the paper does not discuss implications if pre-training data were private, a

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The paper enables LLM pruning under federated settings with privacy preservation, which is interesting and underexplored.
- The paper runs large-scale experiments with 6 LLMs, 10 datasets, and multiple sparsity levels. Ablation studies support key claims.
- The three pruning dimensions are practical and grounded, covering critical decisions in collaborative model compression. Shows that simpler design choices (e.g., no scaling, one-shot pruning) often outperform complex alternatives, saving compu

Weaknesses

- The core framework is a combination of known components; prior work (e.g., FedSpaLLM) has used mask voting for federated LLM pruning.
- The paper lacks formal analysis of why certain choices work or fail (e.g., why weight scaling degrades performance).
- All experiments assume IID or single-dataset settings; real-world FL often involves non-IID, skewed data, which may affect mask voting. Can the authors clarify this?

Code & Models

Repositories

pengxin-guo/fedprllm
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Pruning