Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
Xuan Ding, Pengyu Tong, Ranjie Duan, Yunjian Zhang, Rui Sun, Yao Zhu

TL;DR
This paper introduces a game-theoretic, surrogate-assisted method for layer-wise pruning of large language models, effectively capturing inter-layer dependencies to improve efficiency and performance.
Contribution
It presents a novel cooperative game framework with surrogate models and stratified sampling for better layer importance estimation in LLM pruning.
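Stratified sampling can be sketched as follows. This is a minimal Monte Carlo Shapley estimator that stratifies by coalition size, meaning the number of other layers kept alongside the layer being scored. Stratifying by size is a common choice, but it is an assumption that it matches the paper's exact scheme. `utility` is a hypothetical callable that scores a set of retained layers, for example negative perplexity on a calibration set. In a surrogate-assisted setting, a cheap learned predictor would stand in for this expensive call.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical utility signature: set of retained layer indices -> performance score.
Utility = Callable[[FrozenSet[int]], float]


def stratified_shapley(
    n_layers: int,
    utility: Utility,
    samples_per_stratum: int,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Monte Carlo Shapley estimates, stratified by coalition size.

    Uses phi_i = (1/n) * sum_{k=0}^{n-1} E[v(S + {i}) - v(S)], where S is drawn
    uniformly from the size-k subsets of the other n-1 layers.
    Returns the estimates and the number of utility calls made.
    """
    rng = random.Random(seed)
    phi = [0.0] * n_layers
    calls = 0
    for i in range(n_layers):
        others = [j for j in range(n_layers) if j != i]
        stratum_means = []
        for k in range(n_layers):  # stratum: coalitions of exactly k other layers
            total = 0.0
            for _ in range(samples_per_stratum):
                s = frozenset(rng.sample(others, k))
                total += utility(s | {i}) - utility(s)  # marginal contribution of layer i
                calls += 2
            stratum_means.append(total / samples_per_stratum)
        # Every stratum gets equal weight 1/n, which reproduces the Shapley weighting.
        phi[i] = sum(stratum_means) / n_layers
    return phi, calls


# Toy utility (hypothetical): additive per-layer scores plus a bonus
# that appears only when layers 0 and 1 are kept together.
weights = [1.0, 2.0, 0.5, 3.0]


def toy_utility(kept: FrozenSet[int]) -> float:
    bonus = 1.0 if {0, 1} <= kept else 0.0
    return sum(weights[j] for j in kept) + bonus


phi, calls = stratified_shapley(4, toy_utility, samples_per_stratum=500)
print(phi, calls)
```

The number of calls grows as 2 · n² · m rather than 2^n. With 32 layers and m = 10 samples per stratum, that is 20,480 calls, compared with 2^32 (about 4.3 billion) coalitions for exact enumeration.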
Findings
Outperforms existing pruning methods in perplexity and zero-shot accuracy
Reduces computational cost of layer importance estimation
Effectively captures inter-layer dependencies during pruning
Abstract
While large language models (LLMs) demonstrate impressive performance across various tasks, their deployment in real-world scenarios is still constrained by high computational demands. Layer-wise pruning, a commonly employed strategy to mitigate inference costs, can partially address this challenge. However, existing approaches generally depend on static heuristic rules and fail to account for the interdependencies among layers, thereby limiting the effectiveness of the pruning process. To this end, this paper proposes a game-theoretic framework that formulates layer pruning as a cooperative game in which each layer acts as a player and model performance serves as the utility. Because computing exact Shapley values is computationally infeasible for LLMs, we propose a lightweight surrogate network that estimates layer-wise marginal contributions. This network can…
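The following sketch shows why exact Shapley values do not scale. It computes them for a toy game by enumerating every subset of retained layers, which requires 2^n utility evaluations (about 4.3 billion for a hypothetical 32-layer model). The toy utility, its interaction bonus, and the rule of pruning the lowest-valued layers are illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical utility signature: set of retained layer indices -> performance score.
Utility = Callable[[FrozenSet[int]], float]


def exact_shapley(n_layers: int, utility: Utility) -> Tuple[List[float], int]:
    """Exact Shapley values by enumerating all 2**n coalitions of layers.

    Returns the Shapley values and the number of utility evaluations (2**n).
    """
    # Evaluate every coalition once; this table is what becomes infeasible for large n.
    values: Dict[FrozenSet[int], float] = {}
    for k in range(n_layers + 1):
        for combo in combinations(range(n_layers), k):
            values[frozenset(combo)] = utility(frozenset(combo))

    phi = [0.0] * n_layers
    for i in range(n_layers):
        others = [j for j in range(n_layers) if j != i]
        for k in range(n_layers):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k
            weight = factorial(k) * factorial(n_layers - k - 1) / factorial(n_layers)
            for combo in combinations(others, k):
                s = frozenset(combo)
                phi[i] += weight * (values[s | {i}] - values[s])
    return phi, len(values)


def layers_to_prune(phi: List[float], n_prune: int) -> List[int]:
    """Indices of the n_prune layers with the smallest contribution (hypothetical rule)."""
    return sorted(range(len(phi)), key=lambda j: phi[j])[:n_prune]


# Toy utility (hypothetical): additive per-layer scores plus an interaction
# bonus that only appears when layers 0 and 1 are kept together.
weights = [1.0, 2.0, 0.5, 3.0]


def toy_utility(kept: FrozenSet[int]) -> float:
    bonus = 1.0 if {0, 1} <= kept else 0.0
    return sum(weights[j] for j in kept) + bonus


phi, n_evals = exact_shapley(4, toy_utility)
prune = layers_to_prune(phi, 2)
print(phi, n_evals, prune)
```

In this toy game, the interaction bonus is split equally between layers 0 and 1, giving values of 1.5, 2.5, 0.5, and 3.0. These sum to v(all layers) − v(no layers) = 7.5, which is the efficiency property of the Shapley value.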
Peer Reviews
Decision: ICLR 2026 Poster
1. Viewing layer pruning as a cooperative game is original and well-motivated. It captures inter-layer dependencies often ignored in prior pruning methods based on static heuristics. The surrogate-assisted estimation is elegant and computationally practical, bridging theory and application. 2. Experiments are thorough, covering multiple models (transformer and non-transformer), datasets, and both generative and reasoning tasks, as well as generalization to quantization. 3. Consistently outperforms str
1. While inspired by cooperative game theory, the connection remains mostly heuristic. The surrogate model approximates marginal contributions but lacks analysis of approximation error or variance bounds. 2. There is limited discussion of the surrogate’s accuracy or potential biases (e.g., overfitting to sampled masks). Reporting R² or correlation between predicted and true perplexities would strengthen the claim. 3. Although the method reduces evaluation costs compared to naive Shapley computat
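The fidelity check this review asks for can be sketched in a few lines. The idea is to compare surrogate predictions with measured perplexities on masks held out from surrogate training. The function names and sample values below are hypothetical placeholders, not numbers from the paper.

```python
from math import sqrt
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # zero (undefined R^2) if y_true is constant
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Placeholder values: measured vs. surrogate-predicted perplexity on held-out masks.
measured = [10.0, 12.0, 15.0, 20.0]
predicted = [10.5, 11.5, 15.5, 19.5]
print(f"R^2 = {r_squared(measured, predicted):.3f}, "
      f"Pearson r = {pearson(measured, predicted):.3f}")
```

Reporting these two metrics separately for each sparsity level would also show whether prediction quality degrades under extreme masking.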
- **Principled Formulation & Theoretical Motivation**: The paper introduces a compelling game-theoretic framing for layer pruning, challenging the prevalent assumption of independent layer importance and instead recognizing context-dependent inter-layer dynamics. This principled approach addresses a core limitation of widely-used heuristics. - **Efficient Approximation of Shapley Values**: By incorporating a lightweight surrogate network trained on stratified Monte Carlo mask samples, the method
- **Surrogate Network Limitations and Validation**: While Figure 6 and Table 7 specify the surrogate’s structure, the paper lacks a rigorous quantitative evaluation of its prediction fidelity, especially for masks far from the training distribution. There is little discussion of failure modes, e.g., overfitting to calibration samples, brittleness under extreme masking, or calibration data misspecification (see Appendix F.1), limiting confidence for highly compressed regimes or out-of-domain sett
1. This paper reformulates LLM pruning as a cooperative game, capturing inter-layer dependencies ignored by static heuristics. 2. This paper proposes a scalable Shapley-based pruning framework using stratified sampling and a surrogate model for efficient layer contribution estimation. 3. This paper demonstrates consistent improvements over depth- and width-wise pruning baselines on multiple benchmarks, including WikiText2, PTB, C4, and zero-shot reasoning tasks.
1. While the paper proposes a surrogate-assisted approach to estimate Shapley values efficiently, it lacks a clear theoretical analysis quantifying how well the surrogate approximates true layer contributions. 2. The method relies on a small calibration set, which may not adequately represent diverse data distributions or downstream task requirements. The resulting Shapley estimates might therefore be dataset-dependent and unstable across domains. 3. The study focuses solely on one-shot pruning
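One hedged way to quantify the calibration-set concern is to estimate layer contributions on two different calibration sets and compare the results with two metrics. Spearman rank correlation measures whether the layer ordering is preserved. The overlap of the lowest-ranked layers measures whether the same layers would actually be pruned. The helper names and example scores below are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sqrt(sum((x - ma) ** 2 for x in ra))
    sb = sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)  # undefined if either input is constant


def prune_overlap(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the k lowest-contribution layers shared by both estimates."""
    low_a = set(sorted(range(len(a)), key=lambda i: a[i])[:k])
    low_b = set(sorted(range(len(b)), key=lambda i: b[i])[:k])
    return len(low_a & low_b) / k


# Placeholder per-layer contribution estimates from two calibration sets.
phi_set_a = [0.9, 0.2, 0.4, 0.1, 0.7, 0.3]
phi_set_b = [0.8, 0.3, 0.5, 0.1, 0.6, 0.2]
print(spearman(phi_set_a, phi_set_b), prune_overlap(phi_set_a, phi_set_b, k=2))
```

The two placeholder estimates agree on the lowest-ranked layer but not the second. As a result, the overlap for k = 2 is only 0.5, even though the rank correlation is high (about 0.94). This is why both metrics are worth reporting.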
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
