TL;DR
EGGS-PTP is a novel structured post-training pruning method for large language models that uses expander graph theory to maintain information flow, resulting in efficient models with preserved accuracy.
Contribution
The paper introduces a graph-theory-based structured pruning approach that enhances model efficiency while maintaining performance in large language models.
Findings
Significant reduction in model size and computation
Outperforms existing pruning methods in accuracy
Effective preservation of model functionality
Abstract
As Large Language Models (LLMs) become more widely adopted and scale up in size, the computational and memory challenges involved in deploying these massive foundation models have grown increasingly severe. This underscores the urgent need to develop more efficient model variants. Faced with this challenge, the present work introduces EGGS-PTP: an Expander-Graph Guided Structured Post-training Pruning method. The proposed approach leverages graph theory to guide the design of N:M structured pruning, effectively reducing model size and computational demands. By incorporating concepts from expander graphs, EGGS-PTP ensures information flow within the pruned network, preserving essential model functionality. Extensive numerical experiments demonstrate that EGGS-PTP not only achieves significant acceleration and memory savings due to structured sparsity but also outperforms existing…
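To make the N:M constraint mentioned in the abstract concrete, here is a minimal sketch of generic magnitude-based N:M pruning: in every group of M consecutive weights along a row, only the N largest-magnitude entries are kept. This is not EGGS-PTP itself. The function name `nm_prune`, the use of absolute magnitude as the importance score, and the row-wise grouping are all assumptions made for illustration, and the paper's expander-graph-guided selection is not reproduced here.

```python
import numpy as np


def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Hypothetical helper: keep the n largest-magnitude entries in each
    group of m consecutive weights along every row and zero the rest."""
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be divisible by m")
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")

    # View each row as (cols // m) groups of m weights.
    groups = weights.reshape(rows, cols // m, m)

    # Indices of the (m - n) smallest magnitudes inside each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]

    # Start from an all-True mask and switch off the dropped positions.
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)

    return (groups * mask).reshape(rows, cols)


# Example: 2:4 sparsity on a single row with two groups of four weights.
W = np.array([[0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.2, 1.5]])
pruned = nm_prune(W, n=2, m=4)
```

Each group of four in `pruned` retains exactly two nonzero weights, which is the regular pattern that structured-sparsity hardware support relies on. A method such as EGGS-PTP would replace the plain magnitude score with its own selection rule while still satisfying the same per-group constraint.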
Peer Reviews
Decision·Submitted to ICLR 2026
1. EGGS-PTP introduces expander-graph theory into post-training pruning, presenting the first framework that applies expander graph concepts to large language model pruning. It innovatively leverages graph-theoretic properties such as connectivity and expansion to maintain robust information flow in pruned models.
2. It combines importance-aware and connectivity-aware pruning to balance compression efficiency and model accuracy.
3. The method enforces N:M structured sparsity compatible with GPU
1. EGGS-PTP mainly integrates expander-graph theory with existing pruning frameworks rather than introducing a fundamentally new learning mechanism, relying on heuristic rules instead of adaptive structures.
2. The method incurs higher pruning overhead than baselines like RIA, and its scalability beyond 34B-parameter models remains untested.
3. It depends on manual tuning of the hyperparameter (B), limiting automation and generalization across different architectures.
The additional diagonal selection leads to improvements over the RIA metric. Results are positive on both perplexity and zero-shot tasks across several models.
Results are quite close to RIA; confidence intervals would help strengthen the claims of improved performance. The theory seems a bit disjointed. Why does it matter that we produce a two-sided expander? It is unclear what contribution this theory adds aside from some inspiration for the method. It would be nice to see either an improved explanation of why the expander graph theory is useful, or some further connections to claims made in the paper. For instance, if this framework improves infor
- The paper introduces an interesting perspective by connecting structured sparsity with expander graph theory, aiming to preserve information flow in pruned LLMs.
- The authors evaluate on multiple LLMs and datasets, providing a broad empirical view.
- The paper provides sufficient implementation details for reproduction.
- Overstated theoretical claims.
- Insufficient ablation on graph hyperparameters.
