DarwinLM: Evolutionary Structured Pruning of Large Language Models
Shengkun Tang, Oliver Sieberling, Eldar Kurtic, Zhiqiang Shen, Dan Alistarh

TL;DR
DarwinLM is an evolutionary, training-aware structured pruning method for large language models. It reduces model size and improves speed while maintaining high performance, and it achieves state-of-the-art structured pruning results on Llama and Qwen models.
Contribution
The paper presents DarwinLM, a novel evolutionary structured pruning approach that incorporates post-training adaptation, significantly improving efficiency and performance over existing methods.
Findings
Achieves state-of-the-art structured pruning performance on Llama and Qwen models.
Requires 5x less training data during post-compression training compared to previous methods.
Effectively balances model compression with minimal performance loss.
Abstract
Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. To assess the effect of…
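The evolutionary loop described in the abstract (mutate a parent into several offspring, keep the fittest) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the per-layer integer sparsity levels, the `toy_loss` fitness function, the `sensitivity` weights, and the function names `mutate` and `evolve` are all hypothetical. In particular, the abstract is truncated before it explains how offspring are actually assessed, so a simple stand-in fitness is used here.

```python
import random


def toy_loss(levels, sensitivity):
    """Hypothetical fitness (lower is better): sensitive layers are
    penalized more heavily for being pruned. Not the paper's metric."""
    return sum(s * v * v for s, v in zip(sensitivity, levels))


def mutate(levels, rng, max_level):
    """Move one unit of sparsity from one layer to another.

    The total sparsity is unchanged, so every offspring meets the same
    overall compression target while the per-layer allocation varies.
    """
    child = list(levels)
    donors = [i for i, v in enumerate(child) if v > 0]
    if not donors:
        return child  # nothing to move
    src = rng.choice(donors)
    receivers = [i for i, v in enumerate(child) if v < max_level and i != src]
    if not receivers:
        return child  # no layer can accept more sparsity
    dst = rng.choice(receivers)
    child[src] -= 1
    child[dst] += 1
    return child


def evolve(sensitivity, start, generations=50, offspring=8, max_level=10, seed=0):
    """Elitist evolutionary search over non-uniform sparsity allocations."""
    rng = random.Random(seed)
    parent = list(start)
    for _ in range(generations):
        # The parent competes with its offspring, so fitness never gets worse.
        candidates = [parent]
        candidates += [mutate(parent, rng, max_level) for _ in range(offspring)]
        parent = min(candidates, key=lambda c: toy_loss(c, sensitivity))
    return parent
```

Calling `evolve([5.0, 1.0, 0.5, 3.0], [5, 5, 5, 5])` starts from a uniform allocation and shifts sparsity toward the layers with low assumed sensitivity, while the sum of the levels stays fixed. This mirrors the abstract's point that components differ in pruning sensitivity and benefit from non-uniform compression.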
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
