DarwinLM: Evolutionary Structured Pruning of Large Language Models

Shengkun Tang; Oliver Sieberling; Eldar Kurtic; Zhiqiang Shen; Dan Alistarh

arXiv:2502.07780·cs.LG·March 6, 2025

TL;DR

DarwinLM is an evolutionary, training-aware structured pruning method for large language models. It reduces model size and improves inference speed while preserving accuracy, and it reports state-of-the-art results on multiple LLMs.

Contribution

The paper presents DarwinLM, an evolutionary structured pruning approach that accounts for post-compression training when choosing which substructure to keep, improving efficiency and performance over existing methods.

Findings

01 Achieves state-of-the-art structured pruning performance on Llama and Qwen models.

02 Requires 5x less training data for post-compression training than previous methods.

03 Compresses models with minimal performance loss.

Abstract

Large Language Models (LLMs) have achieved significant success across various NLP tasks. However, their massive computational costs limit their widespread use, particularly in real-time applications. Structured pruning offers an effective solution by compressing models and directly providing end-to-end speed improvements, regardless of the hardware environment. Meanwhile, different components of the model exhibit varying sensitivities towards pruning, calling for non-uniform model compression. However, a pruning method should not only identify a capable substructure, but also account for post-compression training. To this end, we propose DarwinLM, a method for training-aware structured pruning. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. To assess the effect of…
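
The mutate-and-select loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-layer sparsity levels, the `sensitivity` weights, and the `toy_loss` fitness are all hypothetical stand-ins, and the real method would evaluate actual pruned models rather than a closed-form score. The sketch only shows how mutation that preserves the total compression budget, combined with survival of the fittest candidate, can arrive at a non-uniform allocation across layers.

```python
import random


def toy_loss(levels, sensitivity):
    """Hypothetical fitness: a stand-in for evaluating a pruned model (lower is better)."""
    return sum(s * lvl ** 2 for s, lvl in zip(sensitivity, levels))


def mutate(levels, max_level, rng):
    """Move one sparsity step from one layer to another, keeping the total unchanged."""
    n = len(levels)
    # All (source, destination) pairs where a step can legally be moved.
    moves = [(src, dst) for src in range(n) for dst in range(n)
             if src != dst and levels[src] > 0 and levels[dst] < max_level]
    child = list(levels)
    if moves:
        src, dst = rng.choice(moves)
        child[src] -= 1
        child[dst] += 1
    return child


def evolve(start, sensitivity, max_level=4, generations=50, offspring=8, seed=0):
    """Each generation: create offspring by mutation, keep the fittest candidate."""
    rng = random.Random(seed)
    parent = list(start)
    for _ in range(generations):
        # The parent competes with its offspring, so fitness never gets worse.
        candidates = [parent] + [mutate(parent, max_level, rng) for _ in range(offspring)]
        parent = min(candidates, key=lambda c: toy_loss(c, sensitivity))
    return parent


if __name__ == "__main__":
    # Assumed toy setup: six layers, the outer ones more sensitive to pruning.
    sensitivity = [5.0, 1.0, 0.5, 0.5, 1.0, 5.0]
    start = [2] * 6  # uniform sparsity level as the starting point
    best = evolve(start, sensitivity)
    print("start:", start, "loss:", toy_loss(start, sensitivity))
    print("best: ", best, "loss:", toy_loss(best, sensitivity))
```

Because each mutation shifts a sparsity step between two layers, every candidate meets the same overall compression target, and the search only decides how that budget is distributed.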

Code & Models

Repositories

IST-DASLab/DarwinLM (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings