EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh

TL;DR
EvoPress uses evolutionary search to optimize dynamic, non-uniform compression of large language models: it assigns compression levels per block or per layer to minimize accuracy loss while meeting a global compression target, and it applies across several model families and compression techniques.
Contribution
The paper presents a novel evolutionary framework for dynamic LLM compression that outperforms existing methods and generalizes across multiple models and compression strategies.
Findings
Achieved state-of-the-art results on Llama, Mistral, and Phi models.
Set new benchmarks for structural pruning, sparsity, and quantization.
Demonstrated the effectiveness of evolutionary search in model compression.
Abstract
The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By…
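To make the idea of searching over per-layer compression levels under a global budget more concrete, here is a minimal sketch of a generic evolutionary search. It is not the authors' algorithm: the truncated abstract does not describe the EvoPress procedure, so everything below is an illustrative assumption. The function names (`mutate`, `toy_loss`, `evolve`), the sensitivity values, and the coupling term are all hypothetical. The toy loss includes an interaction between adjacent layers only to mimic the abstract's point that layer errors are not independent. A real fitness function would evaluate the compressed model on calibration data.

```python
import random


def mutate(levels, max_level, rng):
    """Raise one layer's level and lower another's, keeping the sum fixed.

    The fixed sum plays the role of the global compression target.
    """
    child = list(levels)
    ups = [i for i, v in enumerate(child) if v < max_level]
    if not ups:
        return child
    i = rng.choice(ups)
    downs = [j for j, v in enumerate(child) if v > 0 and j != i]
    if not downs:
        return child
    j = rng.choice(downs)
    child[i] += 1
    child[j] -= 1
    return child


def toy_loss(levels):
    """Hypothetical stand-in for a calibration loss (not from the paper)."""
    sensitivity = [5.0, 1.0, 0.5, 3.0, 0.2, 2.0]  # arbitrary per-layer weights
    independent = sum(s * v * v for s, v in zip(sensitivity, levels))
    # Coupling between adjacent layers: the total error is not a sum of
    # per-layer terms, so per-layer importance scores alone are not enough.
    coupling = sum(0.5 * levels[k] * levels[k + 1]
                   for k in range(len(levels) - 1))
    return independent + coupling


def evolve(initial, loss_fn, max_level, generations=200, offspring=8, seed=0):
    """Simple elitist search: keep the best child only if it is no worse."""
    rng = random.Random(seed)
    parent, parent_loss = list(initial), loss_fn(initial)
    for _ in range(generations):
        children = [mutate(parent, max_level, rng) for _ in range(offspring)]
        best = min(children, key=loss_fn)
        best_loss = loss_fn(best)
        if best_loss <= parent_loss:
            parent, parent_loss = best, best_loss
    return parent, parent_loss


if __name__ == "__main__":
    uniform = [2] * 6  # uniform compression: same level for every layer
    best, best_loss = evolve(uniform, toy_loss, max_level=4)
    print("uniform loss:", toy_loss(uniform))
    print("searched assignment:", best, "loss:", best_loss)
```

Because each mutation moves one unit of compression from one layer to another, every candidate meets the same global budget as the starting assignment, so the search only compares configurations that are equally compressed.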
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
Methods: Pruning · Sparse Evolutionary Training
