EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

Oliver Sieberling; Denis Kuznedelev; Eldar Kurtic; Dan Alistarh

arXiv:2410.14649 · cs.LG · July 2, 2025

TL;DR

EvoPress introduces an evolutionary approach to optimize dynamic, non-uniform compression of large language models, significantly reducing computational costs while maintaining accuracy across various models and compression techniques.

Contribution

The paper presents a novel evolutionary framework for dynamic LLM compression that outperforms existing methods and generalizes across multiple models and compression strategies.

Findings

01 Achieved state-of-the-art results on Llama, Mistral, and Phi models.

02 Set new benchmarks for structural pruning, sparsity, and quantization.

03 Demonstrated the effectiveness of evolutionary search in model compression.

Abstract

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the importance of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By…
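To make the setting concrete, here is a minimal sketch of an evolutionary search over per-layer compression levels under a fixed global budget. It is an illustration only, not the EvoPress algorithm itself, whose details lie beyond the truncated abstract above. The loss function `toy_loss`, the sensitivity values in `SENS`, the integer level encoding, and all parameter names are hypothetical. The loss deliberately includes a term coupling neighboring layers, so that layers do not contribute independently to the error.

```python
import random

# Hypothetical per-layer sensitivities; a real setup would instead measure
# the compressed model's loss on calibration data.
SENS = [1.0, 0.2, 0.5, 0.1, 0.8, 0.3]


def toy_loss(levels):
    """Stand-in for model error at the given per-layer compression levels."""
    independent = sum(s * l * l for s, l in zip(SENS, levels))
    # Neighboring layers interact, so the error is not a sum of per-layer terms.
    coupled = sum(levels[i] * levels[i + 1] for i in range(len(levels) - 1))
    return independent + 0.5 * coupled


def mutate(levels, rng, max_level):
    """Move one compression step from layer j to layer i.

    The sum of levels is unchanged, so the global budget is always met.
    """
    child = list(levels)
    i, j = rng.sample(range(len(child)), 2)
    if child[i] < max_level and child[j] > 0:
        child[i] += 1
        child[j] -= 1
    return child


def evolve(target=3, max_level=6, generations=200, offspring=8, seed=0):
    """Elitist search: keep the parent unless an offspring is at least as good."""
    rng = random.Random(seed)
    parent = [target] * len(SENS)  # start from uniform compression
    parent_loss = toy_loss(parent)
    for _ in range(generations):
        children = [mutate(parent, rng, max_level) for _ in range(offspring)]
        best = min(children, key=toy_loss)
        best_loss = toy_loss(best)
        if best_loss <= parent_loss:
            parent, parent_loss = best, best_loss
    return parent, parent_loss


if __name__ == "__main__":
    profile, loss = evolve()
    print("levels:", profile, "loss:", loss)
```

Because each mutation only shifts a step between two layers, every candidate meets the global compression target by construction, and the search never needs a per-layer importance score. It only compares whole configurations.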

Code & Models

Repositories

ist-daslab/evopress
PyTorch · Official

Taxonomy

Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics

Methods: Pruning · Sparse Evolutionary Training