Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Martin Genzel, Patrick Putzky, Pengfei Zhao, Sebastian Schulze, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann

TL;DR
This paper introduces ACIP, a novel post-training compression method for large language models that enables flexible size reduction without re-computation, balancing efficiency and performance.
Contribution
ACIP is a new algorithm that determines a compression-performance trade-off from a single stochastic gradient descent run, using an SVD reparametrization of linear layers and iterative pruning of their singular values.
Findings
Achieves state-of-the-art compression results on open-weight LLMs.
Allows compression to any target size without re-computation.
Complements existing quantization techniques.
Abstract
The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating…
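The idea of an SVD reparametrization combined with a global score map can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`svd_reparametrize`, `global_keep_masks`, `compress_layer`) are hypothetical, and the singular-value magnitudes are used as a stand-in score, whereas ACIP derives its scores from the pruning order observed during training. The sketch only shows why a fixed global ranking lets one pick any target size afterwards without re-computing anything.

```python
import numpy as np

def svd_reparametrize(W):
    """Factor a weight matrix W (out x in) as U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def global_keep_masks(scores, shapes, target_ratio):
    """Keep the highest-scoring singular values across all layers
    until the parameter budget (target_ratio * dense size) is used up.

    Assumption: keeping one singular value in an (m x n) layer costs
    m + n parameters (one column of U scaled by s, one row of Vt).
    """
    dense_params = sum(m * n for m, n in shapes)
    budget = target_ratio * dense_params
    # Flatten all (score, layer, index) triples and rank them globally.
    entries = [(sc, i, j) for i, layer in enumerate(scores)
               for j, sc in enumerate(layer)]
    entries.sort(key=lambda e: -e[0])
    masks = [np.zeros(len(layer), dtype=bool) for layer in scores]
    used = 0
    for _, i, j in entries:
        cost = sum(shapes[i])
        if used + cost > budget:
            break  # stop at the first entry that no longer fits
        masks[i][j] = True
        used += cost
    return masks, used / dense_params

def compress_layer(U, s, Vt, mask):
    """Return low-rank factors A (m x k) and B (k x n) for the kept values."""
    A = U[:, mask] * s[mask]
    B = Vt[mask, :]
    return A, B

# Toy "model" with two linear layers (random weights, illustration only).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((48, 48))]
factors = [svd_reparametrize(W) for W in weights]
scores = [s for _, s, _ in factors]   # stand-in for a learned score map
shapes = [W.shape for W in weights]

# The same ranking serves every target size; only the cutoff moves.
for ratio in (0.3, 0.5):
    masks, achieved = global_keep_masks(scores, shapes, ratio)
    compressed = [compress_layer(*f, m) for f, m in zip(factors, masks)]
    print(ratio, round(achieved, 3), [A.shape[1] for A, _ in compressed])
```

Because the kept set is always a prefix of one fixed ranking, the masks for a smaller target size are a subset of those for a larger one. Note that under the cost assumption above, a full-rank factorization of an m x n layer stores more parameters than the dense matrix, so the factored form only saves memory below a certain rank.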
Code & Models
- 🤗 MerantixMomentum/acip_llama1_7b · model · 5 dl · ♡ 1
- 🤗 MerantixMomentum/acip_llama1_13b · model · 5 dl · ♡ 1
- 🤗 MerantixMomentum/acip_llama2_7b · model · 10 dl · ♡ 1
- 🤗 MerantixMomentum/acip_llama2_13b · model · 5 dl · ♡ 1
- 🤗 MerantixMomentum/acip_llama31_8b · model · 5 dl · ♡ 1
- 🤗 MerantixMomentum/acip_mistral03_7b · model · 3 dl · ♡ 1
- 🤗 MerantixMomentum/acip_qwen25_3b · model · 5 dl · ♡ 1
- 🤗 MerantixMomentum/acip_qwen25_7b · model · 3 dl · ♡ 2
- 🤗 MerantixMomentum/acip_qwen25_14b · model · 4 dl · ♡ 1
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Simulation Techniques and Applications · Data Visualization and Analytics
Methods: Pruning
