LLMs can Compress LLMs: Adaptive Pruning by Agents

Sai Varun Kodathala; Rakesh Vunnam

arXiv:2601.09694 · cs.CL · January 15, 2026

TL;DR

This paper presents an adaptive, agent-guided pruning method for large language models in which a foundation model decides which layers to prune, with the goal of preserving critical knowledge pathways. Without retraining, it reports substantially better performance and knowledge retention at high sparsity levels.

Contribution

The paper introduces an agent-based pruning framework that combines layer-wise sensitivity profiling with self-reflection, enabling model-agnostic compression of LLMs while maintaining performance.
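The page does not give the agent's prompts or decision rules, so the sketch below only illustrates the control flow the contribution describes: read a sensitivity profile, choose one layer to prune further, check the result, and roll back ("reflect") when quality drops too far. It is a hypothetical stand-in, not the authors' method: a greedy rule replaces the foundation-model agent, a toy quadratic loss replaces perplexity, and the names `agent_guided_prune`, `toy_loss`, `max_degradation`, and the sensitivity values are all made up for illustration.

```python
from typing import Callable, Dict


def agent_guided_prune(
    sensitivity: Dict[str, float],
    evaluate: Callable[[Dict[str, float]], float],
    target_sparsity: float,
    step: float = 0.1,
    max_layer_sparsity: float = 0.9,
    max_degradation: float = 0.05,
) -> Dict[str, float]:
    """Raise one layer's sparsity per iteration until the average hits the target.

    sensitivity: per-layer sensitivity score (lower = assumed safer to prune).
    evaluate: quality-loss proxy for a {layer: sparsity} plan (lower is better).
    """
    steps = {name: 0 for name in sensitivity}  # integer steps avoid float drift
    max_steps = round(max_layer_sparsity / step)
    frozen = set()  # layers whose last increase was rolled back

    def plan() -> Dict[str, float]:
        return {name: k * step for name, k in steps.items()}

    baseline = evaluate(plan())
    while sum(plan().values()) / len(steps) < target_sparsity - 1e-9:
        candidates = [n for n in steps if n not in frozen and steps[n] < max_steps]
        if not candidates:
            break  # every layer is frozen or at its cap
        # Hypothetical stand-in for the LLM agent's choice: least-sensitive layer,
        # penalized by how much it has already been pruned.
        layer = min(candidates, key=lambda n: sensitivity[n] * (1 + steps[n]))
        steps[layer] += 1
        # "Self-reflection" stand-in: inspect the outcome and undo harmful steps.
        if evaluate(plan()) - baseline > max_degradation:
            steps[layer] -= 1
            frozen.add(layer)
    return plan()


# Toy inputs (made up for illustration, not from the paper).
sensitivity = {"layer_0": 0.31, "layer_1": 0.05, "layer_2": 0.12, "layer_3": 0.67}


def toy_loss(plan: Dict[str, float]) -> float:
    # Quadratic penalty: sensitive layers degrade faster as sparsity grows.
    return sum(sensitivity[n] * s**2 for n, s in plan.items())


result = agent_guided_prune(sensitivity, toy_loss, target_sparsity=0.5)
print(result)
```

With these toy numbers the loop stops short of the 0.5 average target because every further step would exceed the degradation budget, and the least-sensitive layer ends up the most heavily pruned. In the paper's setting, the selection and the rollback decision would come from the foundation model rather than from a fixed rule.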

Findings

1. 56% relative improvement in MMLU accuracy
2. 19x better factual knowledge retention on FreebaseQA
3. 69% lower perplexity degradation
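The page does not state the baselines or exact metric definitions behind these figures. Under the common readings of "relative improvement", "N× better", and "X% lower degradation" (an assumption, not confirmed by the source), with a for MMLU accuracy, r for FreebaseQA retention, and Δ for the perplexity increase over the dense model, they translate as:

```latex
\frac{a_{\text{agent}} - a_{\text{base}}}{a_{\text{base}}} = 0.56
  \;\Rightarrow\; a_{\text{agent}} = 1.56\, a_{\text{base}}
\qquad
r_{\text{agent}} = 19\, r_{\text{base}}
\qquad
\Delta = \mathrm{PPL}_{\text{pruned}} - \mathrm{PPL}_{\text{dense}},\quad
\Delta_{\text{agent}} = (1 - 0.69)\,\Delta_{\text{base}} = 0.31\,\Delta_{\text{base}}
```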

Abstract

As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such as SparseGPT and Wanda achieve high sparsity through layer-wise weight reconstruction or activation-aware magnitude pruning, but rely on uniform or hand-crafted heuristics to determine per-layer sparsity ratios. Moreover, recent work has shown that pruned LLMs suffer from severe factual knowledge degradation, with structured pruning methods experiencing near-total collapse in factual question-answering capabilities. We introduce agent-guided pruning, where a foundation model acts as an adaptive pruning agent to intelligently select which layers to prune at each iteration while preserving critical knowledge pathways. Our method constructs layer-wise sensitivity profiles by combining Wanda-inspired…
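The abstract names Wanda's activation-aware magnitude criterion as the starting point for the sensitivity profiles, but the exact combination is cut off above. For reference, Wanda scores each weight as |W_ij| · ||X_j||_2 (weight magnitude times the L2 norm of the matching input feature over calibration tokens) and removes the lowest-scoring weights within each output row. The NumPy sketch below implements that criterion and adds a hypothetical per-layer sensitivity, the relative output error when only that layer is pruned. `layer_sensitivity` and the random calibration data are assumptions for illustration, not the paper's profile.

```python
import numpy as np


def wanda_scores(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Wanda importance |W_ij| * ||X_j||_2.

    W: (out_features, in_features) weight matrix of a linear layer.
    X: (n_tokens, in_features) calibration activations feeding that layer.
    """
    feature_norms = np.linalg.norm(X, axis=0)  # one L2 norm per input feature
    return np.abs(W) * feature_norms[None, :]


def wanda_prune(W: np.ndarray, X: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the lowest-scoring `sparsity` fraction of weights in each output row."""
    scores = wanda_scores(W, X)
    k = int(W.shape[1] * sparsity)  # number of weights to drop per row
    W_pruned = W.copy()
    if k > 0:
        drop = np.argsort(scores, axis=1)[:, :k]  # k smallest scores per row
        np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned


def layer_sensitivity(W: np.ndarray, X: np.ndarray, sparsity: float) -> float:
    """Hypothetical sensitivity: relative output error after pruning this layer alone."""
    dense_out = X @ W.T
    pruned_out = X @ wanda_prune(W, X, sparsity).T
    return float(np.linalg.norm(dense_out - pruned_out) / np.linalg.norm(dense_out))


# Random stand-ins for calibration activations and layer weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 64))
layers = {f"layer_{i}": rng.normal(size=(32, 64)) for i in range(4)}
profile = {name: layer_sensitivity(W, X, sparsity=0.5) for name, W in layers.items()}

for name, s in sorted(profile.items(), key=lambda kv: kv[1]):
    print(name, round(s, 3))
```

A profile like this gives an agent a per-layer signal for choosing non-uniform sparsity ratios instead of applying the same ratio everywhere; in a real model, X would come from forward hooks on calibration text.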

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques