Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li

TL;DR
This paper introduces ACBench, a benchmark for evaluating how compression techniques affect the agentic capabilities of large language models: workflow generation, tool use, long-context understanding, and real-world application.
Contribution
It presents the first benchmark specifically assessing the impact of compression on LLMs' agentic abilities across multiple tasks and models, revealing important tradeoffs.
Findings
4-bit quantization preserves workflow generation and tool use with only a 1-3% drop
4-bit quantization degrades real-world application accuracy by 10-15%
ACBench offers actionable insights for optimizing LLM compression
Abstract
Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks only focus on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring the agentic capabilities - workflow, tool use/function call, long-context understanding and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs…
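To make the two compression families named in the abstract concrete, here is a minimal Python sketch of what 4-bit quantization and unstructured pruning do to a list of weights. It is an illustration under simplifying assumptions, not the paper's method: it uses plain round-to-nearest quantization and plain magnitude pruning, which are simpler stand-ins for GPTQ/AWQ and Wanda/SparseGPT. The function names `quantize_4bit`, `dequantize`, and `magnitude_prune` are hypothetical and chosen only for this example.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to signed 4-bit codes in [-8, 7].

    Assumption: one scale for the whole list. Practical methods such as
    GPTQ or AWQ use per-group scales and calibration data.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: plain magnitude criterion. Wanda and SparseGPT also use
    activation statistics, which this sketch ignores.
    """
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [7.0, -3.0, 1.0, 0.0]
    codes, scale = quantize_4bit(w)
    print(codes, scale, dequantize(codes, scale))
    print(magnitude_prune([0.5, -0.1, 0.3, -0.9], sparsity=0.5))
```

In both cases the compressed weights only approximate the originals. A benchmark such as ACBench measures how much that approximation error costs on downstream agentic tasks, rather than on perplexity alone.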
Taxonomy
Topics: Credit Risk and Financial Regulations · Magnetic properties of thin films · Advanced Data Storage Technologies
Methods: Pruning · Focus
