AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents

Zuhair Ahmed Khan Taha; Mohammed Mudassir Uddin; Shahnawaz Alam

arXiv:2601.05191 · cs.CV · January 13, 2026

Open Access

TL;DR

AgentCompress is a task-aware dynamic compression framework that routes each language model request to an appropriately quantized version of the model based on estimated task complexity, cutting computational costs by 68.3% while preserving 96.2% of the original success rate.

Contribution

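The paper introduces a lightweight neural controller that estimates task complexity from the first few tokens of a request and routes the request to an appropriately quantized model, lowering the cost of using large language models while preserving most of the original performance.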

Findings

01: 68.3% reduction in computational costs

02: 96.2% of original success rate preserved

03: Effective across multiple scientific domains

Abstract

Large language models hold considerable promise for various applications, but their computational requirements create a barrier that many institutions cannot overcome. A single session using a 70-billion-parameter model can cost around $127 in cloud computing fees, which puts these tools out of reach for organizations operating on limited budgets. We present AgentCompress, a framework that tackles this problem through task-aware dynamic compression. The idea comes from a simple observation: not all tasks require the same computational effort. Complex reasoning, for example, is far more demanding than text reformatting, yet conventional compression applies the same reduction to both. Our approach uses a lightweight neural controller that looks at the first few tokens of each request, estimates how complex the task will be, and sends it to an appropriately quantized version of the model.…
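The routing idea in the abstract (inspect the first few tokens, estimate complexity, pick a quantized variant) can be illustrated with a minimal sketch. This is not the authors' controller: the paper describes a learned neural controller, whereas the sketch below swaps in a keyword heuristic so it stays self-contained. The tier names (`int4`, `int8`, `fp16`), the cue-word sets, the 8-token prefix, and the thresholds are all assumptions made for illustration.

```python
# Illustrative sketch only: a keyword heuristic stands in for the paper's
# learned neural controller. All names and thresholds here are hypothetical.

# Hypothetical cue words suggesting a demanding or a lightweight task.
HARD_CUES = {"prove", "derive", "why", "plan", "debug", "analyze"}
EASY_CUES = {"reformat", "convert", "lowercase", "uppercase", "rename", "sort"}


def estimate_complexity(request: str, prefix_tokens: int = 8) -> float:
    """Return a complexity score in [0, 1] using only the request's first tokens."""
    # Whitespace tokenization is an assumption; a real system would use
    # the model's own tokenizer.
    prefix = [t.strip(".,:;!?").lower() for t in request.split()[:prefix_tokens]]
    hard = sum(t in HARD_CUES for t in prefix)
    easy = sum(t in EASY_CUES for t in prefix)
    if hard == 0 and easy == 0:
        return 0.5  # no evidence either way: treat as medium complexity
    return hard / (hard + easy)


def route(request: str) -> str:
    """Map the estimated complexity to a hypothetical quantization tier."""
    score = estimate_complexity(request)
    if score < 0.34:
        return "int4"  # most aggressive compression for simple tasks
    if score < 0.67:
        return "int8"  # middle tier when the prefix is inconclusive
    return "fp16"      # least compressed variant for complex reasoning


if __name__ == "__main__":
    for text in ("Reformat this table as CSV",
                 "Prove that the sequence converges",
                 "Tell me about the weather"):
        print(f"{route(text):5s} <- {text}")
```

The point of the sketch is the shape of the decision, not the scoring rule: only a short prefix of the request is examined, so the routing step stays cheap relative to running the model itself.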

Taxonomy

Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Big Data and Digital Economy