AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents
Zuhair Ahmed Khan Taha, Mohammed Mudassir Uddin, Shahnawaz Alam

TL;DR
AgentCompress is a task-aware dynamic compression framework that routes each language model query to a model variant compressed to match the task's estimated complexity. This cuts computational costs by 68.3% while preserving 96.2% of the original success rate across diverse domains.
Contribution
It introduces a lightweight neural controller for task-aware model routing, which makes large language models affordable to run with only a small loss in task performance.
Findings
68.3% reduction in computational costs
96.2% of original success rate preserved
Effective across multiple scientific domains
Abstract
Large language models hold considerable promise for various applications, but their computational requirements create a barrier that many institutions cannot overcome. A single session using a 70-billion-parameter model can cost around $127 in cloud computing fees, which puts these tools out of reach for organizations operating on limited budgets. We present AgentCompress, a framework that tackles this problem through task-aware dynamic compression. The idea comes from a simple observation: not all tasks require the same computational effort. Complex reasoning, for example, is far more demanding than text reformatting, yet conventional compression applies the same reduction to both. Our approach uses a lightweight neural controller that looks at the first few tokens of each request, estimates how complex the task will be, and sends it to an appropriately quantized version of the model.…
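The routing idea in the abstract (read the first few tokens, estimate complexity, dispatch to a suitably quantized model) can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword heuristic below is a hypothetical stand-in for the learned neural controller, and the tier names (`int4`, `int8`, `fp16`), score thresholds, relative costs, and the `prefix_tokens=8` window are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical label for a quantized model variant
    max_score: float      # route here if the complexity score is <= max_score
    relative_cost: float  # assumed cost relative to the full-precision model


# Illustrative tiers, cheapest first; not values reported in the paper.
TIERS = [
    Tier("int4", 0.33, 0.25),
    Tier("int8", 0.66, 0.50),
    Tier("fp16", 1.00, 1.00),
]

# Hypothetical cue words standing in for what a learned controller would detect.
REASONING_CUES = {"prove", "derive", "why", "explain", "analyze", "plan"}
FORMATTING_CUES = {"reformat", "convert", "format", "list", "rename"}


def estimate_complexity(prompt: str, prefix_tokens: int = 8) -> float:
    """Score in [0, 1] computed only from the first `prefix_tokens` words."""
    tokens = [t.strip(".,:?").lower() for t in prompt.split()[:prefix_tokens]]
    if not tokens:
        return 0.0
    reasoning = sum(t in REASONING_CUES for t in tokens)
    formatting = sum(t in FORMATTING_CUES for t in tokens)
    score = 0.5 + 0.25 * reasoning - 0.25 * formatting
    return min(1.0, max(0.0, score))


def route(prompt: str) -> Tier:
    """Pick the cheapest tier whose threshold covers the estimated complexity."""
    score = estimate_complexity(prompt)
    for tier in TIERS:
        if score <= tier.max_score:
            return tier
    return TIERS[-1]


def relative_batch_cost(prompts: list[str]) -> float:
    """Average routed cost relative to sending every prompt to fp16."""
    if not prompts:
        return 0.0
    return sum(route(p).relative_cost for p in prompts) / len(prompts)


if __name__ == "__main__":
    batch = [
        "Reformat this table as CSV",   # formatting cue -> int4
        "Explain why the proof fails",  # two reasoning cues -> fp16
        "Summarize this paragraph",     # no cues -> int8
    ]
    for p in batch:
        print(f"{route(p).name:5s} <- {p}")
    print(f"relative cost: {relative_batch_cost(batch):.3f}")
```

For the three example prompts this routes to `int4`, `fp16`, and `int8`, giving an average relative cost of about 0.583 versus sending everything to full precision. The cheapest-tier-first loop reflects the abstract's premise that simple tasks such as reformatting should not pay for the capacity that complex reasoning requires.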
Taxonomy
Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Big Data and Digital Economy
