ProCut: LLM Prompt Compression via Attribution Estimation

Zhentao Xu; Fengyi Li; Albert Chen; Xiaofeng Wang

arXiv:2508.02053 · cs.CL · October 9, 2025

TL;DR

ProCut is a prompt compression method for large language models that uses attribution analysis to reduce prompt size significantly while maintaining or improving task performance, thereby reducing costs and complexity.

Contribution

ProCut introduces a training-free, attribution-based prompt compression framework that is LLM-agnostic and effective across multiple benchmarks and real-world prompts.

Findings

01. Achieves a 78% token reduction in production prompts.

02. Maintains or improves task performance (up to 62% better than alternative methods).

03. Reduces compression latency by over 50%.

Abstract

In large-scale industrial LLM systems, prompt templates often expand to thousands of tokens as teams iteratively incorporate sections such as task instructions, few-shot examples, and heuristic rules to enhance robustness and coverage. This expansion leads to bloated prompts that are difficult to maintain and incur significant inference latency and serving costs. To address this, we introduce Prompt Compression via Attribution Estimation (ProCut), a flexible, LLM-agnostic, training-free framework that compresses prompts through attribution analysis. ProCut segments prompt templates into semantically meaningful units, quantifies their impact on task performance, and prunes low-utility components. Through extensive experiments on five public benchmark datasets and real-world industrial prompts, we show that ProCut achieves substantial prompt size reductions (78% fewer tokens in…
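
The segment, attribute, and prune steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes leave-one-out ablation as the attribution method, blank-line splitting as the segmentation rule, and a toy keyword metric in place of a real task evaluation. The names `split_segments`, `leave_one_out_attribution`, `prune`, and `toy_score` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

SEP = "\n\n"  # assumed segment delimiter; real templates may need smarter splitting


def split_segments(prompt: str) -> List[str]:
    """Split a prompt template into segments on blank lines (assumed heuristic)."""
    return [block.strip() for block in prompt.split(SEP) if block.strip()]


def leave_one_out_attribution(
    segments: Sequence[str],
    score: Callable[[str], float],
) -> List[float]:
    """Attribution of a segment = score of the full prompt minus score without it."""
    full_score = score(SEP.join(segments))
    attributions = []
    for i in range(len(segments)):
        ablated = [s for j, s in enumerate(segments) if j != i]
        attributions.append(full_score - score(SEP.join(ablated)))
    return attributions


def prune(
    segments: Sequence[str],
    attributions: Sequence[float],
    threshold: float = 0.0,
) -> List[str]:
    """Keep only segments whose attribution is strictly above the threshold."""
    return [s for s, a in zip(segments, attributions) if a > threshold]


def toy_score(prompt: str) -> float:
    """Stand-in for a task metric (hypothetical): rewards two key phrases."""
    text = prompt.lower()
    return float("classify" in text) + float("json" in text)


PROMPT = SEP.join(
    [
        "Classify the ticket by urgency.",
        "Be polite and thorough.",
        "Return the answer as JSON.",
    ]
)

segments = split_segments(PROMPT)
attributions = leave_one_out_attribution(segments, toy_score)
compressed = SEP.join(prune(segments, attributions))
```

In practice `toy_score` would be replaced by an evaluation of the target LLM on a held-out task set, which makes each attribution call expensive. Leave-one-out needs one evaluation per segment, so the cost of the attribution step grows with the number of segments.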

Taxonomy

Topics: Software System Performance and Reliability · Data Quality and Management · Topic Modeling