Characterizing Prompt Compression Methods for Long Context Inference
Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami

TL;DR
This paper systematically compares various prompt compression techniques for long context inference, revealing extractive compression as the most effective method with significant compression and minimal accuracy loss.
Contribution
It provides a comprehensive evaluation of prompt compression methods, highlighting the superior performance of extractive compression over other approaches across multiple tasks.
Findings
Extractive compression often outperforms other methods.
Up to 10x prompt compression with minimal accuracy loss.
Token pruning methods generally underperform compared to extractive compression.
Abstract
Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks through a standardized analysis. This has led to conflicting results. To address this, here we perform a comprehensive characterization and evaluation of different prompt compression methods. In particular, we analyze extractive compression, summarization-based abstractive compression, and token pruning methods. Surprisingly, we find that extractive compression often outperforms all the other approaches, and enables up to 10x compression with minimal accuracy degradation. Interestingly, we also…
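To make the idea of extractive compression concrete, here is a minimal Python sketch of it: split the context into chunks, rank the chunks by relevance to the query, and keep the highest-ranked ones verbatim until a token budget is reached. This is an illustration under stated assumptions, not the paper's implementation. The word-overlap scorer, the sentence-level chunking, the whitespace-style token count, and all function names are hypothetical choices made for this example; a real system would more plausibly use a learned retriever or reranker and the model's own tokenizer.

```python
import re


def split_chunks(context):
    """Split text into sentence-level chunks (assumption: one sentence per chunk)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]


def tokenize(text):
    """Crude word tokenizer, a stand-in for a real model tokenizer."""
    return re.findall(r"\w+", text.lower())


def score(chunk, query):
    """Hypothetical relevance score: fraction of distinct query words found in the chunk."""
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    return len(query_words & set(tokenize(chunk))) / len(query_words)


def extractive_compress(context, query, token_budget):
    """Keep the most query-relevant chunks, unmodified, within token_budget."""
    chunks = split_chunks(context)
    # Rank chunk indices from most to least relevant (stable for ties).
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(chunks[i], query),
                    reverse=True)
    kept, used = [], 0
    for i in ranked:
        n_tokens = len(tokenize(chunks[i]))
        if used + n_tokens <= token_budget:
            kept.append(i)
            used += n_tokens
    # Restore original order so the compressed prompt reads coherently.
    return " ".join(chunks[i] for i in sorted(kept))


def compression_ratio(original, compressed):
    """Original token count divided by compressed token count."""
    return len(tokenize(original)) / max(1, len(tokenize(compressed)))
```

For example, with a four-sentence context about unrelated topics and the query "What is the capital of France?", calling `extractive_compress(context, query, token_budget=6)` keeps only the sentence that shares the most words with the query. Because selected chunks are copied verbatim, the compressed prompt contains no rewritten text, which is the key difference from summarization-based (abstractive) compression and from token pruning, where individual tokens are dropped from within sentences.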
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Data Compression Techniques · Time Series Analysis and Forecasting
Methods: Pruning
