SCOPE: A Generative Approach for LLM Prompt Compression
Tinghui Zhang, Yifan Wang, Daisy Zhe Wang

TL;DR
This paper introduces SCOPE, a novel generative prompt compression method that uses semantic chunking and summarization to effectively shorten prompts while preserving quality, outperforming existing token removal techniques.
Contribution
The paper presents a new chunking-and-summarization approach for prompt compression, addressing limitations of token removal methods and enhancing information preservation and coherence.
Findings
Achieves better compression quality than state-of-the-art methods.
Remains more stable than existing methods at high compression ratios.
Demonstrates effectiveness on datasets from multiple domains.
Abstract
Prompt compression methods improve the efficiency of Large Language Models (LLMs) and reduce cost by shortening the input context. The goal of prompt compression is to shorten the LLM prompt while maintaining high generation quality. However, existing solutions, mainly based on token removal, face challenges such as information loss and structural incoherence, for example missing grammatical elements in a sentence or incomplete phrases after token removal. Such challenges limit the final generation quality of the LLM. To overcome these limitations, we present a novel generative prompt compression method. Unlike existing token removal methods, our method centers on a chunking-and-summarization mechanism. Specifically, it splits the prompt into semantically coherent chunks and rewrites the chunks to be more concise. The chunks are reconstructed into meaningful prompt…
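Illustrative sketch
The chunk-then-rewrite idea described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the function names, the word-budget chunking rule, and the first-sentence "summarizer" are all assumptions made for illustration. A real system would presumably use a semantic segmentation step and a generative model in place of the two placeholders.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: List[str], max_words: int = 40) -> List[str]:
    """Group consecutive sentences into chunks of at most max_words words.

    Assumption: a word budget stands in for semantic chunking here.
    A single sentence longer than max_words becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def first_sentence_summarizer(chunk: str) -> str:
    """Placeholder rewriter: keeps only the first sentence of a chunk.

    Hypothetical stand-in for a generative summarization model.
    """
    return split_sentences(chunk)[0]


def compress_prompt(
    prompt: str,
    summarize: Callable[[str], str] = first_sentence_summarizer,
    max_words: int = 40,
) -> str:
    """Split the prompt into chunks, rewrite each chunk, and rejoin in order."""
    chunks = chunk_sentences(split_sentences(prompt), max_words)
    return " ".join(summarize(chunk) for chunk in chunks)


if __name__ == "__main__":
    demo = "A b c. D e f. G h i."
    print(compress_prompt(demo, max_words=6))
```

Because the rewriter is passed in as a callable, the placeholder can be swapped for a call to any summarization model without changing the chunking or reassembly steps.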
