Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
Alliot Nagle, Adway Girish, Marco Bondaschi, Michael Gastpar, Ashok Vardhan Makkuva, Hyeji Kim

TL;DR
This paper establishes a theoretical framework for prompt compression in large language models, deriving fundamental limits and evaluating existing methods against these bounds, highlighting the importance of query-awareness.
Contribution
It introduces a rate-distortion framework for prompt compression, derives the optimal limit via linear programming, and proposes an adaptive, query-aware compression method to improve performance.
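As a rough illustration of why the optimal limit can be written as a linear program, consider a toy setting that is an assumption of this sketch, not the paper's setup: a single prompt with a handful of candidate compressed prompts, each with a known rate (compressed length divided by original length) and a known expected distortion. A randomized compressor picks candidate m with probability p_m, and the limit at a rate budget R is the minimum of sum_m p_m * d_m subject to sum_m p_m * r_m <= R, sum_m p_m = 1, p_m >= 0. With only two constraints, an optimal solution mixes at most two candidates, so the toy case can be solved by enumeration. The function name `distortion_rate` and all numbers are hypothetical, and this is not the dual-based algorithm the paper describes.

```python
from itertools import combinations


def distortion_rate(candidates, rate_budget):
    """Toy distortion-rate value for one prompt (illustrative assumption).

    candidates: list of (rate, distortion) pairs, one per candidate
        compressed prompt. All values here are made up for illustration.
    rate_budget: maximum allowed expected rate.
    Returns the minimum expected distortion, or None if nothing fits.
    """
    best = None

    # Deterministic choices: any single candidate within the budget.
    for rate, dist in candidates:
        if rate <= rate_budget and (best is None or dist < best):
            best = dist

    # Randomized choices: mix two candidates that straddle the budget
    # so that the expected rate equals the budget exactly.
    for a, b in combinations(candidates, 2):
        low, high = (a, b) if a[0] <= b[0] else (b, a)
        if low[0] < rate_budget < high[0]:
            weight_high = (rate_budget - low[0]) / (high[0] - low[0])
            dist = (1 - weight_high) * low[1] + weight_high * high[1]
            if best is None or dist < best:
                best = dist

    return best


# Hypothetical candidates: empty prompt, half-length prompt, full prompt.
toy_candidates = [(0.0, 1.0), (0.5, 0.4), (1.0, 0.0)]
curve = {r: distortion_rate(toy_candidates, r) for r in (0.25, 0.5, 0.75, 1.0)}
```

Evaluating `distortion_rate` over a grid of budgets traces the lower convex envelope of the candidate points. In this toy setting, that envelope plays the role of the baseline against which a practical compressor would be compared.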
Findings
Current prompt compression methods are far from optimal.
Query-aware compression significantly improves prompt efficiency.
The proposed Adaptive QuerySelect reduces the gap to the theoretical limit.
Abstract
We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the…
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques
