Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models

Alliot Nagle; Adway Girish; Marco Bondaschi; Michael Gastpar; Ashok Vardhan Makkuva; Hyeji Kim

arXiv:2407.15504 · cs.LG · December 12, 2024 · 2 cites

TL;DR

This paper establishes a theoretical framework for prompt compression in large language models, deriving fundamental limits and evaluating existing methods against these bounds, highlighting the importance of query-awareness.

Contribution

It introduces a rate-distortion framework for prompt compression, derives the optimal limit via linear programming, and proposes an adaptive, query-aware compression method to improve performance.

Findings

01. Current prompt compression methods are far from optimal.

02. Query-aware compression significantly improves prompt efficiency.

03. The proposed Adaptive QuerySelect reduces the gap to the theoretical limit.

Abstract

We formalize the problem of prompt compression for large language models (LLMs) and present a framework to unify token-level prompt compression methods which create hard prompts for black-box models. We derive the distortion-rate function for this setup as a linear program, and provide an efficient algorithm to compute this fundamental limit via the dual of the linear program. Using the distortion-rate function as the baseline, we study the performance of existing compression schemes on a synthetic dataset consisting of prompts generated from a Markov chain, natural language queries, and their respective answers. Our empirical analysis demonstrates the criticality of query-aware prompt compression, where the compressor has knowledge of the downstream task/query for the black-box LLM. We show that there is a large gap between the performance of current prompt compression methods and the…
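To make the linear-programming view concrete, here is a minimal sketch on a made-up toy instance. It assumes the distortion-rate function takes the form "minimize expected distortion over randomized compressors, subject to an expected-rate budget", and it evaluates that limit by brute force through the Lagrangian dual. This is not the paper's algorithm: the function name `distortion_rate`, the tables `p_x`, `rate`, and `dist`, and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def distortion_rate(p_x, rate, dist, budget):
    """Evaluate D*(budget) on a toy instance via the Lagrangian dual.

    p_x[i]     : probability of prompt i
    rate[i][j] : rate of the j-th candidate compression of prompt i
    dist[i][j] : expected distortion of that candidate
    """
    # The primal LP is feasible only if the cheapest candidates fit the budget.
    min_rate = sum(p * min(rs) for p, rs in zip(p_x, rate))
    if min_rate > budget:
        raise ValueError("rate budget is below the minimum achievable rate")

    def dual(lam):
        # g(lam) = sum_x P(x) * min_m [D(x,m) + lam * R(x,m)] - lam * budget
        inner = sum(
            p * min(d + lam * r for r, d in zip(rs, ds))
            for p, rs, ds in zip(p_x, rate, dist)
        )
        return inner - lam * budget

    # g is concave and piecewise linear, so its maximum over lam >= 0 is at
    # lam = 0 or at a crossing point of two candidate lines for some prompt.
    candidates = {0.0}
    for rs, ds in zip(rate, dist):
        for (r1, d1), (r2, d2) in combinations(zip(rs, ds), 2):
            if r1 != r2:
                lam = (d1 - d2) / (r2 - r1)
                if lam > 0:
                    candidates.add(lam)
    return max(dual(lam) for lam in candidates)


# Hypothetical toy data: two equally likely prompts, three candidates each.
p_x = [0.5, 0.5]
rate = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
dist = [[1.0, 0.2, 0.0], [1.0, 0.6, 0.0]]

for budget in (0.0, 0.5, 1.0):
    print(budget, round(distortion_rate(p_x, rate, dist, budget), 6))
```

On this toy instance a budget of 0.5 yields a limit of about 0.35, which is lower than the 0.4 obtained by giving both prompts their rate-0.5 candidate. The gain comes from randomizing the second prompt between its shortest and longest candidates, which is why the limit is taken over randomized compressors rather than deterministic ones.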

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques