Prompt Compression for Large Language Models: A Survey

Zongqian Li; Yinhong Liu; Yixuan Su; Nigel Collier

arXiv:2410.12388·cs.CL·October 18, 2024

TL;DR

This survey reviews prompt compression techniques for large language models, which aim to reduce the memory usage and inference costs of long prompts. It categorizes the methods, analyzes their mechanisms, and discusses future research directions.

Contribution

The survey provides a comprehensive overview and comparison of hard and soft prompt compression methods, covering their mechanisms and potential future improvements.

Findings

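01. Prompt compression reduces the memory usage and inference costs incurred by long prompts.

02. The mechanisms of these methods can be understood from the perspectives of attention optimization, PEFT, modality integration, and new synthetic language.

03. Future directions include combining hard and soft prompt methods and drawing on insights from multimodal models.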

Abstract

Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current…
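
To make the hard prompt category concrete, the following is a minimal sketch of one way a hard prompt method can work: it drops low-scoring tokens and keeps the rest as ordinary text. It is an illustration under stated assumptions, not the procedure of any method covered by the survey. The function name `compress_prompt`, the `keep_ratio` parameter, and the scoring rule (negative log unigram frequency within the prompt itself) are all hypothetical; published methods typically score tokens with a language model instead.

```python
import math
from collections import Counter


def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the highest-scoring tokens of `prompt`, preserving their order.

    Assumption: the score is -log(unigram frequency inside the prompt),
    a crude stand-in for the language-model-based scores that real
    hard prompt methods use.
    """
    tokens = prompt.split()
    if not tokens:
        return ""

    # Frequent tokens (often function words) get low scores.
    counts = Counter(t.lower() for t in tokens)
    total = len(tokens)
    scores = [-math.log(counts[t.lower()] / total) for t in tokens]

    # Number of tokens to keep; always keep at least one.
    n_keep = max(1, math.ceil(keep_ratio * total))

    # Rank positions by score, breaking ties in favor of earlier tokens,
    # then restore the original order of the survivors.
    ranked = sorted(range(total), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:n_keep])
    return " ".join(tokens[i] for i in kept)


if __name__ == "__main__":
    print(compress_prompt("the cat sat on the mat near the door", keep_ratio=0.5))
```

The output is still a readable (if terse) natural language string, which is the defining property of a hard prompt. A soft prompt method would instead encode the prompt into a small number of continuous vectors, which cannot be shown as plain text in the same way.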

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression

Methods: Softmax · Attention Is All You Need