500xCompressor: Generalized Prompt Compression for Large Language Models

Zongqian Li; Yixuan Su; Nigel Collier

arXiv:2408.03094·cs.CL·August 7, 2024

TL;DR

This paper introduces 500xCompressor, a prompt compression method that shrinks prompts by 6x to 480x while adding only about 0.3% extra parameters. The LLM retains 62.26-72.89% of its capabilities across various datasets.

Contribution

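The paper presents 500xCompressor, a general-purpose prompt compression technique that achieves high compression ratios and produces compressed prompts the original LLM can use without being fine-tuned. It addresses limitations of existing methods, such as low compression ratios and potential data leakage during evaluation.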

Findings

01 Achieves compression ratios from 6x to 480x

02 Retains 62.26-72.89% of LLM capabilities

03 Utilizes KV values for better information preservation

Abstract

Prompt compression is crucial for enhancing inference speed, reducing costs, and improving user experience. However, current methods face challenges such as low compression ratios and potential data leakage during evaluation. To address these issues, we propose 500xCompressor, a method that compresses extensive natural language contexts into a minimum of one single special token. The 500xCompressor introduces approximately 0.3% additional parameters and achieves compression ratios ranging from 6x to 480x. It is designed to compress any text, answer various types of questions, and could be utilized by the original large language model (LLM) without requiring fine-tuning. Initially, 500xCompressor was pretrained on the Arxiv Corpus, followed by fine-tuning on the ArxivQA dataset, and subsequently evaluated on strictly unseen and classical question answering (QA) datasets. The results…
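The compression ratios quoted in the abstract can be read as the original prompt length divided by the compressed length, both counted in tokens. The sketch below illustrates that arithmetic only. The function name `compression_ratio` and the token counts (96 to 16 tokens, 480 to 1 token) are assumptions chosen to reproduce the 6x and 480x endpoints, not settings confirmed by the text above.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Return original length divided by compressed length, both in tokens."""
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    return original_tokens / compressed_tokens


# Hypothetical (original, compressed) token counts, picked so the results
# match the 6x and 480x endpoints mentioned in the abstract.
examples = [(96, 16), (480, 1)]

for original, compressed in examples:
    ratio = compression_ratio(original, compressed)
    print(f"{original} -> {compressed} tokens: {ratio:.0f}x")
```

Under this reading, the 480x figure corresponds to compressing a context of several hundred tokens into the single special token the abstract describes as the minimum.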

Code & Models

Repositories

ZongqianLi/500xCompressor (PyTorch, official)

Taxonomy

Topics: Algorithms and Data Compression