500xCompressor: Generalized Prompt Compression for Large Language Models
Zongqian Li, Yixuan Su, Nigel Collier

TL;DR
This paper introduces 500xCompressor, a novel prompt compression method that significantly reduces prompt size with minimal parameter increase, maintaining most of the LLM's capabilities across various datasets.
Contribution
The paper presents 500xCompressor, a universal prompt compression technique that achieves high compression ratios without fine-tuning, addressing limitations of existing methods.
Findings
Achieves compression ratios from 6x to 480x
Retains 62.26-72.89% of LLM capabilities
Utilizes K V values for better information preservation
Abstract
Prompt compression is crucial for enhancing inference speed, reducing costs, and improving user experience. However, current methods face challenges such as low compression ratios and potential data leakage during evaluation. To address these issues, we propose 500xCompressor, a method that compresses extensive natural language contexts into a minimum of one single special token. The 500xCompressor introduces approximately 0.3% additional parameters and achieves compression ratios ranging from 6x to 480x. It is designed to compress any text, answer various types of questions, and could be utilized by the original large language model (LLM) without requiring fine-tuning. Initially, 500xCompressor was pretrained on the Arxiv Corpus, followed by fine-tuning on the ArxivQA dataset, and subsequently evaluated on strictly unseen and classical question answering (QA) datasets. The results…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAlgorithms and Data Compression
