Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su

TL;DR
This paper introduces a query-guided compression method for LLMs that preserves key information under high compression ratios, maintaining performance and reducing inference costs.
Contribution
The paper proposes the Query-Guided Compressor (QGC), a novel approach that uses queries to guide context compression, effectively retaining key information at high compression ratios.
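This page does not describe how QGC works internally, so the sketch below is only a simplified extractive stand-in for the idea of query-guided compression. It is not the authors' method. It scores each context sentence by word overlap with the query (a hypothetical relevance measure) and keeps the highest-scoring sentences within a token budget. The budget comes from a compression ratio, assumed here to mean original tokens divided by compressed tokens. The function names `tokenize`, `relevance`, and `compress` are illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a crude stand-in for a real LLM tokenizer.
    return re.findall(r"\w+", text.lower())


def relevance(query: str, sentence: str) -> float:
    # Hypothetical scorer: fraction of distinct query words that appear in the sentence.
    q = set(tokenize(query))
    s = set(tokenize(sentence))
    return len(q & s) / len(q) if q else 0.0


def compress(query: str, sentences: list[str], ratio: float) -> list[str]:
    """Keep the most query-relevant sentences within a token budget.

    ratio = original_tokens / compressed_tokens, so ratio=4 keeps about 1/4 of the tokens.
    """
    total = sum(len(tokenize(s)) for s in sentences)
    budget = int(total / ratio)
    # Rank indices by relevance, highest first; sorted() is stable, so ties keep source order.
    ranked = sorted(range(len(sentences)), key=lambda i: -relevance(query, sentences[i]))
    kept, used = set(), 0
    for i in ranked:
        n = len(tokenize(sentences[i]))
        if used + n <= budget:  # greedily add sentences that still fit
            kept.add(i)
            used += n
    # Restore the original order so the compressed context still reads coherently.
    return [sentences[i] for i in sorted(kept)]


if __name__ == "__main__":
    query = "When was the Eiffel Tower completed?"
    context = [
        "The Eiffel Tower was completed in 1889.",
        "Paris is known for its cafes and museums.",
        "Many tourists visit the tower every summer.",
        "The city hosts several large festivals each year.",
    ]
    print(compress(query, context, ratio=3.0))
```

With a ratio of 3, the 30-token context gets a 10-token budget, and only the answer-bearing first sentence survives. This is the behavior the paper argues matters at high compression ratios: key information should be the last thing dropped. A lexical scorer like this one is easy to fool, which is one reason to use a learned, query-aware compressor instead.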
Findings
QGC maintains high performance at high compression ratios.
QGC reduces inference cost and increases throughput.
Experimental validation on QA datasets confirms effectiveness.
Abstract
The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for LLMs. However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the importance of retaining key information to maintain model performance under high compression ratios. We therefore introduce the Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions,…
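The abstract mentions a dynamic compression strategy but does not spell it out here. The sketch below shows one hypothetical way such a strategy could work: split a fixed total token budget across retrieved documents in proportion to their query-relevance scores, so that more relevant documents get a lower compression ratio. The function name `allocate_budgets`, the proportional rule, and the cap at document length are all assumptions for illustration, not QGC's actual mechanism.

```python
def allocate_budgets(lengths: list[int], scores: list[float], total_budget: int) -> list[int]:
    """Split a token budget across documents in proportion to query relevance.

    Hypothetical scheme: a higher relevance score yields a larger budget,
    i.e. a lower compression ratio (length / budget) for that document.
    """
    weight_sum = sum(scores)
    if weight_sum == 0:
        # No document looks relevant to the query: keep nothing.
        return [0] * len(lengths)
    budgets = []
    for length, score in zip(lengths, scores):
        share = int(total_budget * score / weight_sum)
        budgets.append(min(length, share))  # never keep more tokens than the document has
    return budgets


if __name__ == "__main__":
    lengths = [100, 100, 100]
    scores = [6, 3, 1]  # illustrative relevance scores, most relevant first
    budgets = allocate_budgets(lengths, scores, total_budget=60)
    ratios = [round(n / b, 1) for n, b in zip(lengths, budgets)]
    print(budgets, ratios)
```

In this example the most relevant document is compressed about 2.8x, while the least relevant is compressed about 16.7x, even though the overall ratio is 5x. A simple cap like `min(length, share)` can leave part of the budget unused. A fuller scheme might redistribute that slack to the remaining documents.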
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Algorithms and Data Compression
