Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

Zhiwei Cao; Qian Cao; Yu Lu; Ningxin Peng; Luyang Huang; Shanbo Cheng; Jinsong Su

arXiv:2406.02376 · cs.CL · June 18, 2024



TL;DR

This paper introduces a query-guided compression method for LLMs that preserves key information under high compression ratios, maintaining performance and reducing inference costs.

Contribution

The paper proposes the Query-Guided Compressor (QGC), a novel approach that uses queries to guide context compression, effectively retaining key information at high compression ratios.

Findings

01 QGC maintains performance at high compression ratios, where previous methods degrade sharply.

02 QGC reduces inference cost and increases throughput.

03 Experiments on Question Answering datasets, including NaturalQuestions, confirm its effectiveness.

Abstract

The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for these models. However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports this hypothesis, emphasizing the significance of retaining key information to maintain model performance under high compression ratios. As a result, we introduce Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of our proposed QGC on the Question Answering task, including NaturalQuestions,…
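
To make the idea of query guidance concrete, here is a minimal sketch, not the paper's method. QGC is a learned compressor; the toy below swaps in a plain word-overlap heuristic to show why conditioning on the query helps keep the key sentence when most of the context is discarded. The function names `tokenize` and `compress_context`, the sentence-level granularity, and the way `ratio` is interpreted are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress_context(query, sentences, ratio):
    """Keep about len(sentences) / ratio sentences, preferring query overlap.

    Hypothetical stand-in for a query-guided compressor: sentences are
    ranked by how many word tokens they share with the query, and the
    survivors are returned in their original order.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    keep = max(1, round(len(sentences) / ratio))
    query_tokens = tokenize(query)
    # Rank by descending overlap; ties keep the earlier sentence first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-len(query_tokens & tokenize(sentences[i])), i),
    )
    kept = sorted(ranked[:keep])
    return [sentences[i] for i in kept]


sentences = [
    "Paris is the capital of France.",
    "Hamlet was written by William Shakespeare.",
    "The Nile flows through Egypt.",
    "Bananas are rich in potassium.",
]
print(compress_context("Who wrote Hamlet?", sentences, ratio=4))
# prints ['Hamlet was written by William Shakespeare.']
```

A query-agnostic compressor at the same 4x ratio has no way of knowing which of the four sentences matters, which is the failure mode the abstract attributes to the loss of key information.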


Code & Models

Repositories

DeepLearnXMU/QGC
PyTorch · Official

Videos

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs · underline

Taxonomy

Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Algorithms and Data Compression