LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Zhuoshi Pan; Qianhui Wu; Huiqiang Jiang; Menglin Xia; Xufang Luo; Jue Zhang; Qingwei Lin; Victor Rühle; Yuqing Yang; Chin-Yew Lin; H. Vicky Zhao; Lili Qiu; Dongmei Zhang

arXiv:2403.12968 · cs.CL · August 13, 2024 · 2 citations

TL;DR

This paper introduces a data distillation-based prompt compression method that enhances efficiency and faithfulness across various tasks and models by leveraging a token classification approach with a Transformer encoder.
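To make the token-classification framing concrete, here is a minimal sketch that assumes an encoder has already assigned each token a probability of being preserved. The helper name `compress_by_preservation` and the toy probabilities are hypothetical. The sketch keeps the highest-probability tokens needed to reach a target rate and emits them in their original order, so every kept token comes from the original prompt (an extractive compression).

```python
import math
from typing import Sequence


def compress_by_preservation(
    tokens: Sequence[str],
    p_preserve: Sequence[float],
    rate: float,
) -> list[str]:
    """Keep the ceil(rate * n) tokens with the highest preserve probability.

    Hypothetical helper: p_preserve[i] stands in for a classifier's
    probability that token i should be kept. Kept tokens are returned in
    their original order, so the result is a subsequence of the input.
    """
    if len(tokens) != len(p_preserve):
        raise ValueError("tokens and p_preserve must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    n_keep = math.ceil(rate * len(tokens))
    # Rank token positions by preserve probability, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: p_preserve[i], reverse=True)
    keep = set(ranked[:n_keep])
    # Re-emit kept tokens in original order to stay faithful to the source.
    return [tok for i, tok in enumerate(tokens) if i in keep]


tokens = ["The", "meeting", "will", "be", "held", "on", "Friday", "at", "noon", "."]
probs = [0.10, 0.95, 0.30, 0.20, 0.70, 0.15, 0.98, 0.25, 0.90, 0.05]
print(compress_by_preservation(tokens, probs, rate=0.5))
# → ['meeting', 'will', 'held', 'Friday', 'noon']
```

Because the classifier only decides keep or drop for existing tokens, the output cannot contain words that were not in the input, which is one way to read the "faithful" in the title.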

Contribution

It proposes a novel data distillation technique for task-agnostic prompt compression, addressing limitations of entropy-based methods and improving generalization and speed.
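The distillation step ultimately has to turn an LLM-written compressed text into per-word keep/drop labels on the original text. The sketch below assumes the simplest possible alignment, a greedy left-to-right exact match on lowercased words; this is a simplification for illustration, not the paper's labeling procedure, and `label_words` is a hypothetical name.

```python
def label_words(original: list[str], compressed: list[str]) -> list[int]:
    """Label each original word 1 (preserve) or 0 (discard).

    Simplifying assumption: each compressed word matches the next unused
    original word with the same lowercase form, scanning left to right.
    """
    labels = [0] * len(original)
    pos = 0  # first original index still eligible for matching
    for word in compressed:
        target = word.lower()
        for i in range(pos, len(original)):
            if original[i].lower() == target:
                labels[i] = 1
                pos = i + 1
                break
        # A compressed word with no match (e.g. a paraphrase) is skipped.
    return labels


original = "The team will meet on Friday to review the budget".split()
compressed = "team meet Friday review budget".split()
print(label_words(original, compressed))
# → [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

Greedy matching is easy to reason about but can mislabel repeated words or miss paraphrased ones; a real pipeline would need a more robust alignment.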

Findings

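1. Significant performance gains over strong baselines on in-domain (MeetingBank) and out-of-domain (LongBench, ZeroScrolls, GSM8K, BBH) datasets.
2. Robust generalization across different target LLMs.
3. Compression runs 3x-6x faster than existing prompt compression methods, and end-to-end latency drops by 1.6x-2.9x at compression ratios of 2x-5x.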
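A compression ratio of r means the compressed prompt keeps roughly 1/r of the original tokens. The arithmetic below uses a made-up 2,000-token prompt as a hypothetical illustration; it is not a figure reported in the paper.

```python
def compressed_length(original_tokens: int, ratio: float) -> int:
    """Approximate token count after compressing by `ratio` (e.g. 2.0 for 2x)."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return round(original_tokens / ratio)


def rate_from_ratio(ratio: float) -> float:
    """Fraction of tokens kept; a 5x ratio keeps 20% of the prompt."""
    return 1.0 / ratio


for ratio in (2.0, 3.0, 5.0):
    print(f"{ratio:.0f}x: {compressed_length(2000, ratio)} tokens kept "
          f"(rate {rate_from_ratio(ratio):.2f})")
```

So across the 2x-5x range, a 2,000-token prompt would shrink to somewhere between about 1,000 and 400 tokens.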

Abstract

This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. Considering the redundancy in natural language, existing approaches compress prompts by removing tokens or lexical units according to their information entropy obtained from a causal language model such as LLaMa-7B. The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective. To address these issues, we propose a data distillation procedure to derive knowledge from an LLM to compress prompts without losing crucial information, and meanwhile introduce an extractive text compression dataset. We formulate prompt compression as a token classification problem to guarantee the…
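For contrast, the entropy-based baselines the abstract critiques score each token by its self-information, -log p(x_i | x_<i), under a causal LM and drop the least informative tokens. In this sketch the conditional probabilities are supplied by hand rather than computed by LLaMa-7B, and `drop_low_information` is a hypothetical name. The limitation is visible in the inputs: each score conditions only on tokens to its left.

```python
import math


def self_information(p_next: list[float]) -> list[float]:
    """Self-information in bits, -log2 p(x_i | x_<i), for each token.

    p_next[i] is assumed to be a causal LM's probability of token i given
    only the tokens before it (unidirectional context).
    """
    return [-math.log2(p) for p in p_next]


def drop_low_information(tokens: list[str], p_next: list[float], keep: int) -> list[str]:
    """Keep the `keep` tokens with the highest self-information, in original order."""
    scores = self_information(p_next)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [t for i, t in enumerate(tokens) if i in kept]


tokens = ["Please", "summarize", "the", "quarterly", "report", "below", "."]
p_next = [0.05, 0.02, 0.60, 0.01, 0.40, 0.30, 0.90]
print(drop_low_information(tokens, p_next, keep=4))
# → ['Please', 'summarize', 'quarterly', 'below']
```

In this toy example the sketch drops "report" because it is fairly predictable from its left context, even though a reader would consider it essential. This mirrors point (ii): low surprisal is not the same as low importance for the task.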

Code & Models

Repositories

microsoft/LLMLingua (PyTorch, official)

Videos

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (Underline)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Computational Physics and Python Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Layer Normalization · Multi-Head Attention · Dropout · Residual Connection · Position-Wise Feed-Forward Layer · Byte Pair Encoding