TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning

Shivam Shandilya; Menglin Xia; Supriyo Ghosh; Huiqiang Jiang; Jue Zhang; Qianhui Wu; Victor Rühle

arXiv:2409.13035·cs.CL·December 19, 2024·2 cites

TL;DR

This paper introduces TACO-RL, a reinforcement learning approach for task-aware prompt compression that significantly improves performance across multiple NLP tasks while maintaining efficiency.

Contribution

The paper presents a novel RL-based prompt compression method that incorporates task-specific rewards, outperforming existing techniques in diverse NLP applications.

Findings

01. Improves task performance by 8% to 189% over state-of-the-art methods.

02. Effectively balances compression rate and latency requirements.

03. Demonstrates versatility across summarization, question answering, and code summarization.

Abstract

The increasing prevalence of large language models (LLMs) such as GPT-4 in various applications has led to a surge in the size of prompts required for optimal performance, creating challenges in computational efficiency. Prompt compression aims to reduce inference cost by minimizing input tokens without compromising task performance. However, existing prompt compression techniques either rely on sub-optimal metrics such as information entropy or model compression as a task-agnostic token classification problem that fails to capture task-specific information. To address these issues, we propose a novel and efficient reinforcement learning (RL) based task-aware prompt compression method. To meet low-latency requirements, we leverage an existing Transformer encoder-based token classification model while guiding the learning process with task-specific reward signals using lightweight…
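
The general idea in the abstract, a token-level keep/drop classifier trained with a task-specific reward, can be illustrated with a toy sketch. This is not the paper's implementation: the per-token logits stand in for a Transformer encoder, the update rule is plain REINFORCE with a mean-reward baseline (the abstract does not say which RL algorithm is used), and `toy_reward`, `task_keywords`, and `length_penalty` are hypothetical names invented for the illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_mask(logits, rng):
    """Sample a keep (1) / drop (0) action per token from Bernoulli(sigmoid(logit))."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


def toy_reward(tokens, mask, task_keywords, length_penalty=0.05):
    """Hypothetical task reward: keyword recall of the kept tokens minus a length cost."""
    kept = {t for t, m in zip(tokens, mask) if m}
    recall = len(kept & task_keywords) / len(task_keywords)
    return recall - length_penalty * sum(mask)


def reinforce_step(logits, tokens, task_keywords, rng, lr=0.5, n_samples=16):
    """One REINFORCE update of the per-token logits, using a mean-reward baseline."""
    samples = [sample_mask(logits, rng) for _ in range(n_samples)]
    rewards = [toy_reward(tokens, m, task_keywords) for m in samples]
    baseline = sum(rewards) / n_samples
    grads = [0.0] * len(logits)
    for mask, r in zip(samples, rewards):
        for i, (a, z) in enumerate(zip(mask, logits)):
            # Gradient of log Bernoulli(a; sigmoid(z)) with respect to z is a - sigmoid(z).
            grads[i] += (r - baseline) * (a - sigmoid(z))
    return [z + lr * g / n_samples for z, g in zip(logits, grads)]


def train(tokens, task_keywords, steps=300, seed=0):
    """Train one logit per token position (a stand-in for an encoder classifier)."""
    rng = random.Random(seed)
    logits = [0.0] * len(tokens)
    for _ in range(steps):
        logits = reinforce_step(logits, tokens, task_keywords, rng)
    return logits


def compress(tokens, logits):
    """Keep the tokens whose keep-probability exceeds 0.5 (logit > 0)."""
    return [t for t, z in zip(tokens, logits) if z > 0.0]


if __name__ == "__main__":
    prompt = "please kindly summarize the quarterly revenue report for the board".split()
    keywords = {"summarize", "revenue", "report"}
    print(compress(prompt, train(prompt, keywords)))
```

In this sketch the reward plays the role of the task-specific signal: tokens that help the (toy) task are pushed toward "keep", while the length term pushes everything else toward "drop". A real system would replace the per-position logits with a shared encoder and the keyword recall with a metric computed on the downstream task output.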

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections