PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics

Daniil Larionov; Steffen Eger

arXiv:2412.16120·cs.CL·December 23, 2024

TL;DR

This paper introduces a prompt compression method that uses a smaller, fine-tuned model to reduce token usage and computational cost in LLM-based machine translation evaluation while maintaining evaluation quality.

Contribution

It presents a two-stage fine-tuning approach for prompt compression (supervised fine-tuning followed by preference optimization) that improves efficiency without sacrificing evaluation quality.

Findings

1. 2.37× reduction in token usage achieved

2. Maintains evaluation quality with compressed prompts

3. Enhances cost-effectiveness of LLM-based metrics
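
To put the reported reduction factor in more familiar terms, the short sketch below converts a reduction factor into the fraction of tokens saved. The function names and the example token count are illustrative assumptions, not figures from the paper; only the 2.37× factor comes from the source.

```python
def token_savings(reduction_factor: float) -> float:
    """Return the fraction of tokens saved for a given reduction factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 - 1.0 / reduction_factor


def compressed_tokens(original_tokens: float, reduction_factor: float) -> float:
    """Return the token count after compression by the given factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return original_tokens / reduction_factor


if __name__ == "__main__":
    factor = 2.37  # reduction factor reported in the paper
    print(f"Fraction of tokens saved: {token_savings(factor):.1%}")
    # 2370 is a hypothetical prompt length chosen only for illustration.
    print(f"Tokens after compression: {compressed_tokens(2370, factor):.0f}")
```

Under this reading, a 2.37× reduction means the compressed prompts use roughly 42% of the original tokens, a saving of about 58%.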

Abstract

Evaluating the quality of machine-generated natural language content is a challenging task in Natural Language Processing (NLP). Recently, large language models (LLMs) like GPT-4 have been employed for this purpose, but they are computationally expensive due to the extensive token usage required by complex evaluation prompts. In this paper, we propose a prompt optimization approach that uses a smaller, fine-tuned language model to compress the input data for the evaluation prompt, thus reducing token usage and computational cost when using larger LLMs for downstream evaluation. Our method involves a two-stage fine-tuning process: supervised fine-tuning followed by preference optimization to refine the model's outputs based on human preferences. We focus on Machine Translation (MT) evaluation and utilize the GEMBA-MQM metric as a starting point. Our results show a $2.37 \times$ reduction in token…
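
The compress-then-evaluate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template, the `evaluate_compressed` function, and the toy `keep_first_words` and `count_prompt_words` stand-ins are all hypothetical. In a real setup the compressor would be the smaller fine-tuned model and the evaluator would be a call to the larger LLM.

```python
from typing import Callable, Tuple

# Hypothetical type aliases: a compressor shortens the (source, translation)
# pair; an evaluator maps a prompt string to the large model's response.
Compressor = Callable[[str, str], Tuple[str, str]]
Evaluator = Callable[[str], str]

# Hypothetical template; the actual GEMBA-MQM prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Source: {source}\n"
    "Translation: {translation}\n"
    "List the translation errors and their severity."
)


def evaluate_compressed(
    source: str,
    translation: str,
    compress: Compressor,
    evaluate: Evaluator,
) -> Tuple[str, str]:
    """Compress the inputs first, then send the shorter prompt to the evaluator."""
    short_source, short_translation = compress(source, translation)
    prompt = PROMPT_TEMPLATE.format(
        source=short_source, translation=short_translation
    )
    return prompt, evaluate(prompt)


def keep_first_words(source: str, translation: str, limit: int = 5) -> Tuple[str, str]:
    """Toy stand-in for the fine-tuned compressor: keep the first `limit` words."""
    def shorten(text: str) -> str:
        return " ".join(text.split()[:limit])
    return shorten(source), shorten(translation)


def count_prompt_words(prompt: str) -> str:
    """Toy stand-in for the large LLM: report the prompt length in words."""
    return f"prompt has {len(prompt.split())} words"


if __name__ == "__main__":
    prompt, response = evaluate_compressed(
        "a b c d e f g", "h i j k l m n", keep_first_words, count_prompt_words
    )
    print(prompt)
    print(response)
```

Passing the compressor and evaluator as callables keeps the sketch independent of any particular model API, so either stage can be swapped without touching the pipeline function.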

Taxonomy

Topics: Software Testing and Debugging Techniques · Network Packet Processing and Optimization · Neural Networks and Applications

Methods: Linear Layer · Dense Connections · Residual Connection · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Dropout · Softmax