Finetuning LLMs for Comparative Assessment Tasks

Vatsal Raina; Adian Liusie; Mark Gales

arXiv:2409.15979·cs.CL·September 25, 2024

TL;DR

This paper introduces a finetuning framework for large language models to improve their efficiency and accuracy in comparative assessment tasks in natural language generation, addressing scalability issues of pairwise comparisons.

Contribution

It presents a novel finetuning method that aligns LLM outputs with target comparative probabilities, enhancing performance and efficiency over existing approaches.

Findings

01 Improved state-of-the-art performance in comparative assessment

02 Maintains high accuracy with fewer comparisons

03 Addresses scalability issues of pairwise comparisons

Abstract

Automated assessment in natural language generation is a challenging task. Instruction-tuned large language models (LLMs) have shown promise in reference-free evaluation, particularly through comparative assessment. However, the quadratic computational complexity of pairwise comparisons limits its scalability. To address this, efficient comparative assessment has been explored by applying comparative strategies on zero-shot LLM probabilities. We propose a framework for finetuning LLMs for comparative assessment to align the model's output with the target distribution of comparative probabilities. By training on soft probabilities, our approach improves state-of-the-art performance while maintaining high performance with an efficient subset of comparisons.
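The two ideas in the abstract, the quadratic cost of pairwise comparison and training against soft probabilities, can be illustrated with a short sketch. The abstract does not state the exact training objective, so the loss below is an assumption: a binary cross-entropy between the model's probability that candidate A beats candidate B and a soft target probability. The function names `num_pairwise_comparisons` and `soft_bce` are hypothetical and are not taken from the paper.

```python
import math


def num_pairwise_comparisons(n: int) -> int:
    """Count ordered pairs (i, j) with i != j among n candidates.

    This grows as n * (n - 1), which is the quadratic cost the abstract
    refers to. Counting unordered pairs instead halves it but stays quadratic.
    """
    return n * (n - 1)


def soft_bce(p_model: float, p_target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy against a soft target probability.

    ASSUMPTION: illustrative objective only; the paper's actual loss is not
    given in this section.
    p_model  -- model's probability that candidate A is better than B
    p_target -- soft target probability for the same comparison
    """
    # Clamp to avoid log(0) when the model is fully confident.
    p = min(max(p_model, eps), 1.0 - eps)
    return -(p_target * math.log(p) + (1.0 - p_target) * math.log(1.0 - p))


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, "candidates ->", num_pairwise_comparisons(n), "comparisons")
    # With a soft target, the loss is lowest when the model matches the
    # target probability, not when it is maximally confident.
    print(soft_bce(0.7, 0.7) < soft_bce(0.9, 0.7))
```

One property of this assumed objective is that, for a soft target such as 0.7, the loss is minimised when the model also outputs 0.7, so the model is pushed to match the target distribution rather than to make hard binary decisions.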

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques

Methods: ALIGN