Split and Merge: Aligning Position Biases in LLM-based Evaluators

Zongjie Li; Chaozheng Wang; Pingchuan Ma; Daoyuan Wu; Shuai Wang; Cuiyun Gao; Yang Liu

arXiv:2310.01432 · cs.CL · December 10, 2024 · 2 cites

TL;DR

This paper introduces PORTIA, a system that reduces position bias in LLM-based evaluators by splitting, aligning, and merging answers, significantly improving consistency and cost-efficiency in automated answer evaluation.

Contribution

PORTIA is a novel alignment-based method that calibrates position bias in LLM evaluators, enhancing their consistency and reducing costs compared to previous approaches.

Findings

01

PORTIA improves consistency rates by an average of 47.46% across models.

02

It enables less advanced models to match GPT-4's agreement at 10% of the cost.

03

It rectifies around 80% of position bias instances in GPT-4, raising its consistency to 98%.
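The findings above are all stated in terms of consistency: an evaluator is consistent on a pair if it prefers the same underlying answer when the two answers are shown in either order. Below is a minimal sketch of these metrics. It assumes each pair is judged twice, once in each presentation order, and that each verdict is recorded as "1", "2", or "tie" relative to the order shown. All names are hypothetical. `relative_improvement` assumes the 47.46% figure is a relative rather than absolute gain, and `fix_rate` is only one plausible reading of "rectifies around 80% of position bias instances". None of this is taken from PORTIA's code.

```python
from typing import List, Tuple

# Hypothetical verdict encoding, relative to the order the answers were shown:
# "1" = first answer preferred, "2" = second answer preferred, "tie" = no preference.
Verdict = str
VerdictPair = Tuple[Verdict, Verdict]  # (original order, swapped order)

_FLIP = {"1": "2", "2": "1", "tie": "tie"}


def is_consistent(original: Verdict, swapped: Verdict) -> bool:
    """True if the evaluator prefers the same underlying answer in both orders."""
    # Translate the swapped-order verdict back into the original order first.
    return _FLIP[swapped] == original


def consistency_rate(pairs: List[VerdictPair]) -> float:
    """Fraction of answer pairs whose two verdicts agree after order correction."""
    if not pairs:
        raise ValueError("need at least one verdict pair")
    return sum(is_consistent(o, s) for o, s in pairs) / len(pairs)


def relative_improvement(before: float, after: float) -> float:
    """Relative gain in consistency, e.g. 0.60 -> 0.90 is a 50% relative gain."""
    return (after - before) / before


def fix_rate(before: List[VerdictPair], after: List[VerdictPair]) -> float:
    """Share of pairs inconsistent in `before` that are consistent in `after`.

    Both lists must describe the same answer pairs in the same order.
    Returns 0.0 if no pair was inconsistent to begin with.
    """
    broken = [i for i, (o, s) in enumerate(before) if not is_consistent(o, s)]
    if not broken:
        return 0.0
    return sum(is_consistent(*after[i]) for i in broken) / len(broken)


if __name__ == "__main__":
    verdicts = [("1", "2"), ("1", "1"), ("tie", "tie"), ("2", "1")]
    print(consistency_rate(verdicts))  # 0.75
```

In the usage example, the pair ("1", "1") is the position-biased case. The evaluator picked whichever answer came first, so the two verdicts point at different underlying answers.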

Abstract

Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and…
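The abstract describes the split, align, and merge pipeline only at a high level. The sketch below illustrates the idea under several assumptions:

- Answers are split at sentence boundaries with a naive regex.
- Each answer is cut into k contiguous segments.
- The pair of cut positions with the highest summed Jaccard token overlap between corresponding segments is chosen by brute force.
- The aligned segments are interleaved into one prompt.

The splitter, similarity measure, search strategy, and prompt wording are illustrative choices, not PORTIA's published implementation. Every function name is hypothetical. Sending the prompt to an LLM evaluator is outside the sketch.

```python
import itertools
import re
from typing import List, Sequence, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segmentations(sentences: Sequence[str], k: int) -> List[List[str]]:
    """All ways to cut a sentence list into k contiguous, non-empty segments."""
    n = len(sentences)
    if n < k:
        return []
    result = []
    for cuts in itertools.combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        result.append([" ".join(sentences[a:b]) for a, b in zip(bounds, bounds[1:])])
    return result


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two segments (lower-cased words)."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(answer1: str, answer2: str, k: int) -> Tuple[List[str], List[str]]:
    """Pick the k-way splits of both answers whose paired segments overlap most."""
    splits1 = segmentations(split_sentences(answer1), k)
    splits2 = segmentations(split_sentences(answer2), k)
    if not splits1 or not splits2:
        # Too few sentences to split: fall back to the unsplit answers.
        return [answer1], [answer2]
    # Score every combination of cut positions; ties keep the first candidate.
    return max(
        itertools.product(splits1, splits2),
        key=lambda pair: sum(jaccard(x, y) for x, y in zip(*pair)),
    )


def merge(question: str, segs1: List[str], segs2: List[str]) -> str:
    """Interleave aligned segments into a single evaluation prompt."""
    lines = [f"Question: {question}", ""]
    for i, (s1, s2) in enumerate(zip(segs1, segs2), start=1):
        lines.append(f"Assistant 1, part {i}: {s1}")
        lines.append(f"Assistant 2, part {i}: {s2}")
    lines.append("")
    lines.append("Which assistant answered better? Reply with 1, 2, or tie.")
    return "\n".join(lines)


if __name__ == "__main__":
    a1 = "Paris is the capital. It has the Louvre. It sits on the Seine."
    a2 = "The capital is Paris. The Seine and the Louvre are famous."
    s1, s2 = align(a1, a2, k=2)
    print(s1)  # ['Paris is the capital.', 'It has the Louvre. It sits on the Seine.']
    print(merge("Tell me about Paris.", s1, s2))
```

Keeping segments contiguous preserves each answer's original order, so alignment only moves the cut points and never reorders content. The exhaustive search grows combinatorially with the number of sentences. A practical system would need to prune it or use a cheaper heuristic.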

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods

Methods: Multi-Head Attention · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Linear Warmup With Cosine Annealing · Layer Normalization · Softmax