Split and Merge: Aligning Position Biases in LLM-based Evaluators
Zongjie Li, Chaozheng Wang, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao, Yang Liu

TL;DR
This paper introduces PORTIA, a system that reduces position bias in LLM-based evaluators by splitting, aligning, and merging answers, significantly improving consistency and cost-efficiency in automated answer evaluation.
Contribution
PORTIA is a novel alignment-based method that calibrates position bias in LLM evaluators, enhancing their consistency and reducing costs compared to previous approaches.
Findings
PORTIA improves consistency rates by an average of 47.46% across models.
It enables less advanced models to match GPT-4's agreement at 10% of the cost.
It rectifies around 80% of position bias instances in GPT-4, raising its consistency to 98%.
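The consistency figures above can be read as follows: an evaluator is consistent on a pair when its verdict still points to the same answer after the two candidates swap positions. The sketch below is a minimal illustration of that reading, not the paper's evaluation code. The names `consistency_rate` and `judge`, and the "first"/"second"/"tie" verdict labels, are assumptions made for this example; `judge` stands in for whatever LLM call is used.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface: judge(question, first, second) returns
# "first", "second", or "tie". In practice this would wrap an LLM call.
Judge = Callable[[str, str, str], str]

# How a verdict on the swapped order maps back to the original order.
_FLIP = {"first": "second", "second": "first", "tie": "tie"}


def consistency_rate(pairs: Iterable[Tuple[str, str, str]], judge: Judge) -> float:
    """Fraction of (question, answer_a, answer_b) triples on which the
    judge picks the same answer in both presentation orders."""
    consistent = 0
    total = 0
    for question, answer_a, answer_b in pairs:
        original = judge(question, answer_a, answer_b)
        swapped = judge(question, answer_b, answer_a)
        # Consistent only if the swapped verdict names the same answer.
        if original == _FLIP[swapped]:
            consistent += 1
        total += 1
    return consistent / total if total else 0.0


def always_first(question: str, first: str, second: str) -> str:
    """Toy judge with maximal position bias: always prefers slot one."""
    return "first"


def prefer_longer(question: str, first: str, second: str) -> str:
    """Toy judge that looks only at content (here, answer length)."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"
```

A judge that always prefers the first slot scores 0 on this measure, while a judge whose verdict depends only on content scores 1, whatever the quality of its verdicts. Consistency therefore measures position bias, not correctness.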
Abstract
Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, these LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, aligns similar content across candidate answers, and then merges them back into a single prompt for evaluation by LLMs. We conducted extensive experiments with six diverse LLMs to evaluate 11,520 answer pairs. Our results show that PORTIA markedly enhances the consistency rates for all the models and…
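The split, align, and merge steps described in the abstract can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: it splits at sentence boundaries, scores alignment with plain token-set Jaccard overlap, searches all segmentations exhaustively, and uses a made-up prompt layout. All function names (`sentences`, `segmentations`, `split_and_align`, `merge`) and the "Assistant A/B" labels are hypothetical.

```python
import re
from itertools import combinations
from typing import Iterator, List, Tuple


def sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def segmentations(sents: List[str], k: int) -> Iterator[List[str]]:
    """Yield every way to cut the sentence list into k contiguous segments."""
    for cuts in combinations(range(1, len(sents)), k - 1):
        bounds = (0, *cuts, len(sents))
        yield [" ".join(sents[i:j]) for i, j in zip(bounds, bounds[1:])]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two segments (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def split_and_align(answer_1: str, answer_2: str, k: int = 2) -> Tuple[List[str], List[str]]:
    """Pick the pair of k-way splits whose corresponding segments overlap most."""
    s1, s2 = sentences(answer_1), sentences(answer_2)
    k = max(1, min(k, len(s1), len(s2)))  # cannot have more segments than sentences
    candidates = (
        (seg1, seg2)
        for seg1 in segmentations(s1, k)
        for seg2 in segmentations(s2, k)
    )
    return max(candidates, key=lambda p: sum(jaccard(x, y) for x, y in zip(*p)))


def merge(question: str, segs_1: List[str], segs_2: List[str]) -> str:
    """Interleave aligned segments into a single evaluation prompt."""
    lines = [f"Question: {question}"]
    for i, (x, y) in enumerate(zip(segs_1, segs_2), start=1):
        lines.append(f"[Part {i}] Assistant A: {x}")
        lines.append(f"[Part {i}] Assistant B: {y}")
    return "\n".join(lines)
```

Interleaving aligned segments places comparable content from both answers side by side, so the evaluator compares part against part instead of reading one full answer before the other. The exhaustive search is only practical for short answers and small `k`.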
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Multi-Head Attention · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Linear Warmup With Cosine Annealing · Layer Normalization · Softmax
