Bridging Human and LLM Judgments: Understanding and Narrowing the Gap

Felipe Maia Polo; Xinhe Wang; Mikhail Yurochkin; Gongjun Xu; Moulinath Banerjee; Yuekai Sun

arXiv:2508.12792·cs.LG·December 3, 2025

TL;DR

This paper introduces Bridge, a statistical framework that aligns LLM evaluations with human judgments, improving agreement and revealing systematic differences between them.

Contribution

Bridge provides a novel, unified method to model and correct discrepancies between human and LLM evaluations, enhancing the reliability of LLM-based judging.

Findings

01

Bridge improves agreement with human ratings for six LLM judges on two benchmarks (BigGen Bench and Chatbot Arena).

02

It exposes systematic gaps between human and LLM judgments.

03

The framework comes with an efficient fitting algorithm and asymptotic guarantees for statistical inference.

Abstract

Large language models are increasingly used as judges (LLM-as-a-judge) to evaluate model outputs at scale, but their assessments often diverge systematically from human judgments. We present Bridge, a unified statistical framework that explicitly bridges human and LLM evaluations under both absolute scoring and pairwise comparison paradigms. Bridge posits a latent human preference score for each prompt-response pair and models LLM deviations as linear transformations of covariates that capture sources of discrepancies. This offers a simple and principled framework for refining LLM ratings and characterizing systematic discrepancies between humans and LLMs. We provide an efficient fitting algorithm with asymptotic guarantees for statistical inference. Using six LLM judges and two benchmarks (BigGen Bench and Chatbot Arena), Bridge achieves higher agreement with human ratings (accuracy,…
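To make the modeling idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's model or fitting algorithm: it assumes continuous scores, Gaussian noise, and an ordinary least-squares fit, whereas Bridge itself covers absolute scoring and pairwise comparisons. The function names (`fit_deviation`, `correct_llm_scores`), the covariate choice (an intercept plus a standardized response-length feature), and all simulated values are hypothetical and exist only for illustration.

```python
import numpy as np


def fit_deviation(human, llm, covariates):
    """Least-squares estimate of gamma in: llm - human ≈ covariates @ gamma.

    Simplifying assumption: scores are continuous and the LLM deviation is a
    linear function of the covariates plus noise.
    """
    gamma, *_ = np.linalg.lstsq(covariates, llm - human, rcond=None)
    return gamma


def correct_llm_scores(llm, covariates, gamma):
    """Subtract the estimated systematic deviation from the LLM scores."""
    return llm - covariates @ gamma


# Simulated data (hypothetical): an intercept and one covariate, e.g. a
# standardized response length, drive the LLM's deviation from humans.
rng = np.random.default_rng(0)
n = 2000
covariates = np.column_stack([np.ones(n), rng.normal(size=n)])
true_gamma = np.array([0.5, 0.8])

human = rng.normal(size=n)  # stand-in for the latent human score
llm = human + covariates @ true_gamma + 0.1 * rng.normal(size=n)

# Fit on a small human-labeled calibration subset, then correct the rest.
calib, rest = slice(0, 500), slice(500, None)
gamma_hat = fit_deviation(human[calib], llm[calib], covariates[calib])
corrected = correct_llm_scores(llm[rest], covariates[rest], gamma_hat)

raw_gap = np.mean(np.abs(llm[rest] - human[rest]))
corrected_gap = np.mean(np.abs(corrected - human[rest]))
print("estimated gamma:", gamma_hat)
print("mean |LLM - human| before and after correction:", raw_gap, corrected_gap)
```

In this toy setting the estimated coefficients play two roles: they are used to refine the LLM ratings, and their size and sign indicate which covariates are associated with systematic human-LLM discrepancies.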
