Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

Krishnakumar Balasubramanian; Aleksandr Podkopaev; Shiva Prasad Kasiviswanathan

arXiv:2601.22336 · stat.ML · February 2, 2026

Open Access

TL;DR

This paper introduces dependence-aware label aggregation based on Ising models, which improves the accuracy of labels aggregated from multiple annotators, including LLM judges, by accounting for dependencies among them.

Contribution

The paper develops a hierarchy of dependence-aware aggregation models based on Ising graphical models and latent factors, addressing the limitation of classical methods that assume annotators are conditionally independent given the true label.

Findings

01

Dependence-aware models outperform classical methods on real datasets.

02

Ignoring dependencies among annotators can yield miscalibrated posteriors and even confidently incorrect predictions.

03

The proposed method achieves improved accuracy over traditional baselines.

Abstract

Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y \in \{0, 1\}$, an assumption often violated by LLM judges due to shared data, architectures, prompts, and failure modes. Ignoring such dependencies can yield miscalibrated posteriors and even confidently incorrect predictions. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors. For class-dependent Ising models, the Bayes log-odds is generally quadratic in votes; for class-independent couplings, it reduces to a linear weighted vote with correlation-adjusted parameters. We present finite-$K$ examples showing that methods based on conditional…
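
The quadratic-versus-linear distinction stated in the abstract can be illustrated with a minimal sketch. It assumes a parameterization that is not taken from the paper: votes are coded as $x \in \{-1, +1\}^K$, and each class has the law $P(x \mid Y = y) \propto \exp(h_y^\top x + \tfrac{1}{2} x^\top J_y x)$ with a symmetric, zero-diagonal coupling matrix $J_y$. The function names `log_partition` and `ising_log_odds` and all numeric values are hypothetical, and the partition function is computed by brute force, which is only feasible for small $K$.

```python
import itertools
import numpy as np


def log_partition(h, J):
    """Brute-force log partition function over all 2^K vote patterns in {-1,+1}^K."""
    K = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=K)), dtype=float)
    # Unnormalized log-probability of each pattern: h^T x + 0.5 * x^T J x
    log_weights = states @ h + 0.5 * np.einsum("ni,ij,nj->n", states, J, states)
    return np.logaddexp.reduce(log_weights)


def ising_log_odds(x, h1, J1, h0, J0, prior1=0.5):
    """Bayes log-odds log P(Y=1|x) / P(Y=0|x) under the assumed Ising model."""
    x = np.asarray(x, dtype=float)
    linear = (h1 - h0) @ x                # weighted-vote term
    quadratic = 0.5 * x @ (J1 - J0) @ x   # vanishes when J1 == J0
    const = (
        np.log(prior1 / (1.0 - prior1))
        - log_partition(h1, J1)
        + log_partition(h0, J0)
    )
    return const + linear + quadratic


# Hypothetical example with K = 3 judges; judges 0 and 1 are coupled.
h1 = np.array([0.8, 0.5, 0.3])   # fields under Y = 1 (illustrative values)
h0 = -h1                         # fields under Y = 0 (illustrative values)
J_shared = np.array([[0.0, 0.6, 0.0],
                     [0.6, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
J_none = np.zeros((3, 3))

votes = [1, 1, -1]
print("class-independent couplings:", ising_log_odds(votes, h1, J_shared, h0, J_shared))
print("class-dependent couplings:  ", ising_log_odds(votes, h1, J_shared, h0, J_none))
```

Under this assumed form, the log-odds equals a constant plus $(h_1 - h_0)^\top x + \tfrac{1}{2} x^\top (J_1 - J_0) x$. When the couplings are shared across classes, the quadratic term cancels and the rule is an affine weighted vote; when they differ, pairwise products of votes remain in the decision rule.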

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)