Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models
Krishnakumar Balasubramanian, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan

TL;DR
This paper introduces a dependence-aware label aggregation method based on Ising models. It improves the accuracy of labels aggregated from large language model judges by accounting for dependencies among the judges.
Contribution
It develops a novel dependence-aware aggregation framework based on Ising models, addressing limitations of classical methods that assume annotator independence.
Findings
Dependence-aware models outperform classical methods on real datasets.
Ignoring dependencies can lead to miscalibrated posteriors and incorrect labels.
The proposed method achieves improved accuracy over traditional baselines.
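The second finding can be illustrated with a small toy example. This is a minimal sketch and is not taken from the paper: the three judges, their accuracies, and the assumption that judges B and C always cast identical votes are all hypothetical. It assumes a uniform prior and symmetric accuracies, so a vote v in {-1, +1} from a judge with accuracy a contributes v * log(a / (1 - a)) to the log-odds.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def naive_log_odds(votes, accuracies):
    """Log-odds of label +1 if every judge is treated as conditionally independent."""
    return sum(v * logit(a) for v, a in zip(votes, accuracies))

def grouped_log_odds(votes, accuracies, groups):
    """Log-odds of label +1 when each group of exact-copy judges is counted once.

    `groups` is a list of index lists; judges within a group are assumed
    (hypothetically) to always agree, so only the first member carries evidence.
    """
    return sum(votes[g[0]] * logit(accuracies[g[0]]) for g in groups)

# Hypothetical setup: judge A (accuracy 0.8) votes +1,
# judges B and C (accuracy 0.7, exact copies of each other) vote -1.
votes = [+1, -1, -1]
accuracies = [0.8, 0.7, 0.7]

naive = naive_log_odds(votes, accuracies)                       # double-counts B and C
aware = grouped_log_odds(votes, accuracies, [[0], [1, 2]])      # counts the copy once

print("independence assumed:", naive, "-> label", +1 if naive > 0 else -1)
print("dependence accounted:", aware, "-> label", +1 if aware > 0 else -1)
```

In this toy case the independence-based score is negative while the dependence-aware score is positive, so the two rules return opposite labels from the same votes.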
Abstract
Large-scale AI evaluation increasingly relies on aggregating binary judgments from annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label, an assumption often violated by LLM judges due to shared data, architectures, prompts, and failure modes. Ignoring such dependencies can yield miscalibrated posteriors and even confidently incorrect predictions. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors. For class-dependent Ising models, the Bayes log-odds is generally quadratic in votes; for class-independent couplings, it reduces to a linear weighted vote with correlation-adjusted parameters. We present finite- examples showing that methods based on conditional…
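The quadratic-versus-linear statement in the abstract can be checked numerically on a tiny model. The sketch below assumes one common Ising parameterization, P(x | y) proportional to exp(h_y · x + 0.5 xᵀ J_y x) for votes x in {-1, +1}^k with symmetric, zero-diagonal J_y; the paper's own parameterization may differ, and all names and numbers here are hypothetical. The partition functions are computed by brute-force enumeration, which is only feasible for a handful of judges.

```python
import itertools
import numpy as np

def log_partition(h, J):
    """Brute-force log Z over all 2^k vote vectors in {-1, +1}^k."""
    k = len(h)
    energies = [
        h @ np.array(x) + 0.5 * np.array(x) @ J @ np.array(x)
        for x in itertools.product([-1.0, 1.0], repeat=k)
    ]
    return float(np.logaddexp.reduce(energies))

def bayes_log_odds(x, h_pos, J_pos, h_neg, J_neg, prior_pos=0.5):
    """log P(y=+1 | x) / P(y=-1 | x) under class-conditional Ising models."""
    x = np.asarray(x, dtype=float)
    linear = (h_pos - h_neg) @ x                 # weighted-vote term
    quadratic = 0.5 * x @ (J_pos - J_neg) @ x    # vanishes when J_pos == J_neg
    intercept = (np.log(prior_pos / (1 - prior_pos))
                 - (log_partition(h_pos, J_pos) - log_partition(h_neg, J_neg)))
    return float(intercept + linear + quadratic)

# Hypothetical 3-judge example: judges 0 and 1 are coupled.
h = np.array([0.5, 0.5, 0.3])
J = np.zeros((3, 3))
J[0, 1] = J[1, 0] = 0.8
x = np.array([1.0, 1.0, -1.0])

# Class-independent couplings: the score is a linear weighted vote.
shared = bayes_log_odds(x, h, J, -h, J)

# Class-dependent couplings: a pairwise (quadratic) term in the votes appears.
class_dep = bayes_log_odds(x, h, J, -h, np.zeros((3, 3)))

print("shared couplings:", shared)
print("class-dependent couplings:", class_dep)
```

With shared couplings and h_neg = -h_pos, the intercept cancels by symmetry and the score equals 2 h · x. With class-dependent couplings, the difference J_pos - J_neg contributes a pairwise term in the votes of judges 0 and 1.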
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
