NUTMEG: Separating Signal From Noise in Annotator Disagreement

Jonathan Ivey, Susan Gauch, and David Jurgens

arXiv:2507.18890 · cs.CL · July 28, 2025

TL;DR

NUTMEG is a Bayesian model that effectively distinguishes between genuine systematic disagreements and noisy annotations in crowdsourced NLP data, improving ground-truth recovery and downstream model performance.

Contribution

The paper introduces NUTMEG, a Bayesian model that uses information about annotator backgrounds to separate signal from noise in annotator disagreement.

Findings

01

On synthetic data, NUTMEG recovers ground truth more effectively than traditional aggregation methods.

02

Models trained on NUTMEG-processed data perform better.

03

Systematic disagreements can be preserved while noisy annotations are removed.
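
To make the signal-versus-noise distinction concrete, here is a minimal Python sketch. It is not NUTMEG and not the authors' code: it uses plain majority voting as a stand-in, and the group names, labels, and function names are hypothetical. It only illustrates why pooling all annotators can erase a disagreement that is consistent within a subgroup, while aggregating per subgroup keeps it and still discards a stray label.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label in a list of labels."""
    return Counter(labels).most_common(1)[0][0]


def aggregate_pooled(annotations):
    """Pool every annotator together: one label per item.

    annotations: list of (item, group, label) tuples (hypothetical format).
    """
    by_item = defaultdict(list)
    for item, _group, label in annotations:
        by_item[item].append(label)
    return {item: majority_vote(labels) for item, labels in by_item.items()}


def aggregate_by_group(annotations):
    """Aggregate within each annotator subgroup: one label per (item, group)."""
    by_key = defaultdict(list)
    for item, group, label in annotations:
        by_key[(item, group)].append(label)
    return {key: majority_vote(labels) for key, labels in by_key.items()}


# Toy data: group A mostly says "offensive" (with one stray "ok"),
# group B consistently says "ok".
annotations = [
    ("post1", "A", "offensive"),
    ("post1", "A", "offensive"),
    ("post1", "A", "ok"),
    ("post1", "B", "ok"),
    ("post1", "B", "ok"),
    ("post1", "B", "ok"),
]

pooled = aggregate_pooled(annotations)
grouped = aggregate_by_group(annotations)

print(pooled)   # a single label per item; group A's view is lost
print(grouped)  # one label per (item, group); the stray "ok" in A is outvoted
```

In this toy case the pooled vote returns a single label for `post1`, while the per-group vote keeps both subgroup labels. A Bayesian model such as the one the paper describes would estimate these quantities probabilistically rather than by counting votes.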

Abstract

NLP models often rely on human-labeled data for training and evaluation. Many approaches crowdsource this data from a large number of annotators with varying skills, backgrounds, and motivations, resulting in conflicting annotations. These conflicts have traditionally been resolved by aggregation methods that assume disagreements are errors. Recent work has argued that for many tasks annotators may have genuine disagreements and that variation should be treated as signal rather than noise. However, few models separate signal and noise in annotator disagreement. In this work, we introduce NUTMEG, a new Bayesian model that incorporates information about annotator backgrounds to remove noisy annotations from human-labeled training data while preserving systematic disagreements. Using synthetic data, we show that NUTMEG is more effective at recovering ground-truth from annotations with…

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Topic Modeling