Quantifying and Attributing Polarization to Annotator Groups
Dimitris Tsirmpas, John Pavlopoulos

TL;DR
This paper introduces a metric, paired with a statistical significance test, that quantifies polarization and attributes it to annotator groups. It addresses limitations of existing agreement metrics, particularly on subjective tasks and imbalanced datasets.
Contribution
We propose a novel, scalable metric with significance testing for inter-group polarization analysis applicable to multi-label and imbalanced datasets, along with an open-source implementation.
Findings
Polarization is strongly linked to annotator race, especially in hate speech datasets.
Religious annotators differ from others, with trends changing over time.
Less educated annotators are more subjective, while more educated ones show higher agreement.
Abstract
Current annotation agreement metrics are not well-suited for inter-group analysis, are sensitive to group-size imbalances, and are restricted to single-annotation settings. These restrictions render them insufficient for many subjective tasks such as toxicity and hate-speech detection. For this reason, we introduce a quantifiable metric, paired with a statistical significance test, that attributes polarization to various annotator groups. Our metric enables direct comparisons between heavily imbalanced sociodemographic and ideological subgroups across different datasets and tasks, while also enabling analysis in multi-label settings. We apply this metric to three datasets on hate speech and one on toxicity detection, discovering that: (1) Polarization is strongly and persistently attributed to annotator race, especially on the hate speech task. (2) Religious annotators do not fundamentally…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Mobile Crowdsensing and Crowdsourcing · Authorship Attribution and Profiling
