Liberal-Conservative Hierarchies of Intercoder Reliability Estimators

Yingjie Jay Zhao (1), Guangchao Charles Feng (2), Dianshi Moses Li (3), Song Harris Ao (4), Ming Milano Li (5), Zhan Thor Tuo (3), Hui Huang (6), Ke Deng (7), Xinshu Zhao (4) ((1) Center for Data Science, Institute of Collaborative Innovation, University of Macau, Taipa, Macao; (2) Department of Interactive Media, School of Communication, Hong Kong Baptist University, Kowloon, Hong Kong; (3) Centre for Empirical Legal Study, Faculty of Law, University of Macau, Taipa, Macao; (4) Department of Communication, Faculty of Social Sciences, University of Macau, Taipa, Macao; (5) Department of Government, Public Administration, Faculty of Social Sciences, University of Macau, Taipa, Macao; (6) Tenly Inc., Shanghai, China; (7) Department of Statistics & Data Science, Tsinghua University, Beijing, China)

arXiv:2410.05291 · physics.soc-ph · October 29, 2024 · 3 cites

Open Access

TL;DR

This paper critically examines common inter-coder reliability indices, revealing their limitations and proposing a hierarchy of 23 indices through mathematical analysis and simulations to guide better index selection.

Contribution

It extends the hierarchy of reliability indices to 23 measures and uncovers a new paradox, aiding informed index selection in research.

Findings

01. Index scores vary systematically across the hierarchy.
02. The Ir index exhibits a previously unknown paradox.
03. Simulations show how index performance depends on categories and distribution.

Abstract

While numerous indices of inter-coder reliability exist, Krippendorff's α and Cohen's κ have long dominated in communication studies and other fields, respectively. The near consensus, however, may be near the end. Recent theoretical and mathematical analyses reveal that these indices assume intentional and maximal random coding, leading to paradoxes and inaccuracies. A controlled experiment with one-way golden standard and Monte Carlo simulations supports these findings, showing that κ and α are poor predictors and approximators of true intercoder reliability. As consensus on a perfect index remains elusive, more authors recommend selecting the best available index for specific situations (BAFS). To make informed choices, researchers, reviewers, and educators need to understand the liberal-conservative hierarchy of indices, i.e., which indices produce higher…
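To make the liberal-versus-conservative idea concrete, here is a minimal sketch that compares raw percent agreement with Cohen's κ for two coders. It uses the standard textbook definition κ = (p_o − p_e) / (1 − p_e), not the paper's own code or its 23-index hierarchy. The function names `percent_agreement` and `cohen_kappa` and the ten toy codes are assumptions made up for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of units on which the two coders assigned the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must rate the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for estimated chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement estimated from each coder's own marginal distribution.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data with a skewed distribution (8 "yes", 2 "no" per coder).
coder_a = ["yes", "yes", "yes", "yes", "no", "no", "yes", "yes", "yes", "yes"]
coder_b = ["yes", "yes", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes"]

print(f"percent agreement: {percent_agreement(coder_a, coder_b):.3f}")
print(f"Cohen's kappa:     {cohen_kappa(coder_a, coder_b):.3f}")
```

On this toy data the coders agree on 8 of 10 units, yet κ is much lower because the skewed marginals imply a large estimated chance agreement. In this one example, percent agreement is the more liberal index and κ the more conservative one.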

Taxonomy

Topics: Fault Detection and Control Systems