Liberal-Conservative Hierarchies of Intercoder Reliability Estimators
Yingjie Jay Zhao (1), Guangchao Charles Feng (2), Dianshi Moses Li (3), Song Harris Ao (4), Ming Milano Li (5), Zhan Thor Tuo (3), Hui Huang (6), Ke Deng (7), Xinshu Zhao (4) ((1) Center for Data Science, Institute of Collaborative Innovation, University of Macau, Taipa

TL;DR
This paper critically examines common intercoder reliability indices, reveals their limitations, and, through mathematical analysis and simulations, proposes a liberal-conservative hierarchy of 23 indices to guide index selection.
Contribution
It extends the hierarchy of reliability indices to 23 measures and uncovers a new paradox, aiding informed index selection in research.
Findings
Index scores vary systematically across the hierarchy.
The Ir index exhibits a previously unknown paradox.
Simulations show how index performance depends on categories and distribution.
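The dependence on distribution and number of categories can be illustrated with a minimal sketch, assuming two coders and nominal data. It uses textbook definitions of percent agreement, Bennett et al.'s S, and Cohen's κ; the helper names (such as the hypothetical `make_pairs`) and the 2×2 cell counts are made-up illustrations, not the paper's simulation design.

```python
from collections import Counter


def percent_agreement(c1, c2):
    """Share of units on which the two coders assign the same category."""
    return sum(a == b for a, b in zip(c1, c2)) / len(c1)


def bennett_s(c1, c2, k):
    """Bennett et al.'s S: chance agreement is fixed at 1/k for k categories."""
    ao = percent_agreement(c1, c2)
    return (ao - 1 / k) / (1 - 1 / k)


def cohen_kappa(c1, c2):
    """Cohen's kappa: chance agreement from each coder's own marginals.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(c1)
    ao = percent_agreement(c1, c2)
    m1, m2 = Counter(c1), Counter(c2)
    pe = sum((m1[c] / n) * (m2[c] / n) for c in set(c1) | set(c2))
    return (ao - pe) / (1 - pe)


def make_pairs(both_yes, both_no, yes_no, no_yes):
    """Hypothetical helper: build two coders' labels from a 2x2 table."""
    c1 = [1] * both_yes + [0] * both_no + [1] * yes_no + [0] * no_yes
    c2 = [1] * both_yes + [0] * both_no + [0] * yes_no + [1] * no_yes
    return c1, c2


# Same 90% observed agreement, different category distributions.
balanced = make_pairs(45, 45, 5, 5)
skewed = make_pairs(85, 5, 5, 5)

for name, (c1, c2) in [("balanced", balanced), ("skewed", skewed)]:
    print(name, percent_agreement(c1, c2), round(cohen_kappa(c1, c2), 3))

# Same observed agreement, different number of categories on the scale
# (the data use only two of them, but k is a property of the scale).
c1, c2 = balanced
for k in (2, 4):
    print("k =", k, round(bennett_s(c1, c2, k), 3))
```

Under these assumptions both data sets show 90% observed agreement, yet κ falls from 0.80 (balanced) to about 0.44 (skewed), while S ignores the distribution and depends only on observed agreement and k (0.80 for k = 2, about 0.87 for k = 4).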
Abstract
While numerous indices of intercoder reliability exist, Krippendorff's α and Cohen's κ have long dominated in communication studies and other fields, respectively. The near consensus, however, may be nearing its end. Recent theoretical and mathematical analyses reveal that these indices assume intentional and maximal random coding, leading to paradoxes and inaccuracies. A controlled experiment with a one-way gold standard and Monte Carlo simulations supports these findings, showing that κ and α are poor predictors and approximators of true intercoder reliability. As consensus on a perfect index remains elusive, more authors recommend selecting the best available index for specific situations (BAFS). To make informed choices, researchers, reviewers, and educators need to understand the liberal-conservative hierarchy of indices, i.e., which indices produce higher…
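To make "liberal" (higher-scoring) and "conservative" (lower-scoring) concrete, here is a minimal sketch that computes six indices on one hypothetical two-coder data set. It assumes nominal categories, two coders, and no missing data, and uses standard textbook formulas: Perreault and Leigh's Ir as the square root of S when S ≥ 0, and Krippendorff's α through its small-sample relation to Scott's π. It is not the paper's implementation and covers only a handful of the 23 indices.

```python
import math
from collections import Counter


def indices(c1, c2, k):
    """Several agreement indices for two coders on nominal data.

    k is the number of categories available on the coding scale.
    Assumes no missing data and expected agreement below 1.
    """
    n = len(c1)
    ao = sum(a == b for a, b in zip(c1, c2)) / n  # percent agreement

    s = (ao - 1 / k) / (1 - 1 / k)  # Bennett et al.'s S
    ir = math.sqrt(max(s, 0.0))  # Perreault & Leigh's Ir (0 when S < 0)

    m1, m2 = Counter(c1), Counter(c2)
    cats = set(c1) | set(c2)
    # Cohen: product of each coder's own marginal proportions.
    pe_kappa = sum(m1[c] * m2[c] for c in cats) / n**2
    # Scott: squared proportions of the two coders' pooled marginals.
    pooled = {c: (m1[c] + m2[c]) / (2 * n) for c in cats}
    pe_pi = sum(p * p for p in pooled.values())

    kappa = (ao - pe_kappa) / (1 - pe_kappa)
    pi = (ao - pe_pi) / (1 - pe_pi)
    # Krippendorff's alpha with 2n pairable values:
    # alpha = 1 - (1 - ao) * (2n - 1) / (2n * (1 - pe_pi))
    alpha = 1 - (1 - ao) * (2 * n - 1) / (2 * n * (1 - pe_pi))

    return {"percent": ao, "Ir": ir, "S": s,
            "alpha": alpha, "kappa": kappa, "pi": pi}


coder1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
coder2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

# List the indices from most liberal to most conservative on this data.
results = indices(coder1, coder2, k=2)
for name, value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s}{value:.3f}")
```

On this toy data set the values come out at about 0.80 (percent agreement), 0.77 (Ir), 0.60 (S), 0.41 (α), and 0.375 for both κ and π, which coincide here because the two coders have identical marginal distributions.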
