Better to be in agreement than in bad company: a critical analysis of many kappa-like tests assessing one-million 2x2 contingency tables
Paulo Sergio Panse Silveira, Jose Oliveira Siqueira

TL;DR
This study critically evaluates various agreement coefficients for 2x2 contingency tables, introduces a general assessment method, and identifies Holley and Guilford's G as the most reliable estimator, providing open-source R tools for researchers.
Contribution
Developed a comprehensive method to evaluate agreement estimators in 2x2 tables and identified the most reliable coefficients, with open-source implementation.
Findings
Holley and Guilford's G is the most reliable agreement estimator.
Many traditional coefficients perform poorly in imbalanced tables.
Open-source R code for assessment is publicly available.
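The following is a minimal Python sketch of what "performing poorly in imbalanced tables" can look like. It is not the authors' R code. It assumes the usual 2x2 layout: cells a and d are the concordant cells, b and c are the discordant cells, and n = a + b + c + d. It then compares Cohen's kappa, (po - pe) / (1 - pe), with Holley and Guilford's G, (a + d - b - c) / n. The function names and the example tables are hypothetical illustrations.

```python
from fractions import Fraction


def observed_agreement(a, b, c, d):
    """Proportion of cases in the concordant cells (a and d)."""
    return Fraction(a + d, a + b + c + d)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa: (po - pe) / (1 - pe), with pe from the marginal totals."""
    n = a + b + c + d
    po = Fraction(a + d, n)
    # Chance agreement: row totals (a+b, c+d) times column totals (a+c, b+d).
    pe = Fraction((a + b) * (a + c) + (c + d) * (b + d), n * n)
    if pe == 1:
        return None  # undefined when both raters use a single category
    return (po - pe) / (1 - pe)


def holley_guilford_g(a, b, c, d):
    """Holley and Guilford's G: (agreements - disagreements) / n, i.e. 2*po - 1."""
    return Fraction((a + d) - (b + c), a + b + c + d)


if __name__ == "__main__":
    # Hypothetical tables: both have 90% observed agreement.
    for table in [(45, 5, 5, 45), (90, 5, 5, 0)]:
        print(table,
              float(observed_agreement(*table)),
              float(cohen_kappa(*table)),
              float(holley_guilford_g(*table)))
```

Both example tables have 90% observed agreement. Kappa is 4/5 for the balanced table but -1/19 (about -0.05) for the imbalanced one, while G stays at 4/5 in both cases. Exact fractions are used so the comparison is not blurred by floating-point rounding.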
Abstract
We assessed several agreement coefficients applied to 2x2 contingency tables. Such tables are common in research because subjects are dichotomized either by their conditions (e.g., male or female) or for convenience of classification (e.g., traditional thresholds separating healthy from diseased, exposed from non-exposed, etc.). More extreme table configurations (e.g., high agreement between raters) are also common, but some of the coefficients have problems with imbalanced tables. Here, we not only studied some specific estimators but also developed a general method for studying any candidate agreement measure. This method was implemented in open-source R code and is available to researchers. We tested this method by verifying the performance of several traditional estimators over all 1,028,789 tables with sizes ranging from 1 to 68.…
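The figure of 1,028,789 tables is consistent with a stars-and-bars count. A 2x2 table with total n is a choice of four non-negative cell counts summing to n, and there are C(n + 3, 3) such tables. By the hockey-stick identity, summing over n = 1 to 68 gives C(72, 4) - 1 = 1,028,789. The Python sketch below checks this and includes a brute-force enumerator of the kind an exhaustive assessment would need. It is not the authors' R implementation, and the function names are hypothetical.

```python
from math import comb


def count_tables(n):
    """Number of 2x2 tables (a, b, c, d) of non-negative integers summing to n."""
    return comb(n + 3, 3)  # stars and bars: n items into 4 cells


def enumerate_tables(n):
    """Yield every 2x2 table with total n as a tuple (a, b, c, d)."""
    for a in range(n + 1):
        for b in range(n - a + 1):
            for c in range(n - a - b + 1):
                yield (a, b, c, n - a - b - c)


def total_tables(n_min=1, n_max=68):
    """Total number of distinct 2x2 tables with sizes n_min..n_max."""
    return sum(count_tables(n) for n in range(n_min, n_max + 1))


if __name__ == "__main__":
    print(total_tables())  # sizes 1 to 68, as in the abstract
```

Iterating `enumerate_tables(n)` for every n from 1 to 68 visits each table exactly once. This is the exhaustive sweep over which any candidate coefficient could then be evaluated.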
Taxonomy
Topics: Reliability and Agreement in Measurement · Psychometric Methodologies and Testing
