Multi-rater delta: extending the delta nominal measure of agreement between two raters to many raters
A. Martín Andrés, M. Álvarez Hernández

TL;DR
This paper introduces a multi-rater delta coefficient extending the original delta measure from two raters to many, providing a more accurate, interpretable, and imbalance-resistant agreement measure for nominal classifications.
Contribution
It develops a new multi-rater delta coefficient that overcomes limitations of existing kappa-based measures, applicable to R ≥ 2 raters with improved interpretability and robustness.
Findings
The multi-rater delta coefficient is intuitive and easy to interpret.
It accurately measures agreement in each category without collapsing data.
The measure is unaffected by marginal imbalance.
Abstract
The need to measure the degree of agreement among R raters who independently classify n subjects within K nominal categories is frequent in many scientific areas. The most popular measures are Cohen's kappa (R = 2) and the Fleiss' kappa, Conger's kappa and Hubert's kappa (R ≥ 2) coefficients, which have several defects. In 2004, the delta coefficient was defined for the case of R = 2; it does not have the defects of Cohen's kappa coefficient. This article extends the delta coefficient from R = 2 raters to R ≥ 2. The multi-rater delta coefficient has the same advantages over kappa-type coefficients as the delta coefficient: i) it is intuitive and easy to interpret, because it refers to the proportion of replies that are concordant and non-random; ii) the summands which give its value allow the degree of agreement in each category to be measured accurately, with no need…
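For orientation, here is a minimal Python sketch of Fleiss' kappa, one of the existing R ≥ 2 measures named in the abstract. It is not the multi-rater delta coefficient, whose formula is not given on this page. The function name `fleiss_kappa` and the n × K count-table layout (entry [i, j] is the number of raters who put subject i in category j) are assumptions made for illustration.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an n x K table of rating counts (hypothetical helper).

    counts[i, j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters R >= 2.
    """
    counts = np.asarray(counts, dtype=float)
    n, K = counts.shape
    R = counts[0].sum()
    if R < 2 or not np.allclose(counts.sum(axis=1), R):
        raise ValueError("each subject needs the same number of raters, R >= 2")

    # Overall share of all ratings that fall in each category.
    p_j = counts.sum(axis=0) / (n * R)
    # Per-subject agreement: fraction of rater pairs that agree.
    P_i = (np.sum(counts ** 2, axis=1) - R) / (R * (R - 1))
    P_bar = P_i.mean()        # mean observed agreement
    P_e = np.sum(p_j ** 2)    # agreement expected by chance
    return (P_bar - P_e) / (1.0 - P_e)

# Three raters, two categories, balanced marginals, perfect agreement.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]

# Ten subjects, strongly imbalanced marginals: raters agree on almost
# everything, yet nearly all ratings fall in the first category.
imbalanced = [[3, 0]] * 9 + [[2, 1]]

print(fleiss_kappa(perfect))
print(fleiss_kappa(imbalanced))
```

In the imbalanced table the mean observed agreement is 28/30, yet the chance-corrected value comes out slightly negative (−1/29). That is the kind of sensitivity to marginal imbalance the findings above say the multi-rater delta avoids.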
