DiPietro-Hazari Kappa: A Novel Metric for Assessing Labeling Quality via Annotation

Daniel M. DiPietro; Vivek Hazari

arXiv:2209.08243·cs.LG·September 20, 2022

TL;DR

This paper introduces DiPietro-Hazari Kappa, a statistical metric rooted in Fleiss's Kappa that evaluates the quality of suggested dataset labels in human annotation tasks. It reviews the theory of Fleiss's Kappa, derives the new metric, and gives a matrix formulation for computing it.

Contribution

The paper presents a novel metric, DiPietro-Hazari Kappa, that extends Fleiss's Kappa to better assess labeling quality in datasets, including theoretical foundations and implementation guidance.

Findings

01

The metric quantifies annotator agreement above random chance.

02

Theoretical analysis of Fleiss's Kappa informs the new metric.

03

Provides a matrix formulation and procedural instructions for computation.

Abstract

Data is a key component of modern machine learning, but statistics for assessing data label quality remain sparse in the literature. Here, we introduce DiPietro-Hazari Kappa, a novel statistical metric for assessing the quality of suggested dataset labels in the context of human annotation. Rooted in the classical Fleiss's Kappa measure of inter-annotator agreement, the DiPietro-Hazari Kappa quantifies the empirical annotator agreement differential that was attained above random chance. We offer a thorough theoretical examination of Fleiss's Kappa before turning to our derivation of DiPietro-Hazari Kappa. Finally, we conclude with a matrix formulation and a set of procedural instructions for easy computational implementation.
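
For orientation, the classical Fleiss's Kappa that the abstract builds on can be computed from an items-by-categories count matrix. The sketch below is a minimal illustration of that standard statistic only; it is not the DiPietro-Hazari Kappa, whose derivation is not reproduced on this page. The function name `fleiss_kappa` and the input layout (one row per item, one column per category, every item rated by the same number of annotators) are assumptions made for this example.

```python
import numpy as np


def fleiss_kappa(counts):
    """Classical Fleiss's Kappa (illustrative sketch, not DiPietro-Hazari Kappa).

    counts[i, j] is the number of annotators who assigned item i to
    category j. Every item is assumed to have the same number of ratings.
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    if n_raters < 2:
        raise ValueError("each item needs at least two ratings")
    if not np.all(counts.sum(axis=1) == n_raters):
        raise ValueError("every item must have the same number of ratings")

    # Share of all ratings that fell into each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)

    # Per-item agreement: fraction of annotator pairs that agree on the item.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))

    p_bar = p_i.mean()        # observed agreement
    p_e = (p_j ** 2).sum()    # agreement expected by chance

    # Undefined (division by zero) when all ratings fall in one category.
    return (p_bar - p_e) / (1.0 - p_e)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. When every rating falls into a single category, the chance term equals 1 and the statistic is undefined, so a caller would need to handle that case separately.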

Code & Models

Repositories

dandip/dh_kappa
Official

Taxonomy

Topics: Multi-Criteria Decision Making · Reliability and Agreement in Measurement · Sensory Analysis and Statistical Methods