Analyzing Dataset Annotation Quality Management in the Wild
Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

TL;DR
This paper investigates how creators of natural language datasets manage annotation quality. It analyzes 591 publications to assess how closely they follow recommended practices and to identify common errors in quality management.
Contribution
The paper provides a large-scale analysis of quality management practices in natural language dataset creation, reporting how well current research adheres to recommendations and where it commonly falls short.
Findings
The majority of analyzed publications follow good quality management practices
30% of works have subpar quality management
Common errors include issues with inter-annotator agreement
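The summary does not say which inter-annotator agreement errors were observed. As a general illustration of what such a measure computes, the sketch below implements Cohen's kappa for two annotators, which corrects raw percent agreement for the agreement expected by chance. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four items.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this example the annotators agree on 3 of 4 items (0.75 raw agreement), but chance agreement is 0.5, so kappa is 0.5. The gap between the two numbers is one reason raw percent agreement alone can overstate annotation quality.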
Abstract
Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exist, to our knowledge, large-scale analysis has yet to be performed on how quality management is conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we first survey and summarize recommended quality management practices for dataset creation as described in the literature and provide suggestions for applying them. Then, we compile a corpus of 591 scientific publications introducing text datasets and annotate it for quality-related aspects,…
Taxonomy
Topics: Research Data Management Practices · Semantic Web and Ontologies · Scientific Computing and Data Management
