Improving ML Training Data with Gold-Standard Quality Metrics
Leslie Barrett, Michael W. Sherman

TL;DR
This paper introduces statistical methods to evaluate and improve the quality of hand-tagged training data in machine learning. It emphasizes recording agreement metrics over multiple tagging iterations, where declining variance signals improving data quality.
Contribution
It proposes novel approaches for assessing and enhancing data quality without extensive re-tagging, including metrics for consistency and strategies for efficient high-quality data collection.
Findings
Agreement metrics give more reliable results when recorded over multiple tagging iterations
Declining variance in those recordings indicates increasing data quality
A tagging project can collect high-quality data without requiring multiple tags for every work item
Abstract
Hand-tagged training data is essential to many machine learning tasks. However, training data quality control has received little attention in the literature, despite data quality varying considerably with the tagging exercise. We propose methods to evaluate and enhance the quality of hand-tagged training data using statistical approaches to measure tagging consistency and agreement. We show that agreement metrics give more reliable results if recorded over multiple iterations of tagging, where declining variance in such recordings is an indicator of increasing data quality. We also show one way a tagging project can collect high-quality training data without requiring multiple tags for every work item, and that a tagger burn-in period may not be sufficient for minimizing tagger errors.
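The idea of recording agreement over several tagging iterations and watching its variance can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name a specific agreement metric, so Cohen's kappa is assumed here, and the function names, the window size, and the per-iteration kappa values are all hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two taggers who labeled the same items (assumed metric)."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    n = len(tags_a)
    # Fraction of items on which the two taggers chose the same label.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Agreement expected by chance, from each tagger's label frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1 - expected)


def rolling_variance(values, window):
    """Population variance of each consecutive window of agreement scores."""
    return [
        pvariance(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical kappa scores, one per tagging iteration.
kappas_by_iteration = [0.2, 0.6, 0.5, 0.55, 0.52, 0.53]
variances = rolling_variance(kappas_by_iteration, window=3)
# A downward trend in `variances` is the signal the abstract describes.
```

Under these assumptions, a single high kappa value says little on its own; it is the stabilization of the score across iterations that would suggest the tagging has become consistent.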
Taxonomy
Topics: Reliability and Agreement in Measurement · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications
