Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation
Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger

TL;DR
This paper critiques the overreliance on human inter-rater reliability (IRR) metrics for validating educational AI annotations. It advocates complementary validation methods that better ensure data quality and educational impact.
Contribution
It introduces and advocates for complementary evaluation approaches beyond IRR to improve the validity and educational relevance of annotated data in AI systems.
Findings
IRR overreliance hampers valid data classification
Proposes multi-label and expert-based validation methods
Highlights importance of external validity across categories
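The summary mentions multi-label annotation schemes but does not specify how agreement is scored under them. The sketch below shows one common option, mean per-item Jaccard similarity between two raters' label sets. It is chosen here for illustration only and is not the authors' method. The tutor-move labels are hypothetical. Overlapping label sets earn partial credit. A single-label scheme would instead count any mismatch as full disagreement.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two label sets; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_jaccard_agreement(rater_a: list, rater_b: list) -> float:
    """Average per-item Jaccard similarity between two raters' multi-label annotations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    return sum(jaccard(set(a), set(b)) for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical tutor-move labels: each rater may assign several moves to one utterance.
rater_a = [{"praise", "hint"}, {"question"}, {"hint"}]
rater_b = [{"praise"}, {"question", "hint"}, {"hint"}]
print(round(mean_jaccard_agreement(rater_a, rater_b), 3))  # → 0.667
```

The first two items overlap only partially (0.5 each) and the third matches exactly (1.0), which gives a mean of about 0.667.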
Abstract
Humans can be notoriously imperfect evaluators. They are often biased, unreliable, and unfit to define "ground truth." Yet, given the surging need to produce large amounts of training data in educational applications using AI, traditional inter-rater reliability (IRR) metrics like Cohen's kappa remain central to validating labeled data. IRR remains a cornerstone of many machine learning pipelines for educational data. Take, for example, the classification of tutors' moves in dialogues or the labeling of open responses in machine-graded assessments. This position paper argues that overreliance on human IRR as a gatekeeper for annotation quality hampers progress in classifying data in ways that are valid and predictive in relation to improving learning. To address this issue, we highlight five examples of complementary evaluation methods, such as multi-label annotation schemes, expert-based…
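The abstract names Cohen's kappa as a standard IRR metric. The minimal sketch below computes it for two raters who each assign one nominal label per item. Kappa is observed agreement corrected for the agreement expected by chance from each rater's label frequencies. The tutor-move labels and ratings are hypothetical. In practice, scikit-learn's `cohen_kappa_score` computes the same statistic.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters assigning one nominal label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every item; kappa is undefined here.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tutor-move labels from two annotators on six dialogue turns.
rater_a = ["praise", "hint", "hint", "question", "praise", "hint"]
rater_b = ["praise", "hint", "question", "question", "hint", "hint"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # → 0.478
```

Here the raters agree on 4 of 6 turns (p_o = 24/36). Chance agreement is p_e = 13/36, so kappa = 11/23. The number quantifies only how consistently humans apply the labels. It says nothing about whether those labels are valid or predictive of learning, which is the gap the paper targets.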
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Reliability and Agreement in Measurement · Explainable Artificial Intelligence (XAI)
