Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis
Xi Long, Christy Boscardin, Lauren A. Maggio, Joseph A. Costello, Ralph Gonzales, Rasmyah Hammoudeh, Ki Lai, Yoon Soo Park, Brian C. Gin

TL;DR
This study evaluates AI-assisted data extraction in literature reviews, highlighting that AI variability is mainly due to interpretive differences rather than hallucinations, and proposes methods to improve reliability.
Contribution
It introduces a platform using large language models for data extraction and compares AI to human responses, revealing insights into AI accuracy and interpretability in knowledge synthesis.
Findings
AI is highly consistent with humans on explicit questions
Interpretive differences account for most AI-human discordance
AI inaccuracies are rare compared to human errors
Abstract
Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
