Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis

Xi Long; Christy Boscardin; Lauren A. Maggio; Joseph A. Costello; Ralph Gonzales; Rasmyah Hammoudeh; Ki Lai; Yoon Soo Park; Brian C. Gin

arXiv:2508.09458 · cs.HC · August 15, 2025

TL;DR

This study evaluates AI-assisted data extraction for literature reviews, finds that disagreement between AI and human extractors stems mainly from interpretive differences rather than hallucinations, and proposes methods to improve reliability.

Contribution

The paper introduces a platform that uses large language models (LLMs) to automate data extraction, and compares AI responses with human responses across 187 publications and 17 extraction questions from a published scoping review.

Findings

01. AI is highly consistent with humans on explicit questions

02. Interpretive differences account for most AI-human discordance

03. AI inaccuracies are rare compared to human errors

Abstract

Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was…
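The abstract does not say which interrater reliability statistic was used for the categorical questions. As a minimal sketch, assuming Cohen's kappa (a common choice for two raters, and only an assumption here), agreement between one human extractor and the AI on a single categorical question could be computed as follows. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical answers to one categorical extraction question
# (illustrative only; not data from the study).
human = ["yes", "yes", "no", "no", "yes", "no"]
ai = ["yes", "no", "no", "no", "yes", "yes"]
kappa = cohens_kappa(human, ai)
```

The same function could be applied to human-human or AI-AI pairs by swapping the two label lists. Kappa corrects raw percent agreement for agreement expected by chance, which matters when one answer category dominates.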
