The Imitation Game for Educational AI
Shashank Sonkar, Naiming Liu, Xinghe Chen, Richard G. Baraniuk

TL;DR
This paper introduces a novel two-phase Turing-like test to evaluate whether educational AI systems genuinely understand student reasoning by analyzing their ability to generate human-like distractors conditioned on individual misconceptions.
Contribution
It proposes a new evaluation framework that conditions AI responses on individual student mistakes, providing a more accurate measure of AI's understanding of student cognition compared to traditional methods.
Findings
Validated the importance of conditioning on individual responses
Established statistical requirements for high-confidence validation
Demonstrated AI's potential to model student thinking effectively
Abstract
As artificial intelligence systems become increasingly prevalent in education, a fundamental challenge emerges: how can we verify whether an AI truly understands how students think and reason? Traditional evaluation methods such as measuring learning gains require lengthy studies confounded by numerous variables. We present a novel evaluation framework based on a two-phase Turing-like test. In Phase 1, students provide open-ended responses to questions, revealing natural misconceptions. In Phase 2, both AI and human experts, conditioned on each student's specific mistakes, generate distractors for new related questions. By analyzing whether students select AI-generated distractors at rates similar to human expert-generated ones, we can validate whether the AI models student cognition. We prove this evaluation must be conditioned on individual responses: unconditioned approaches merely target common…
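The Phase 2 comparison of selection rates can be illustrated with a standard two-proportion z-test. This is only a minimal sketch: the excerpt above does not specify which statistical test the paper uses, so the choice of test, the function name `two_proportion_z_test`, and the counts in the usage lines are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(sel_ai, n_ai, sel_human, n_human):
    """Compare how often students pick AI-written vs. expert-written distractors.

    Hypothetical helper, not taken from the paper.
    sel_* : number of times a distractor from that source was selected
    n_*   : number of times a distractor from that source was shown
    Returns (rate difference, z statistic, two-sided p-value).
    """
    p_ai = sel_ai / n_ai
    p_human = sel_human / n_human

    # Pooled selection rate under the null hypothesis of equal rates.
    pooled = (sel_ai + sel_human) / (n_ai + n_human)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ai + 1 / n_human))
    if se == 0:
        # Degenerate case: every distractor was selected, or none was.
        return p_ai - p_human, 0.0, 1.0

    z = (p_ai - p_human) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_ai - p_human, z, p_value


# Illustrative counts only; these are not results from the paper.
diff, z, p = two_proportion_z_test(sel_ai=30, n_ai=100, sel_human=30, n_human=100)
print(f"rate difference={diff:.3f}, z={z:.3f}, p={p:.3f}")
```

A large p-value here means only that no difference was detected. It does not by itself show that the two rates are equal, so a claim of "similar rates" would more properly rest on an equivalence test with a pre-specified margin.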
