The Imitation Game for Educational AI

Shashank Sonkar; Naiming Liu; Xinghe Chen; Richard G. Baraniuk

arXiv:2502.15127·cs.AI·February 24, 2025

TL;DR

This paper introduces a two-phase Turing-like test for evaluating whether an educational AI system understands student reasoning. The test checks whether distractors the AI generates, conditioned on each student's individual misconceptions, are selected by students at rates similar to distractors written by human experts.

Contribution

It proposes an evaluation framework that conditions AI responses on individual student mistakes, measuring the AI's understanding of student cognition more directly than traditional methods such as learning-gain studies.

Findings

1. Validated the importance of conditioning on individual responses
2. Established statistical requirements for high-confidence validation
3. Demonstrated AI's potential to model student thinking effectively

Abstract

As artificial intelligence systems become increasingly prevalent in education, a fundamental challenge emerges: how can we verify if an AI truly understands how students think and reason? Traditional evaluation methods like measuring learning gains require lengthy studies confounded by numerous variables. We present a novel evaluation framework based on a two-phase Turing-like test. In Phase 1, students provide open-ended responses to questions, revealing natural misconceptions. In Phase 2, both AI and human experts, conditioned on each student's specific mistakes, generate distractors for new related questions. By analyzing whether students select AI-generated distractors at rates similar to human expert-generated ones, we can validate if the AI models student cognition. We prove this evaluation must be conditioned on individual responses - unconditioned approaches merely target common…
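The Phase 2 comparison described in the abstract (do students pick AI-generated distractors about as often as expert-written ones?) can be illustrated with a standard two-proportion z-test. This is a minimal sketch, not the paper's statistical procedure: the function name `selection_rate_gap`, its parameters, and the counts in the usage example are all hypothetical, and the choice of a pooled z-test is an assumption made here for illustration.

```python
import math


def selection_rate_gap(ai_selected, ai_shown, human_selected, human_shown):
    """Compare how often students pick AI vs. human-expert distractors.

    Hypothetical helper: uses a pooled two-proportion z-test, which is an
    assumption of this sketch and not necessarily the paper's method.
    Returns (ai_rate, human_rate, z, two_sided_p_value).
    """
    if ai_shown <= 0 or human_shown <= 0:
        raise ValueError("each distractor type must be shown at least once")

    ai_rate = ai_selected / ai_shown
    human_rate = human_selected / human_shown

    # Pooled selection rate under the null hypothesis of equal rates.
    pooled = (ai_selected + human_selected) / (ai_shown + human_shown)
    variance = pooled * (1 - pooled) * (1 / ai_shown + 1 / human_shown)
    if variance == 0:
        raise ValueError("selection rates are all 0 or all 1; test undefined")

    z = (ai_rate - human_rate) / math.sqrt(variance)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ai_rate, human_rate, z, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    ai_rate, human_rate, z, p = selection_rate_gap(42, 120, 45, 120)
    print(f"AI rate={ai_rate:.3f}, human rate={human_rate:.3f}, "
          f"z={z:.3f}, p={p:.3f}")
```

A large p-value here only means no difference was detected; it does not by itself show that the two rates are equivalent. Supporting a claim of similarity would need an equivalence-style test or a confidence interval on the rate difference, together with enough student responses to give the test adequate power.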
