Your Students Don't Use LLMs Like You Wish They Did
Sebastian Kobler, Matthew Clemson, Angela Sun, Jonathan K. Kummerfeld

TL;DR
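This paper introduces six automated metrics for evaluating how well educational dialogue systems align with pedagogical goals. Applied to real student-AI conversations, the metrics show that students mainly use AI tutors for answer extraction rather than the sustained learning dialogue educators design for.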
Contribution
The authors propose six computational metrics for assessing pedagogical alignment in student-AI dialogue and validate them on 12,650 messages across 500 conversations from four courses.
Findings
Students mainly use AI tutors for answer extraction rather than sustained learning.
Deployment context is the strongest predictor of usage patterns, outweighing student preference and system design.
Whole-dialogue evaluation overlooks turn-by-turn interaction nuances.
Abstract
Educational NLP systems are typically evaluated using engagement metrics and satisfaction surveys, which are at best a proxy for meeting pedagogical goals. We introduce six computational metrics for automated evaluation of pedagogical alignment in student-AI dialogue. We validate our metrics through analysis of 12,650 messages across 500 conversations from four courses. Using our metrics, we identify a fundamental misalignment: educators design conversational tutors for sustained learning dialogue, but students mainly use them for answer-extraction. Deployment context is the strongest predictor of usage patterns, outweighing student preference or system design: when AI tools are optional, usage concentrates around deadlines; when integrated into course structure, students ask for solutions to verbatim assignment questions. Whole-dialogue evaluation misses these turn-by-turn patterns.…
