Can we trust LLMs as a tutor for our students? Evaluating the Quality of LLM-generated Feedback in Statistics Exams
Markus Herklotz, Niklas Ippisch, Anna-Carolina Haensch

TL;DR
This study evaluates the accuracy and pedagogical quality of GPT-4-generated feedback in a university statistics exam setting, revealing both promising potential and notable limitations for LLMs as scalable educational tools.
Contribution
It provides an empirical analysis of LLM-generated feedback quality in a real classroom context, highlighting error rates and feedback characteristics.
Findings
Approximately 7% of feedback instances contained errors.
Feedback mainly explained correctness or incorrectness, with less focus on deeper insights.
LLMs show potential but require careful quality control in educational settings.
Abstract
One of the central challenges for instructors is offering meaningful individual feedback, especially in large courses. Faced with limited time and resources, educators are often forced to rely on generalized feedback, even when more personalized support would be pedagogically valuable. To overcome this limitation, one potential technical solution is to utilize large language models (LLMs). For an exploratory study using a new platform connected with LLMs, we conducted an LLM-corrected mock exam during the "Introduction to Statistics" lecture at the University of Munich (Germany). The online platform allows instructors to upload exercises along with the correct solutions. Students complete these exercises and receive overall feedback on their results, as well as individualized feedback generated by GPT-4 based on the correct answers provided by the lecturers. The resulting dataset…
Taxonomy
Topics: Statistics Education and Methodologies · Educational Assessment and Pedagogy · Intelligent Tutoring Systems and Adaptive Learning
