Beyond Correctness: Evaluating and Improving LLM Feedback in Statistical Education
Niklas Ippisch, Markus Herklotz, Anna-Carolina Haensch, Carsten Schwemmer

TL;DR
This study evaluates the effectiveness of large language models in providing educational feedback in statistics courses, comparing different prompting and fine-tuning techniques to enhance feedback quality and pedagogical value.
Contribution
It demonstrates that carefully designed prompts can improve LLM feedback usefulness, and compares methods like zero-shot prompting and fine-tuning in an educational context.
Findings
Zero-shot prompting balances quality and cost effectively.
Fine-tuning offers no clear advantage over prompting.
LLMs reliably assess correctness but struggle with pedagogical feedback.
Abstract
Large language models (LLMs) have been proposed as scalable tools to address the gap between the importance of individualized written feedback and the practical challenges of providing it at scale. However, concerns persist regarding the accuracy, depth, and pedagogical value of their feedback responses. The present study investigates the extent to which LLMs can generate feedback that aligns with educational theory and compares techniques to improve their performance. Using mock in-class exam data from two consecutive years of an introductory statistics course at LMU Munich, we evaluated GPT-generated feedback against an established but expanded pedagogical framework. Four enhancement methods were compared in a highly standardized setting, making meaningful comparisons possible: Using a state-of-the-art model, zero-shot prompting, few-shot prompting, and supervised fine-tuning using…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistics Education and Methodologies · Computational and Text Analysis Methods · Online Learning and Analytics
