Beyond Correctness: Evaluating and Improving LLM Feedback in Statistical Education

Niklas Ippisch; Markus Herklotz; Anna-Carolina Haensch; Carsten Schwemmer

arXiv:2511.07628·stat.OT·November 12, 2025

Beyond Correctness: Evaluating and Improving LLM Feedback in Statistical Education

Niklas Ippisch, Markus Herklotz, Anna-Carolina Haensch, Carsten Schwemmer

PDF

Open Access

TL;DR

This study evaluates the effectiveness of large language models in providing educational feedback in statistics courses, comparing different prompting and fine-tuning techniques to enhance feedback quality and pedagogical value.

Contribution

It demonstrates that carefully designed prompts can improve LLM feedback usefulness, and compares methods like zero-shot prompting and fine-tuning in an educational context.

Findings

01

Zero-shot prompting balances quality and cost effectively.

02

Fine-tuning offers no clear advantage over prompting.

03

LLMs reliably assess correctness but struggle with pedagogical feedback.

Abstract

Large language models (LLMs) have been proposed as scalable tools to address the gap between the importance of individualized written feedback and the practical challenges of providing it at scale. However, concerns persist regarding the accuracy, depth, and pedagogical value of their feedback responses. The present study investigates the extent to which LLMs can generate feedback that aligns with educational theory and compares techniques to improve their performance. Using mock in-class exam data from two consecutive years of an introductory statistics course at LMU Munich, we evaluated GPT-generated feedback against an established but expanded pedagogical framework. Four enhancement methods were compared in a highly standardized setting, making meaningful comparisons possible: Using a state-of-the-art model, zero-shot prompting, few-shot prompting, and supervised fine-tuning using…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsStatistics Education and Methodologies · Computational and Text Analysis Methods · Online Learning and Analytics