FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback
Seongyeub Chu, Jongwoo Kim, Munyong Yi

TL;DR
FeedEval is a framework that evaluates and filters LLM-generated essay feedback based on pedagogical dimensions, improving the quality of feedback used for scoring and revision tasks.
Contribution
The paper introduces an LLM-based method for assessing feedback quality along three pedagogical dimensions (specificity, helpfulness, and validity), and uses it to improve automated essay scoring and essay revision.
Findings
FeedEval aligns closely with human judgments.
Filtered feedback improves essay scoring performance.
High-quality feedback leads to better essay revisions.
Abstract
Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use.…
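The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the evaluators in the paper are trained LLMs, whereas here each one is a hypothetical callable that maps an (essay, feedback) pair to a numeric score. The names `select_feedback` and `ScoredFeedback`, the `min_score` threshold, and the use of a plain mean to combine the three dimension scores are all assumptions made for illustration; the abstract does not say how scores are aggregated.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional

# The three dimensions named in the abstract.
DIMENSIONS = ("specificity", "helpfulness", "validity")

# Hypothetical evaluator interface: (essay, feedback) -> score.
# In the paper these are dimension-specialized LLM evaluators.
Evaluator = Callable[[str, str], float]


@dataclass
class ScoredFeedback:
    feedback: str
    scores: Dict[str, float]

    @property
    def overall(self) -> float:
        # Assumption: combine dimensions with an unweighted mean.
        return mean(self.scores[d] for d in DIMENSIONS)


def select_feedback(
    essay: str,
    candidates: List[str],
    evaluators: Dict[str, Evaluator],
    min_score: float = 0.0,
) -> Optional[ScoredFeedback]:
    """Score every candidate on each dimension and return the best one.

    Candidates whose score on any dimension falls below `min_score`
    are filtered out (an assumed filtering rule). Returns None if no
    candidate survives the filter.
    """
    if not candidates:
        raise ValueError("at least one feedback candidate is required")

    scored = [
        ScoredFeedback(c, {d: evaluators[d](essay, c) for d in DIMENSIONS})
        for c in candidates
    ]
    kept = [s for s in scored if min(s.scores.values()) >= min_score]
    if not kept:
        return None
    return max(kept, key=lambda s: s.overall)


if __name__ == "__main__":
    # Stub evaluators with hard-coded scores, standing in for LLM calls.
    toy_scores = {
        "Good essay.": (0.1, 0.2, 0.9),
        "Paragraph 2 needs evidence for its main claim.": (0.9, 0.8, 0.9),
    }
    stubs = {
        d: (lambda essay, fb, i=i: toy_scores[fb][i])
        for i, d in enumerate(DIMENSIONS)
    }
    best = select_feedback("(essay text)", list(toy_scores), stubs, min_score=0.5)
    print(best.feedback if best else "no candidate passed the filter")
```

Keeping the per-dimension scores alongside the aggregate makes it possible to swap in a different selection rule (for example, a weighted sum or a per-dimension threshold) without changing the evaluator interface.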
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Sentiment Analysis and Opinion Mining
