Exploring the Effectiveness of Using LLMs for Automated Assessment of Student Self Explanations in Programming Education
Arun-Balajiee Lekshmi-Narayanan, Mohammad Hassany, Peter Brusilovsky

TL;DR
This paper compares the effectiveness of LLMs and semantic similarity methods for automatically scoring student self-explanations in programming education, addressing a key challenge in automated assessment.
Contribution
It provides a rigorous comparison between LLM-based scoring and semantic similarity approaches for binary classification of student explanations.
Findings
LLMs outperform semantic similarity in scoring accuracy
The study highlights the importance of dataset quality for automated scoring methods
Results suggest LLMs are more effective for assessing student explanations
Abstract
Worked examples are step-by-step solutions to problems in a specific domain, offered to students to acquire domain-specific problem-solving skills. The effectiveness of worked examples could be enhanced by combining them with self-explanations, which ask students to explain rather than passively study each problem-solving step. The main challenge of this approach is assessing the correctness of the student's explanations. In the prevailing approach, student explanations are judged by their semantic similarity to an instructor's or domain expert's explanation. Given recent advances in LLM-based automated scoring, it remains unclear whether semantic similarity methods are still the most effective technique to automatically score textual student responses like essays or code explanations. Comparing these methods also requires quality datasets that offer distinctive features such as…
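To make the semantic-similarity baseline described in the abstract concrete, here is a minimal sketch of binary scoring against an expert explanation. It is not the authors' method. It assumes a simple bag-of-words cosine similarity in place of whatever similarity model the paper uses. The function names and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(student, expert):
    """Bag-of-words cosine similarity between two explanations (0.0 to 1.0)."""
    sv, ev = Counter(tokenize(student)), Counter(tokenize(expert))
    dot = sum(sv[t] * ev[t] for t in sv.keys() & ev.keys())
    norm = (math.sqrt(sum(c * c for c in sv.values()))
            * math.sqrt(sum(c * c for c in ev.values())))
    # An empty explanation has no tokens, so treat it as dissimilar.
    return dot / norm if norm else 0.0


def is_correct(student, expert, threshold=0.5):
    """Binary label: True if the student text is close enough to the expert's.

    The threshold is an assumed value, not one taken from the paper.
    """
    return cosine_similarity(student, expert) >= threshold
```

A call such as `is_correct(student_text, expert_text)` returns a binary label for one explanation. An LLM-based scorer would replace the similarity computation with a model judgment while producing the same kind of label, which is what allows the two approaches to be compared on the same data.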
