Toward Semi-Automatic Misconception Discovery Using Code Embeddings
Yang Shi, Krupal Shah, Wengran Wang, Samiha Marwan, Poorvaja Penmetsa, and Thomas W. Price

TL;DR
This paper introduces a semi-automated approach for discovering student misconceptions in programming courses by leveraging code embeddings and clustering, enabling more efficient analysis of student errors.
Contribution
The paper presents a novel method that takes code embeddings learned by a code classification model and clusters incorrect student submissions with them, so that domain experts can inspect each cluster to identify a specific misconception.
Findings
Clusters of incorrect submissions correspond to specific misconceptions about the problem
Method uncovers misconceptions that would not have been easily discovered with existing approaches
Potential for informing targeted teaching strategies
Abstract
Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and…
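The clustering step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here, and the submission IDs and 2-D embedding vectors are hypothetical toy values rather than outputs of a trained code classification model.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def group_by_cluster(ids, labels):
    """Collect submission IDs per cluster so an expert can inspect each group."""
    clusters = {}
    for sid, lab in zip(ids, labels):
        clusters.setdefault(lab, []).append(sid)
    return clusters

# Hypothetical embeddings of incorrect submissions (toy values, assumption).
embeddings = {
    "s1": [0.9, 0.1], "s2": [0.1, 0.9], "s3": [1.0, 0.2],
    "s4": [0.0, 1.0], "s5": [0.8, 0.0], "s6": [0.2, 0.8],
}

ids = list(embeddings)
labels = kmeans([embeddings[s] for s in ids], k=2)
clusters = group_by_cluster(ids, labels)

for cluster_id, members in sorted(clusters.items()):
    print(cluster_id, members)
```

In a setting like the one the abstract describes, each resulting group would be handed to a domain expert, who reads the submissions in it and decides whether they share a misconception. The automated part only proposes candidate groups, which is why the approach is called semi-automated.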
