Toward Semi-Automatic Misconception Discovery Using Code Embeddings

Yang Shi; Krupal Shah; Wengran Wang; Samiha Marwan; Poorvaja Penmetsa; and Thomas W. Price

arXiv:2103.04448·cs.LG·March 9, 2021

TL;DR

This paper introduces a semi-automated approach for discovering student misconceptions in programming courses by leveraging code embeddings and clustering, enabling more efficient analysis of student errors.

Contribution

The paper presents a method that takes the code embeddings learned by a code classification model, clusters incorrect student submissions in that embedding space, and surfaces the clusters for experts to identify specific misconceptions.

Findings

01. Clusters correspond to distinct misconceptions

02. Method uncovers misconceptions not easily found by existing approaches

03. Potential for informing targeted teaching strategies

Abstract

Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and…
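
The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each incorrect submission already has an embedding vector from a trained classifier; the 2-D vectors, the submission ids, and the helper names (`init_centers`, `kmeans`, `group_by_cluster`) are all hypothetical. The abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-point initialization is swapped in here purely for illustration.

```python
import math
from collections import defaultdict

def init_centers(points, k):
    """Farthest-point initialization: deterministic, no random seed needed."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain k-means (an assumed stand-in); returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def group_by_cluster(embeddings, k):
    """Group submission ids by cluster so an expert can inspect each group."""
    ids = list(embeddings)
    labels = kmeans([embeddings[i] for i in ids], k)
    clusters = defaultdict(list)
    for sid, lab in zip(ids, labels):
        clusters[lab].append(sid)
    return dict(clusters)

# Hypothetical embeddings of six incorrect submissions (toy 2-D vectors).
toy_embeddings = {
    "s1": (0.1, 0.2), "s2": (0.0, 0.1), "s3": (0.2, 0.0),
    "s4": (5.0, 5.1), "s5": (5.2, 4.9), "s6": (4.9, 5.0),
}
clusters = group_by_cluster(toy_embeddings, k=2)
print(clusters)
```

In this toy setup the two well-separated groups end up in separate clusters. The automated part stops there: a domain expert would still need to read the submissions in each cluster to decide whether they share a misconception, which is what makes the approach semi-automatic.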
