TL;DR
This paper explores the use of topic modeling techniques to automatically cluster introductory computer science exercises, transforming code solutions into text and validating the semantic coherence of the resulting clusters.
Contribution
It introduces a novel method combining code structure analysis with topic modeling to identify meaningful question clusters in computer science education.
Findings
Six semantically coherent clusters identified
Achieved an NPMI (normalized pointwise mutual information) score of 0.75, indicating strong semantic coherence
Results correlate with human expert ratings
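For readers unfamiliar with the coherence metric, the following is a minimal sketch of how an NPMI score can be computed for one topic from document-level co-occurrence counts. It uses the standard definition, NPMI(w1, w2) = log(P(w1, w2) / (P(w1) P(w2))) / (-log P(w1, w2)), averaged over word pairs. The function name `npmi_coherence`, the toy corpus, and the handling of edge cases are assumptions made for illustration; the paper's exact evaluation setup is not described in this summary.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean pairwise NPMI of a topic's top words.

    Probabilities are estimated as the fraction of documents that contain
    a word (or both words of a pair). This is an illustrative choice, not
    necessarily the estimator used in the paper.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1.0:
            scores.append(0.0)   # assumption: both words in every document, treated as independent
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: each document is the token list of one exercise.
toy_docs = [
    ["for", "range", "loop"],
    ["for", "range", "sum"],
    ["if", "else", "compare"],
    ["if", "else", "print"],
]
coherent = npmi_coherence(["for", "range"], toy_docs)
incoherent = npmi_coherence(["for", "if"], toy_docs)
```

NPMI ranges from -1 (words never co-occur) to 1 (words always co-occur), so a topic whose top words tend to appear in the same documents scores closer to 1.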
Abstract
Manually determining the concepts present in a group of questions is challenging and time-consuming. However, it is an essential step when modeling a virtual learning environment, since mastery-level assessment and recommendation engines require a mapping between concepts and questions. We investigated unsupervised semantic models (known as topic modeling techniques) to assist computer science teachers in this task, and we propose a method to transform teacher-provided Computer Science 1 code solutions into representative text documents that include code structure information. By applying non-negative matrix factorization and latent Dirichlet allocation, we extract the underlying relationship between questions and validate the results using an external dataset. We consider the interpretability of the learned concepts using 14 university professors'…
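As a rough illustration of what transforming a code solution into a text document with structure information might look like, the sketch below walks the syntax tree of a solution and emits both identifier tokens and structural tokens. It assumes the solutions are written in Python and uses the standard `ast` module; the function name `solution_to_document`, the `struct_` token prefix, and the selection of node types are hypothetical choices, not the authors' actual pipeline.

```python
import ast

# Node types treated as "code structure" in this sketch (an assumption).
STRUCTURAL_NODES = (
    ast.For, ast.While, ast.If, ast.FunctionDef, ast.Return, ast.ListComp,
)


def solution_to_document(source):
    """Convert Python source code into a list of tokens for topic modeling.

    Emits one token per identifier use and one prefixed token per
    structural construct, so a bag-of-words model sees both vocabulary
    and control-flow shape.
    """
    tokens = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, STRUCTURAL_NODES):
            tokens.append("struct_" + type(node).__name__.lower())
        elif isinstance(node, ast.Name):
            tokens.append(node.id)
    return tokens


# Hypothetical teacher-provided solution: sum the even numbers below 10.
example_solution = (
    "total = 0\n"
    "for i in range(10):\n"
    "    if i % 2 == 0:\n"
    "        total += i\n"
    "print(total)\n"
)
example_document = solution_to_document(example_solution)
```

The resulting token lists could then be vectorized (for example with term counts or TF-IDF) and passed to a non-negative matrix factorization or latent Dirichlet allocation implementation to obtain question clusters.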
