An Approach to Detect Abnormal Submissions for CodeWorkout Dataset
Alex Hicks, Yang Shi, Arun-Balajiee Lekshmi-Narayanan, Wei Yan, Samiha Marwan

TL;DR
This paper proposes a preliminary approach to identify abnormal student log data in programming learning environments, aiming to improve personalized recommendation systems by detecting anomalies that traditional plagiarism detection methods miss.
Contribution
It introduces a novel method to detect various types of abnormal submissions in student log data, addressing limitations of existing plagiarism detection tools.
Findings
Initial analysis of abnormal log data patterns
Potential to enhance recommendation accuracy by filtering anomalies
Framework for future development of anomaly detection methods
Abstract
Students' interactions while solving problems in learning environments (i.e., log data) are often used to support student learning. For example, researchers use log data to develop systems that can provide students with personalized problem recommendations based on their knowledge level. However, anomalies in students' log data, such as cheating to solve programming problems, could introduce a hidden bias in the log data. As a result, these systems may provide inaccurate problem recommendations and therefore defeat their purpose. Classical cheating detection methods, such as MOSS, can be used to detect code plagiarism. However, these methods cannot detect other abnormal events, such as a student gaming a system with multiple attempts of similar solutions to a particular programming problem. This paper presents a preliminary study to analyze log data with anomalies. The goal of our…
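To make the "multiple attempts of similar solutions" pattern concrete, here is a minimal sketch of one way such a run could be flagged. It is not the method from the paper: the function name `flag_repeated_attempts`, the 0.9 similarity threshold, and the run length of 3 are all assumptions chosen for illustration, and text similarity is computed with Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def flag_repeated_attempts(attempts, similarity_threshold=0.9, min_run=3):
    """Return True if `attempts` contains at least `min_run` consecutive
    near-identical submissions.

    `attempts` is a list of source-code strings from one student on one
    problem, in submission order. The threshold and run length are
    illustrative assumptions, not values taken from the paper.
    """
    run = 1  # length of the current streak of near-identical submissions
    for prev, curr in zip(attempts, attempts[1:]):
        ratio = SequenceMatcher(None, prev, curr).ratio()
        # Extend the streak if the pair is similar enough, otherwise reset it.
        run = run + 1 if ratio >= similarity_threshold else 1
        if run >= min_run:
            return True
    return False


# Hypothetical submission histories for two students on the same problem.
tweaking = ["return a + b;", "return a + b ;", "return a+ b;"]
exploring = ["int x = 1;", "for (int i = 0; i < n; i++) sum += i;", "return a + b;"]
```

With these inputs, `flag_repeated_attempts(tweaking)` flags the history and `flag_repeated_attempts(exploring)` does not. A plagiarism tool that compares code across students would not surface the first case, since all of the near-duplicates come from the same student.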
