Learning Code-Edit Embedding to Model Student Debugging Behavior
Hasnain Heickal, Andrew Lan

TL;DR
This paper introduces a model that learns code-edit embeddings from student submissions to better understand debugging behavior, improve personalized feedback, and suggest next steps in programming education.
Contribution
It presents a novel encoder-decoder model fine-tuned with test case information to capture code editing patterns and support personalized feedback in student programming.
Findings
The model achieves high accuracy in code reconstruction.
Enables effective personalized code suggestions.
Reveals common debugging behaviors through clustering.
Abstract
Providing effective feedback for programming assignments in computer science education can be challenging: students solve problems by iteratively submitting code, executing it, and using limited feedback from the compiler or the auto-grader to debug. Analyzing student debugging behavior in this process may reveal important insights into their knowledge and inform better personalized support tools. In this work, we propose an encoder-decoder-based model that learns meaningful code-edit embeddings between consecutive student code submissions, to capture their debugging behavior. Our model leverages information on whether a student code submission passes each test case to fine-tune large language models (LLMs) to learn code editing representations. It enables personalized next-step code suggestions that maintain the student's coding style while improving test case correctness. Our model…
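To make the idea of an "edit representation between consecutive submissions" concrete, here is a toy sketch. It is not the authors' model: the paper describes a learned encoder-decoder built on fine-tuned LLMs, whereas this illustration swaps in a simple token-count difference and a per-test-case pass/fail delta. All function names (`tokenize`, `edit_vector`, `pass_delta`, `cosine`) are hypothetical and chosen only for this example.

```python
import re
from collections import Counter


def tokenize(code):
    # Split source text into identifiers, integer literals, and single
    # non-whitespace characters (a crude stand-in for a real lexer).
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def edit_vector(before, after):
    # Toy edit representation (assumption, not the paper's embedding):
    # token counts in `after` minus token counts in `before`.
    diff = Counter(tokenize(after))
    diff.subtract(Counter(tokenize(before)))
    return {tok: n for tok, n in diff.items() if n != 0}


def pass_delta(before_pass, after_pass):
    # Per-test-case change between two submissions:
    # +1 newly passing, -1 newly failing, 0 unchanged.
    return [int(a) - int(b) for b, a in zip(before_pass, after_pass)]


def cosine(u, v):
    # Cosine similarity between two sparse edit vectors stored as dicts.
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Two consecutive submissions that differ by an off-by-one fix.
first = "for i in range(n):"
second = "for i in range(n + 1):"
edit = edit_vector(first, second)
delta = pass_delta([True, False, False], [True, True, False])
```

Under these assumptions, edits with similar vectors (for example, other off-by-one fixes) would score a high `cosine` similarity, which is the kind of signal a clustering step could group into common debugging behaviors. A learned embedding would replace `edit_vector` and could condition on the test-case information that `pass_delta` only summarizes here.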
Taxonomy
Topics: Teaching and Learning Programming · Software Engineering Research · Software Testing and Debugging Techniques
