ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
Mihran Miroyan, Rose Niousha, Joseph E. Gonzalez, Gireeja Ranade, Narges Norouzi

TL;DR
ParaStudent explores how large language models can generate student-like programming code that reflects real students' iterative, imperfect, and stylistic coding behaviors, advancing educational AI applications.
Contribution
The paper introduces ParaStudent, a systematic approach to generate and evaluate realistic student code using LLMs, incorporating learning dynamics and multi-dimensional assessment.
Findings
Fine-tuning improves alignment with student trajectories.
Fine-tuned models capture error patterns, incremental improvements, and stylistic variations more faithfully.
Context-aware generation and temporal modeling enhance realism.
Abstract
Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code the way real students do: imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and…
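To make the idea of multi-dimensional evaluation concrete, here is a minimal sketch of scoring a submission along two of the three dimensions named in the abstract. It is an illustration under stated assumptions, not the paper's metrics: the functions `pass_rate` and `style_features`, the toy `double` task, and the particular style features are all hypothetical. The semantic dimension is omitted because it would typically require a code-embedding model.

```python
from typing import Sequence


def pass_rate(source: str, func_name: str,
              cases: Sequence[tuple[tuple, object]]) -> float:
    """Functional dimension (hypothetical metric): fraction of test cases passed."""
    namespace: dict = {}
    try:
        # Only run trusted or sandboxed code this way.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load passes nothing
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failures
    return passed / len(cases)


def style_features(source: str) -> dict[str, float]:
    """Stylistic dimension (hypothetical features): simple surface statistics."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return {
        "num_lines": float(len(lines)),
        "avg_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        "num_comments": float(sum(ln.lstrip().startswith("#") for ln in lines)),
    }


# Toy task: double(x) should return 2 * x.
cases = [((3,), 6), ((0,), 0), ((-2,), -4)]

# An imperfect, student-like attempt that mishandles negative inputs.
student_attempt = (
    "def double(x):\n"
    "    # forgot the negative case\n"
    "    return x + x if x > 0 else x\n"
)

# A polished solution of the kind an LLM tends to produce by default.
polished = "def double(x):\n    return 2 * x\n"

student_score = pass_rate(student_attempt, "double", cases)
polished_score = pass_rate(polished, "double", cases)
student_style = style_features(student_attempt)
```

Comparing such scores between generated and real submissions at matching points in a student's timeline is one way to check whether generated code is realistically imperfect rather than uniformly correct.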
Taxonomy
Topics: Teaching and Learning Programming · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics
