Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

Charles Koutcheme; Juho Leinonen; Arto Hellas

arXiv:2604.10720·cs.AI·May 14, 2026

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

Charles Koutcheme, Juho Leinonen, Arto Hellas

PDF

TL;DR

This paper introduces a framework for training open-weight models to simulate student programming behavior by serializing their problem-solving process into conversational data, improving alignment with real student debugging actions.

Contribution

It presents a novel training pipeline combining supervised fine-tuning and preference optimization on authentic student data, enabling models to better mimic student debugging behavior.

Findings

01

Models trained with environment feedback outperform prior approaches.

02

Fine-tuned models show improved functional alignment and code similarity.

03

The framework effectively simulates student debugging in programming assignments.

Abstract

Artificial students -- models that simulate how learners act and respond within educational systems -- are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, most existing approaches rely on prompting large, proprietary language models, limiting adaptability to specific courses and raising concerns around privacy, cost, and dependence. In this work, we propose a framework for training open-weight artificial programming learners directly from authentic student process data. Our approach serializes temporal log traces into a conversational format, representing each student's problem-solving process as a dialogue between the learner and their automated assessment system. Student code submissions and environment feedback, such as test outcomes, grades, and error traces, form alternating conversational turns, enabling models to learn from the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.