Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study
Wenhan Lyu, Yimeng Wang, Tingting (Rachel) Chung, Yifan Sun, Yixuan Zhang

TL;DR
This semester-long study evaluates how LLM-powered AI assistants such as CodeTutor affect beginner programming students, finding improved final scores but also challenges in critical-thinking development and user engagement.
Contribution
The paper provides empirical evidence on the effectiveness, student perceptions, and engagement dynamics of LLM-based tools in introductory computer science education.
Findings
Students using CodeTutor achieved higher final scores.
Students without prior LLM experience gained more from the tool.
User prompt quality significantly affected response effectiveness.
Abstract
The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our study results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater…
