Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools

Lorenzo Lee Solano; Charles Koutcheme; Juho Leinonen; Alexandra Vassar; Jake Renzella

arXiv:2507.05305·cs.CY·July 9, 2025

TL;DR

This paper demonstrates that smaller, fine-tuned open-source language models can effectively serve as pedagogical tools, offering a cost-effective and accessible alternative to large proprietary models for educational purposes.

Contribution

The study introduces a new dataset of compiler error explanations and shows that supervised fine-tuning significantly improves the educational quality of smaller open-source LLMs, making them comparable to larger models.

Findings

01

Fine-tuned smaller models achieve performance comparable to larger models.

02

Supervised fine-tuning on high-quality data enhances pedagogical effectiveness.

03

A replicable methodology is provided for developing educational LLMs.

Abstract

Frontier large language models (LLMs) like ChatGPT and Gemini can decipher cryptic compiler errors for novice programmers, but their computational scale, cost, and tendency to over-assist make them problematic for widespread pedagogical adoption. This work demonstrates that smaller, specialised language models, enhanced via Supervised Fine-Tuning (SFT), present a more viable alternative for educational tools. We use a new dataset of 40,000 C compiler error explanations, derived from real errors made by students in introductory programming (CS1/2) courses, to fine-tune three open-source models: Qwen3-4B, Llama-3.1-8B, and Qwen3-32B. We performed a dual evaluation, combining expert human reviews with a large-scale automated analysis of 8,000 responses using a validated LLM-as-judge ensemble. Our results show that SFT significantly boosts the pedagogical quality of…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Teaching and Learning Programming · Machine Learning in Materials Science