Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education
Smitha Kumar, Michael A. Lones, Manuel Maarek, Hind Zantout

TL;DR
This study evaluates how well four LLMs (one closed, three open) identify common errors of practice (pitfalls) in machine learning code, and whether the feedback they give can guide learners.
Contribution
It provides a comparative analysis of closed and open LLMs in detecting ML coding pitfalls, highlighting current limitations and opportunities for deploying smaller models in education.
Findings
Basic pitfalls are easily identified by all models.
Many complex pitfalls, especially in early ML pipeline stages, are not well detected.
LLMs can provide useful feedback and guidance when they identify pitfalls.
Abstract
The rapid advancement of Large Language Models (LLMs) has opened new avenues in education. This study examines the use of LLMs in supporting learning in machine learning education; in particular, it focuses on the ability of LLMs to identify common errors of practice (pitfalls) in machine learning code, and their ability to provide feedback that can guide learning. Using a portfolio of code samples, we consider four different LLMs: one closed model and three open models. Whilst the most basic pitfalls are readily identified by all models, many common pitfalls are not. They particularly struggle to identify pitfalls in the early stages of the ML pipeline, especially those which can lead to information leaks, a major source of failure within applied ML projects. They also exhibit limited success at identifying pitfalls around model selection, which is a concept that students often…
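Illustration
The abstract singles out information leaks in the early stages of the ML pipeline as a pitfall the models often miss. The sketch below shows one generic form such a leak can take: fitting a preprocessing step on the full dataset before the train/test split. It is not taken from the paper's code portfolio. The toy data and the helper names (`split`, `fit_scaler`, `transform`) are hypothetical, and only the Python standard library is used.

```python
import random
import statistics


def split(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the values and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Estimate standardisation statistics (mean, std) from the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardise values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Made-up toy feature column (an assumption for illustration).
data = [float(x) for x in range(1, 21)]
train, test = split(data)

# Pitfall: the scaler is fitted on ALL rows, so the held-out test rows
# influence how the training data is preprocessed (an information leak).
leaky_mean, leaky_std = fit_scaler(data)
leaky_train = transform(train, leaky_mean, leaky_std)

# Safer pattern: fit on the training rows only, then reuse those
# statistics unchanged when transforming the test rows.
clean_mean, clean_std = fit_scaler(train)
clean_train = transform(train, clean_mean, clean_std)
clean_test = transform(test, clean_mean, clean_std)

print("mean fitted on all rows:  ", leaky_mean)
print("mean fitted on train only:", clean_mean)
```

The two fitted means differ because the leaky version has seen the test rows. In a real project the same ordering mistake can affect any fitted preprocessing step, such as imputation or feature selection, and it is the kind of early-pipeline issue the study reports as hard for LLMs to spot.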
