Evaluating LLMs in the Context of a Functional Programming Course: A Comprehensive Study
Yihan Zhang (McGill University, Canada), Brigitte Pientka (McGill University, Canada), Xujie Si (University of Toronto, Canada)

TL;DR
This study evaluates the effectiveness of large language models in a functional programming course taught in OCaml, a low-resource programming language. It introduces new benchmarks and uses manual grading to assess the models' capabilities and limitations.
Contribution
It presents three comprehensive benchmarks for evaluating LLMs in OCaml educational tasks and provides insights into their strengths and weaknesses in a low-resource language setting.
Findings
LLMs are effective at fixing syntax errors and answering conceptual questions
They solve fewer homework problems in a low-resource language than in high-resource languages such as Python and Java
Top LLMs perform well across all tasks in the course
Abstract
Large Language Models (LLMs) are changing the way learners acquire knowledge outside the classroom. Previous studies have shown that LLMs seem effective at generating answers to short and simple questions in introductory CS courses that use high-resource programming languages such as Java or Python. In this paper, we evaluate the effectiveness of LLMs in an educational setting for a low-resource programming language, OCaml. In particular, we built three benchmarks to comprehensively evaluate 9 state-of-the-art LLMs: 1) CodeGen (a benchmark containing natural-language homework programming problems); 2) Repair (a benchmark containing programs with syntax, type, and logical errors drawn from actual student submissions); 3) Explain (a benchmark containing natural-language questions about theoretical programming concepts). We grade each LLMs…
Taxonomy
Topics: Teaching and Learning Programming · Software Engineering Research · Intelligent Tutoring Systems and Adaptive Learning
