Evaluating LLMs in the Context of a Functional Programming Course: A Comprehensive Study

Yihan Zhang (McGill University; Canada); Brigitte Pientka (McGill University; Canada); Xujie Si (University of Toronto; Canada)

arXiv:2603.05646 · cs.PL · March 9, 2026

Open Access

TL;DR

This study evaluates the effectiveness of large language models in a functional programming course taught in OCaml, a low-resource language, introducing new benchmarks and manual grading to assess their capabilities and limitations.

Contribution

It presents three comprehensive benchmarks for evaluating LLMs in OCaml educational tasks and provides insights into their strengths and weaknesses in a low-resource language setting.

Findings

01 LLMs are effective at correcting syntax errors and answering conceptual questions.

02 They solve fewer homework problems in the low-resource OCaml setting than in Python and Java.

03 Top LLMs perform well across all tasks in the course.

Abstract

Large language models (LLMs) are changing the way learners acquire knowledge outside the classroom setting. Previous studies have shown that LLMs seem effective at answering short and simple questions in introductory CS courses that use high-resource programming languages such as Java or Python. In this paper, we evaluate the effectiveness of LLMs in an educational setting for a low-resource programming language, OCaml. In particular, we built three benchmarks to comprehensively evaluate 9 state-of-the-art LLMs: 1) λCodeGen (a benchmark containing natural-language homework programming problems); 2) λRepair (a benchmark containing programs with syntax, type, and logical errors drawn from actual student submissions); 3) λExplain (a benchmark containing natural-language questions regarding theoretical programming concepts). We grade each LLMs…

Taxonomy

Topics: Teaching and Learning Programming · Software Engineering Research · Intelligent Tutoring Systems and Adaptive Learning