A Survey Study on the State of the Art of Programming Exercise Generation using Large Language Models
Eduard Frankford, Ingo Höhn, Clemens Sauerwein, Ruth Breu

TL;DR
This survey evaluates the capabilities of large language models in generating programming exercises, highlighting their strengths, weaknesses, and proposing an evaluation matrix to guide future research and educational applications.
Contribution
It provides a comprehensive survey of LLMs for programming exercise generation, introduces an evaluation matrix, and discusses challenges and potential solutions.
Findings
Multiple LLMs can generate useful programming exercises
Challenges include LLMs solving exercises too easily
Proposed an evaluation matrix for assessing LLMs in this context
Abstract
This paper analyzes Large Language Models (LLMs) with regard to their programming exercise generation capabilities. Through a survey study, we defined the state of the art, extracted their strengths and weaknesses and finally proposed an evaluation matrix, helping researchers and educators to decide which LLM is the best fitting for the programming exercise generation use case. We also found that multiple LLMs are capable of producing useful programming exercises. Nevertheless, there exist challenges like the ease with which LLMs might solve exercises generated by LLMs. This paper contributes to the ongoing discourse on the integration of LLMs in education.
Taxonomy
Topics: Diverse Approaches in Healthcare and Education Studies · Educational Systems and Policies · Education and Learning Interventions
