TL;DR
This paper investigates the capability of large language models to generate diverse, high-quality educational questions across Bloom's taxonomy levels, highlighting the importance of prompting techniques and evaluation methods.
Contribution
It systematically evaluates five state-of-the-art LLMs for educational question generation at various cognitive levels using advanced prompts and expert assessments.
Findings
LLMs can generate relevant, high-quality questions at different Bloom's levels
Performance varies significantly among different LLMs
Automated evaluation methods do not match the accuracy of human judgment
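One common way to quantify how far automated ratings diverge from expert judgment is an agreement statistic such as Cohen's kappa. The sketch below is illustrative only: the choice of kappa, the function name `cohens_kappa`, and the example labels are assumptions, not the paper's reported metric or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical quality labels, made up for illustration (not from the paper).
expert = ["high", "high", "low", "low", "high", "low"]
llm_judge = ["high", "high", "high", "low", "high", "high"]
kappa = cohens_kappa(expert, llm_judge)
```

A kappa near 1 would indicate that the automated judge closely tracks the expert, while values near 0 indicate agreement no better than chance.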
Abstract
Developing questions that are pedagogically sound, relevant, and promote learning is a challenging and time-consuming task for educators. Modern-day large language models (LLMs) generate high-quality content across multiple domains, potentially helping educators to develop high-quality questions. Automated educational question generation (AEQG) is important in scaling online education catering to a diverse student population. Past attempts at AEQG have shown limited abilities to generate questions at higher cognitive levels. In this study, we examine the ability of five state-of-the-art LLMs of different sizes to generate diverse and high-quality questions of different cognitive levels, as defined by Bloom's taxonomy. We use advanced prompting techniques with varying complexity for AEQG. We conducted expert and LLM-based evaluations to assess the linguistic and pedagogical relevance and…
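As a rough illustration of level-targeted prompting, the sketch below builds one prompt per cognitive level. It assumes the six levels of the revised Bloom's taxonomy. The names `BLOOM_LEVELS` and `build_prompt` and the prompt wording are hypothetical; the paper's actual prompts are not shown in this summary.

```python
# Six levels of the revised Bloom's taxonomy, each with a short description.
# Using the revised taxonomy here is an assumption.
BLOOM_LEVELS = {
    "Remember": "recall facts and basic concepts",
    "Understand": "explain ideas or concepts",
    "Apply": "use information in new situations",
    "Analyze": "draw connections among ideas",
    "Evaluate": "justify a stand or decision",
    "Create": "produce new or original work",
}


def build_prompt(topic, level, n_questions=1):
    """Return a hypothetical instruction prompt targeting one Bloom's level."""
    if level not in BLOOM_LEVELS:
        raise ValueError(f"unknown Bloom's level: {level!r}")
    return (
        f"You are an experienced educator. Write {n_questions} question(s) "
        f"about '{topic}' at the '{level}' level of Bloom's taxonomy, "
        f"where learners {BLOOM_LEVELS[level]}. "
        "Return only the question text."
    )


# One prompt per level for a single example topic (the topic is illustrative).
prompts = {level: build_prompt("photosynthesis", level) for level in BLOOM_LEVELS}
```

Each resulting string could then be sent to whichever LLM is under evaluation. More elaborate variants, for example ones that add worked examples or step-by-step reasoning instructions, would extend the same template.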
