EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
Amir Hadifar, Semere Kiros Bitew, Johannes Deleu, Chris Develder, Thomas Demeester

TL;DR
EduQG is a high-quality, expert-annotated dataset of 3,397 educational multiple choice questions, designed to facilitate research in question and distractor generation, format conversion, and cognitive complexity analysis.
Contribution
The paper introduces EduQG, a novel dataset with expert-generated questions, source annotations, and cognitive levels, addressing limitations of existing datasets for educational question generation.
Findings
Distinct differences from existing question datasets
Effective baseline models demonstrated on EduQG
Potential for advancing educational question generation research
Abstract
We introduce a high-quality dataset that contains 3,397 samples comprising (i) multiple choice questions, (ii) answers (including distractors), and (iii) their source documents, from the educational domain. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for…
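To make the structure described in the abstract concrete, here is a minimal Python sketch of what a single sample might look like. The class and field names (EduQGSample, question_normal, answer_sentence_ids, and so on) are hypothetical and chosen for illustration only; they are not the dataset's actual schema, and the example question is invented rather than taken from EduQG.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    """One answer option; distractors are the options with is_correct=False."""
    text: str
    is_correct: bool


@dataclass
class EduQGSample:
    """Hypothetical container for one sample (field names are assumptions)."""
    question_normal: str            # question phrased in normal (interrogative) form
    question_cloze: str             # same question phrased as a fill-in-the-blank
    options: List[Option]           # correct answer plus distractors
    source_sentences: List[str]     # source document split into sentences
    answer_sentence_ids: List[int]  # sentence-level links for the correct answer
    bloom_level: Optional[str] = None  # only a subset of questions carries a level

    def correct_answer(self) -> str:
        return next(o.text for o in self.options if o.is_correct)

    def distractors(self) -> List[str]:
        return [o.text for o in self.options if not o.is_correct]

    def supporting_sentences(self) -> List[str]:
        return [self.source_sentences[i] for i in self.answer_sentence_ids]


# Invented example, not an actual dataset entry.
sample = EduQGSample(
    question_normal="Which organelle produces most of a cell's ATP?",
    question_cloze="Most of a cell's ATP is produced by the _____.",
    options=[
        Option("mitochondrion", True),
        Option("ribosome", False),
        Option("Golgi apparatus", False),
    ],
    source_sentences=[
        "Cells contain many organelles.",
        "Mitochondria produce most of the cell's ATP.",
    ],
    answer_sentence_ids=[1],
    bloom_level=None,
)
```

Under these assumptions, question generation would map supporting_sentences() plus correct_answer() to either question form, distractor generation would target distractors(), and format conversion would map question_normal to question_cloze or the reverse.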
Taxonomy
Topics: Educational Assessment and Pedagogy · Topic Modeling · Educational Technology and Assessment
