Automatic Generation of Inference Making Questions for Reading Comprehension Assessments
Wanjing Anya Ma, Michael Flor, Zuowei Wang

TL;DR
This paper presents a method that uses GPT-4o to automatically generate inference questions for reading comprehension, evaluates the generated questions for overall quality and inference-type accuracy, and aims to improve diagnostic assessments.
Contribution
It introduces a taxonomy of inference types and demonstrates GPT-4o's capability to generate high-quality RC questions with a focus on inference skills.
Findings
93.8% of questions rated as good quality
42.6% of questions matched the targeted inference type
Inter-rater agreement above 0.90
Abstract
Inference making is an essential but complex skill in reading comprehension (RC). Some inferences require resolving references across sentences, and some rely on using prior knowledge to fill in details that are not explicitly written in the text. Diagnostic RC questions can help educators provide more effective and targeted reading instruction and interventions for school-age students. We introduce a taxonomy of inference types for RC and use it to analyze the distribution of items within a diagnostic RC item bank. Next, we present experiments using GPT-4o to generate bridging-inference RC items for given reading passages via few-shot prompting, comparing conditions with and without chain-of-thought prompts. Generated items were evaluated on three aspects: overall item quality, appropriate inference type, and LLM reasoning, achieving high inter-rater agreement above 0.90. Our…
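The few-shot setup mentioned in the abstract, with and without chain-of-thought, can be illustrated with a minimal sketch. This is not the authors' prompt: the `Example` class, the `build_prompt` function, the field labels, and the instruction wording are all hypothetical, and no model call is made. It only shows how the two conditions might differ in the text sent to the model.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical few-shot demonstration (not taken from the paper)."""
    passage: str
    question: str
    answer: str
    reasoning: str = ""  # only shown in the chain-of-thought condition


def build_prompt(examples, target_passage, inference_type="bridging", use_cot=False):
    """Assemble a few-shot prompt as plain text.

    With use_cot=True, each demonstration includes its reasoning and the
    prompt ends by asking the model to reason first. With use_cot=False,
    reasoning is omitted and the model is asked for the question directly.
    """
    parts = [
        f"Write one {inference_type}-inference reading comprehension "
        "question for the passage."
    ]
    for ex in examples:
        parts.append(f"Passage: {ex.passage}")
        if use_cot and ex.reasoning:
            parts.append(f"Reasoning: {ex.reasoning}")
        parts.append(f"Question: {ex.question}")
        parts.append(f"Answer: {ex.answer}")
    parts.append(f"Passage: {target_passage}")
    # The final cue tells the model what to produce next.
    parts.append("Reasoning:" if use_cot else "Question:")
    return "\n".join(parts)
```

Calling `build_prompt` twice on the same demonstrations and target passage, once with `use_cot=False` and once with `use_cot=True`, yields the two prompt variants that such a comparison would send to the model.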
Taxonomy
Topics: Reading and Literacy Development · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning
