Prompting Strategies for Language Model-Based Item Generation in K-12 Education: Bridging the Gap Between Small and Large Language Models

Mohammad Amini; Babak Ahmadi; Xiaomeng Xiong; Yilin Zhang; Christopher Qiao

arXiv:2508.20217 · cs.CL · August 29, 2025

Open Access

TL;DR

This paper investigates prompting strategies and fine-tuning to improve small and medium language models for automatic creation of K-12 assessment items, demonstrating effective methods to enhance output quality and alignment with educational goals.

Contribution

It introduces structured prompting techniques and fine-tuning approaches that significantly enhance the quality of item generation by medium-sized language models in educational contexts.

Findings

01. Structured prompting improves model output quality.

02. Fine-tuning enhances model alignment with assessment goals.

03. Mid-sized models can effectively generate educational items with proper strategies.

Abstract

This study explores automatic item generation (AIG) using language models to create multiple-choice questions (MCQs) for morphological assessment, aiming to reduce the cost and inconsistency of manual test development. The study took a two-part approach. First, we compared a fine-tuned medium model (Gemma, 2B) with a larger untuned one (GPT-3.5, 175B). Second, we evaluated seven structured prompting strategies, including zero-shot, few-shot, chain-of-thought, role-based, sequential, and combinations of these. Generated items were assessed using automated metrics and expert scoring across five dimensions. We also used GPT-4.1, trained on expert-rated samples, to simulate human scoring at scale. Results show that structured prompting, especially strategies combining chain-of-thought and sequential design, significantly improved Gemma's outputs. Gemma generally produced more construct-aligned and…
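To make the prompting strategies named in the abstract concrete, the following is a minimal sketch of how zero-shot, few-shot, role-based, and chain-of-thought prompts might be composed from shared parts. It is not the authors' implementation: the `PromptSpec` class, the `build_prompt` function, and all template wording are hypothetical, and sequential (multi-turn) prompting is not covered.

```python
from dataclasses import dataclass, field

# Hypothetical template wording; the paper's actual prompts are not reproduced here.
ROLE = "You are an experienced K-12 assessment writer."
TASK = "Write one multiple-choice question that assesses the morphology of the word '{word}'."
COT = "Think step by step: identify the word's morphemes, then write the stem, the key, and three distractors."


@dataclass
class PromptSpec:
    """Describes which strategies to combine for one target word (assumed structure)."""
    word: str
    use_role: bool = False   # role-based prompting
    use_cot: bool = False    # chain-of-thought prompting
    examples: list[str] = field(default_factory=list)  # few-shot items; empty means zero-shot


def build_prompt(spec: PromptSpec) -> str:
    """Assemble the prompt in a fixed order: role, examples, task, reasoning cue."""
    parts = []
    if spec.use_role:
        parts.append(ROLE)
    for i, example in enumerate(spec.examples, start=1):
        parts.append(f"Example {i}:\n{example}")
    parts.append(TASK.format(word=spec.word))
    if spec.use_cot:
        parts.append(COT)
    return "\n\n".join(parts)
```

With no flags set, `build_prompt(PromptSpec(word="unhappiness"))` yields only the task sentence, which corresponds to the zero-shot case; setting `use_role`, `use_cot`, and `examples` together gives one possible combined strategy.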

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques