New Exam Security Questions in the AI Era: Comparing AI-Generated Item Similarity Between Naive and Detail-Guided Prompting Approaches

Ting Wang; Caroline Prendergast; Susan Lottridge

arXiv:2512.23729 · cs.CY · January 1, 2026




TL;DR

This study compares AI-generated exam questions created with naive prompts against those created with proprietary guidance. It finds that LLMs prompted only with public resources can, in some domains, produce items that closely resemble guided ones, which raises exam-security concerns.

Contribution

The study shows that LLMs given only publicly available prompts can generate exam items that, in some domains, closely resemble items produced with proprietary guidance, highlighting security risks in AI-assisted exam development.

Findings

01. High internal consistency within each prompting strategy

02. Lower cross-strategy similarity overall

03. Some domain pairs exceeded the similarity threshold, indicating convergence
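The findings contrast similarity within each prompting strategy with similarity across strategies, and flag domain pairs whose cross-strategy similarity passes a threshold. The sketch below illustrates that kind of comparison under stated assumptions: each item is represented by a toy embedding vector, similarity is mean cosine similarity, and the 0.8 cutoff and the domain names `activity_A` and `activity_B` are hypothetical. The paper's actual similarity measure, embedding method, and threshold are not given on this page.

```python
import math
from itertools import combinations, product
from statistics import mean

# Hypothetical cutoff; the paper's actual threshold is not stated here.
SIMILARITY_THRESHOLD = 0.8


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def within_similarity(items):
    """Mean pairwise cosine similarity among items from a single strategy."""
    return mean(cosine(u, v) for u, v in combinations(items, 2))


def cross_similarity(items_a, items_b):
    """Mean cosine similarity over every pair drawn from two different strategies."""
    return mean(cosine(u, v) for u, v in product(items_a, items_b))


def flag_convergent_domains(domains, threshold=SIMILARITY_THRESHOLD):
    """Return domain names whose cross-strategy mean similarity exceeds the threshold."""
    return [name for name, (a, b) in domains.items() if cross_similarity(a, b) > threshold]


# Invented 3-dimensional vectors standing in for item-stem embeddings.
naive = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]
guided = [[0.2, 1.0, 0.1], [0.1, 0.9, 0.2], [0.3, 1.0, 0.0]]
# A second hypothetical domain whose guided items point the same way as the naive ones.
guided_converging = [[0.95, 0.15, 0.05], [1.0, 0.05, 0.0]]

domains = {
    "activity_A": (naive, guided),
    "activity_B": (naive, guided_converging),
}

print(f"within naive:   {within_similarity(naive):.3f}")
print(f"within guided:  {within_similarity(guided):.3f}")
print(f"cross (A):      {cross_similarity(naive, guided):.3f}")
print("flagged:", flag_convergent_domains(domains))
```

On this toy data each strategy is internally consistent, the cross-strategy mean for `activity_A` stays low, and only `activity_B`, whose guided items were deliberately built to resemble the naive ones, is flagged as convergent.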

Abstract

Large language models (LLMs) have emerged as powerful tools for generating domain-specific multiple-choice questions (MCQs), offering efficiency gains for certification boards but raising new concerns about examination security. This study investigated whether LLM-generated items created with proprietary guidance differ meaningfully from those generated using only publicly available resources. Four representative clinical activities from the American Board of Family Medicine (ABFM) blueprint were mapped to corresponding Entrustable Professional Activities (EPAs), and three LLMs (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Flash) produced items under a naive strategy using only public EPA descriptors, while GPT-4o additionally produced items under a guided strategy that incorporated proprietary blueprints, item-writing guidelines, and exemplar items, yielding 160 total items. Question stems and…
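The abstract's design can be checked with a little arithmetic: four clinical activities crossed with four model-strategy conditions (three models under the naive strategy, plus GPT-4o under the guided strategy) gives 16 cells. The sketch below enumerates that grid and divides the 160 items across it. The even split is an assumption for illustration only; the abstract does not say how items were allocated per cell.

```python
from itertools import product

# Conditions named in the abstract: three models prompted naively,
# plus GPT-4o prompted with proprietary guidance.
CONDITIONS = [
    ("GPT-4o", "naive"),
    ("Claude 4 Sonnet", "naive"),
    ("Gemini 2.5 Flash", "naive"),
    ("GPT-4o", "guided"),
]
N_ACTIVITIES = 4   # ABFM clinical activities mapped to EPAs
TOTAL_ITEMS = 160  # total reported in the abstract

# Every (activity, condition) combination is one design cell.
cells = list(product(range(N_ACTIVITIES), CONDITIONS))

# Assumption: items are spread evenly across cells (not stated in the abstract).
items_per_cell, remainder = divmod(TOTAL_ITEMS, len(cells))

n_naive_conditions = sum(1 for _, strategy in CONDITIONS if strategy == "naive")
naive_items = items_per_cell * N_ACTIVITIES * n_naive_conditions
guided_items = TOTAL_ITEMS - naive_items

print(len(cells), items_per_cell, remainder)  # 16 10 0
print(naive_items, guided_items)              # 120 40
```

Under that even-split assumption, naive items would outnumber guided items three to one, since only one model was run under the guided strategy.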


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Psychometric Methodologies and Testing · Innovations in Medical Education