New Exam Security Questions in the AI Era: Comparing AI-Generated Item Similarity Between Naive and Detail-Guided Prompting Approaches
Ting Wang, Caroline Prendergast, Susan Lottridge

TL;DR
This study compares AI-generated exam questions created with naive prompts (public information only) against questions generated with proprietary guidance. It finds that naively prompted LLMs can produce items similar to guided ones, raising exam security concerns.
Contribution
It demonstrates that LLMs prompted with only publicly available information can generate exam items closely resembling items generated with proprietary guidance, highlighting security risks in AI-assisted exam development.
Findings
High internal consistency within each prompting strategy
Lower cross-strategy similarity overall
Some domain pairs exceeded the similarity threshold, indicating convergence between naive and guided items
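The within-strategy versus cross-strategy comparison behind these findings can be illustrated with a minimal sketch. The similarity measure the authors used is not visible in this listing (the abstract is truncated), so the bag-of-words cosine similarity below is a stand-in assumption, and the function names (`bow`, `cosine`, `mean_within`, `mean_cross`) and example stems are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations, product


def bow(text):
    """Bag-of-words term counts (an assumed stand-in for the paper's text representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_within(items):
    """Mean pairwise similarity among items from one prompting strategy."""
    vecs = [bow(t) for t in items]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        raise ValueError("need at least two items")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def mean_cross(items_a, items_b):
    """Mean similarity over all pairs drawn from two different strategies."""
    vecs_a = [bow(t) for t in items_a]
    vecs_b = [bow(t) for t in items_b]
    pairs = list(product(vecs_a, vecs_b))
    if not pairs:
        raise ValueError("need at least one item per strategy")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical item stems, invented purely for illustration.
naive_items = [
    "a 45 year old patient presents with chest pain",
    "a 60 year old patient presents with shortness of breath",
]
guided_items = [
    "a 45 year old patient presents with chest pain on exertion",
    "which medication is first line for hypertension",
]

print("within naive:", round(mean_within(naive_items), 3))
print("within guided:", round(mean_within(guided_items), 3))
print("cross strategy:", round(mean_cross(naive_items, guided_items), 3))
```

Under this setup, a cross-strategy score that exceeds some chosen cutoff for a given domain would be read as convergence between naive and guided items. The cutoff itself is not stated in this listing, so none is hard-coded here.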
Abstract
Large language models (LLMs) have emerged as powerful tools for generating domain-specific multiple-choice questions (MCQs), offering efficiency gains for certification boards but raising new concerns about examination security. This study investigated whether LLM-generated items created with proprietary guidance differ meaningfully from those generated using only publicly available resources. Four representative clinical activities from the American Board of Family Medicine (ABFM) blueprint were mapped to corresponding Entrustable Professional Activities (EPAs), and three LLMs (GPT-4o, Claude 4 Sonnet, Gemini 2.5 Flash) produced items under a naive strategy using only public EPA descriptors, while GPT-4o additionally produced items under a guided strategy that incorporated proprietary blueprints, item-writing guidelines, and exemplar items, yielding 160 total items. Question stems and…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Psychometric Methodologies and Testing · Innovations in Medical Education
