Detection of flawed multiple-choice questions in preclinical medical education using item difficulty and discrimination indices: a six-year analysis

Varanya Srisomsak; Chantacha Sitticharoon; Issarawan Keadkraichaiwat; Sunan Meethes; Inpreeya Inpaen

PMC · DOI: 10.1186/s12909-025-08204-5 · December 1, 2025


Open Access

TL;DR

This study shows that using statistical thresholds alone misses some flawed multiple-choice questions in medical exams, highlighting the need for expert review alongside quantitative analysis.

Contribution

The study provides empirical evidence that static psychometric thresholds, used alone, miss a share of flawed exam items (14.3% in this dataset).

Findings

1. 14.3% of flawed items were missed when relying solely on p-value and rpb-value thresholds.

2. Flawed items tended to be more difficult and less discriminative than uncorrected items.

3. Expert review is necessary alongside quantitative metrics to ensure exam quality.

Abstract

Multiple-choice question (MCQ) exams may include flawed items that affect validity. Psychometric indicators such as item difficulty (p-value) and the point-biserial coefficient (rpb-value) are widely used to identify problematic questions. Evidence on using p-value (< 0.25) and/or rpb-value (< 0) thresholds to detect flawed items remains limited. This study aimed to provide a proof of concept using a large, real-world dataset, evaluating how often flawed items were missed when relying solely on static thresholds. Exam analyses from 32 preclinical courses (academic years 2017–2022) were reviewed. Items meeting the predefined thresholds were flagged. In addition, any item was manually reviewed when the most frequently chosen answer was not the keyed correct answer or when multiple options had similar p-values. Flagged items were sent to course directors for verification, and only confirmed items were recorded as corrections.…
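
The screening rules described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure. The function names (`point_biserial`, `screen_item`), the data layout, and the example responses are all hypothetical. The rpb-value is computed against a total score that includes the item itself, and an undefined rpb-value is returned as 0.0; both are assumptions. Only the two thresholds (p < 0.25, rpb < 0) and the "most frequently chosen answer is not the key" rule come from the abstract; the "similar p-values across options" check is not implemented here.

```python
from collections import Counter
from math import sqrt

P_MIN = 0.25   # p-value threshold stated in the abstract
RPB_MIN = 0.0  # rpb-value threshold stated in the abstract


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item score and the total score."""
    n = len(item_scores)
    n_correct = sum(item_scores)
    if n_correct in (0, n):
        return 0.0  # undefined when everyone or no one is correct (assumed convention)
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd == 0:
        return 0.0  # undefined when all totals are equal (assumed convention)
    pairs = list(zip(item_scores, total_scores))
    mean_correct = sum(t for s, t in pairs if s) / n_correct
    mean_wrong = sum(t for s, t in pairs if not s) / (n - n_correct)
    p = n_correct / n
    return (mean_correct - mean_wrong) / sd * sqrt(p * (1 - p))


def screen_item(responses, key, total_scores):
    """Apply the static thresholds and the modal-answer check to one item."""
    item_scores = [1 if r == key else 0 for r in responses]
    p = sum(item_scores) / len(item_scores)  # item difficulty: proportion correct
    rpb = point_biserial(item_scores, total_scores)
    modal_option = Counter(responses).most_common(1)[0][0]
    return {
        "p": p,
        "rpb": rpb,
        "flagged_by_threshold": p < P_MIN or rpb < RPB_MIN,
        "needs_manual_review": modal_option != key,
    }


# Made-up responses from ten examinees; "A" is the keyed answer.
responses = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "C"]
totals = [9, 8, 8, 7, 5, 6, 4, 5, 6, 3]
result = screen_item(responses, "A", totals)
print(result)
```

In this made-up example the item passes both static thresholds (its p-value is 0.4 and its rpb-value is positive), yet the most frequently chosen option is a distractor, so only the manual-review rule would surface it. That is the kind of case the abstract suggests threshold-only screening can miss.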




Taxonomy

Topics: Psychometric Methodologies and Testing · Medical Education and Admissions · Reliability and Agreement in Measurement