Exploring Language Patterns in a Medical Licensure Exam Item Bank
Swati Padhee, Kimberly Swygert, Ian Micir

TL;DR
This paper presents a novel NLP-based machine learning approach to detect biased or stereotypical language in a large medical licensure exam item bank, aiming to enhance test validity and fairness.
Contribution
To the authors' knowledge, this is the first application of ML and NLP techniques to identifying biased language in a large bank of medical exam items, facilitating large-scale review and updating of test content.
Findings
The approach can flag potentially biased language in exam items for review.
A prediction algorithm trained on clusters of similar item stems can surface stereotypical patient characteristics in clinical science vignettes.
The method supports maintaining content validity and fairness in licensure assessments.
Abstract
This study examines the use of natural language processing (NLP) models to evaluate whether language patterns used by item writers in a medical licensure exam might contain evidence of biased or stereotypical language. This type of bias in item language choices can be particularly impactful for items in a medical licensure assessment, as it could pose a threat to content validity and defensibility of test score validity evidence. To the best of our knowledge, this is the first attempt to use machine learning (ML) and NLP to explore language bias in a large item bank. Using a prediction algorithm trained on clusters of similar item stems, we demonstrate that our approach can be used to review large item banks for potentially biased language or stereotypical patient characteristics in clinical science vignettes. The findings may guide the development of methods to address stereotypical…
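The abstract mentions "a prediction algorithm trained on clusters of similar item stems" but does not say which features, clustering method, or predictor were used. The sketch below is only one possible reading of that idea, not the authors' method: it assumes bag-of-words counts, cosine similarity, a small seeded k-means-style clustering, and nearest-centroid prediction. The item stems, the seed choice, and every function name are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one item stem (assumed feature choice)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean term vector of a group of stems."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def cluster(vectors, seeds, iterations=10):
    """Seeded k-means-style clustering: assign each stem to its most
    similar centroid, then recompute the centroids, and repeat."""
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids

def predict(text, centroids):
    """Predict the cluster of a new stem as its nearest centroid."""
    v = vectorize(text)
    return max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))

# Hypothetical toy item stems, not taken from any real item bank.
stems = [
    "a 45-year-old man presents with chest pain and shortness of breath",
    "a 60-year-old man presents with chest pain radiating to the left arm",
    "a 30-year-old woman presents with fever and productive cough",
    "a 25-year-old woman presents with fever cough and fatigue",
]
vectors = [vectorize(s) for s in stems]
labels, centroids = cluster(vectors, seeds=[0, 2])
new_label = predict("a 52-year-old man presents with chest pain", centroids)
print(labels, new_label)
```

In a review workflow along these lines, a reviewer could inspect each cluster for patient characteristics (for example age or sex descriptors) that co-occur with particular presentations more often than the clinical content warrants. Deciding whether such a pattern reflects bias remains a human judgment that this sketch does not attempt.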
