Finding Words Associated with DIF: Predicting Differential Item Functioning using LLMs and Explainable AI

Hotaka Maeda; Yikai Lu

arXiv:2502.07017·cs.CL·November 4, 2025

Open Access

TL;DR

This study fine-tunes encoder-based Transformer language models to predict differential item functioning (DIF) from item text, then applies explainable AI (XAI) methods to identify the words associated with DIF, supporting fairer test design.

Contribution

It combines fine-tuned LLMs with XAI methods to identify specific words associated with DIF, giving item writers and reviewers a way to interpret DIF at the word level.

Findings

01. Many words associated with DIF reflect minor sub-domains included in the test blueprint by design, rather than construct-irrelevant content.

02. The approach can screen for words associated with DIF during item writing.

03. It can help review and interpret traditional DIF analysis results.

Abstract

We fine-tuned and compared several encoder-based Transformer large language models (LLMs) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to these models to identify specific words associated with DIF. The data included 42,180 items designed for English language arts and mathematics summative state assessments for students in grades 3 to 11. Prediction $R^{2}$ ranged from .04 to .32 across eight focal and reference group pairs. Our findings suggest that many words associated with DIF reflect minor sub-domains included in the test blueprint by design, rather than construct-irrelevant item content that should be removed from assessments. This may explain why qualitative reviews of DIF items often yield confusing or inconclusive results. Our approach can be used to screen words associated with DIF during the…
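
To make the two quantities in the abstract concrete, the sketch below shows how a prediction $R^{2}$ is computed and how a word-level attribution can be obtained from any text-to-score model. It rests on several assumptions. The abstract does not name the XAI methods used, so leave-one-word-out occlusion is swapped in here as a simple stand-in. The predictor `toy_predict` and its weight table `TOY_WEIGHTS` are hypothetical placeholders for a fine-tuned Transformer, and the example item text is invented for illustration.

```python
from typing import Callable, List, Tuple


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def occlusion_attributions(
    text: str, predict: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each word by how much the prediction changes when it is removed.

    This is leave-one-word-out occlusion, an assumed stand-in for the
    XAI methods in the paper, which the abstract does not specify.
    """
    words = text.split()
    baseline = predict(text)
    scores = []
    for i, word in enumerate(words):
        occluded = " ".join(words[:i] + words[i + 1:])
        scores.append((word, baseline - predict(occluded)))
    return scores


# Hypothetical per-word weights standing in for a fine-tuned DIF predictor.
TOY_WEIGHTS = {"fraction": 0.3, "poem": -0.2}


def toy_predict(text: str) -> float:
    """Toy predictor: sum of fixed word weights (not a real model)."""
    return sum(TOY_WEIGHTS.get(w.lower(), 0.0) for w in text.split())


if __name__ == "__main__":
    # Invented item text, for illustration only.
    for word, score in occlusion_attributions("Simplify the fraction", toy_predict):
        print(f"{word}: {score:+.2f}")
```

In practice `toy_predict` would be replaced by a call to the fine-tuned model, and the attribution scores would be aggregated over many items before any word is flagged.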

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding