Differential Test Functioning via Robust Scaling
Peter F. Halpin

TL;DR
This paper introduces a new approach to differential test functioning (DTF) that focuses on how the distribution of the latent trait varies across groups (impact), using robust estimation and testing methods to assess how DIF affects conclusions about impact.
Contribution
It proposes a robust impact estimator that remains consistent when fewer than half of the items exhibit DIF, and develops a Wald test for comparing impact estimates.
Findings
The robust impact estimator performs well when fewer than 50% of the items exhibit DIF.
The difference between naive and robust impact estimates quantifies DIF effects.
Simulation and empirical results demonstrate the effectiveness of the proposed methods.
Abstract
In the item response theory (IRT) literature, differential test functioning (DTF) has been conceptualized in terms of how the test response function differs over groups of respondents. This paper presents an alternative approach to DTF that focuses on how the distribution of the latent trait differs over groups, which is referred to as impact. It is proposed to evaluate DTF by comparing two estimates of impact, one that naively aggregates over all test items and a robust alternative that down-weights items that exhibit differential item functioning (DIF). Taking this approach, this paper makes the following three contributions. First, it is shown that the difference between the naive and robust estimands provides a convenient effect size for quantifying the extent to which DIF affects conclusions about impact (as opposed to test scores). Second, it is shown how to construct a robust…
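The comparison of a naive and a robust impact estimate can be illustrated with a minimal sketch. It is not the paper's estimator: the median is used here as a simple stand-in for a robust aggregate, and the item-level values in `item_shifts` are made-up numbers chosen for illustration (seven items near a group difference of 0.5, three items inflated by DIF). The function names are likewise hypothetical.

```python
from statistics import mean, median

def naive_impact(item_shifts):
    """Naive impact: plain average over all item-level estimates."""
    return mean(item_shifts)

def robust_impact(item_shifts):
    """Stand-in robust impact: the median, which ignores extreme items
    as long as fewer than half of the items are affected.
    (Assumption: the paper's own robust estimator may differ.)"""
    return median(item_shifts)

def dtf_effect_size(item_shifts):
    """Difference between naive and robust impact estimates."""
    return naive_impact(item_shifts) - robust_impact(item_shifts)

# Hypothetical item-level estimates of the group difference:
# 7 items without DIF (near 0.5) and 3 items with DIF (near 1.3).
item_shifts = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 1.30, 1.25, 1.35]

print("naive impact: ", round(naive_impact(item_shifts), 3))
print("robust impact:", round(robust_impact(item_shifts), 3))
print("difference:   ", round(dtf_effect_size(item_shifts), 3))
```

In this toy example the naive average is pulled upward by the three DIF items, while the median stays close to the value shared by the majority of items. The gap between the two plays the role of the effect size described in the abstract.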
Taxonomy
Topics: Advanced Causal Inference Techniques
