Regularized Bayesian calibration and scoring of the WD-FAB IRT model improves predictive performance over marginal maximum likelihood
Joshua C. Chang, Julia Porcino, Elizabeth K. Rasch, Larry Tang

TL;DR
This paper demonstrates that regularized Bayesian calibration of the graded response model (GRM) in item response theory improves predictive accuracy over traditional marginal maximum likelihood estimation, as shown on test responses from the WD-FAB.
Contribution
It introduces a regularized Bayesian calibration approach for the GRM and shows that it achieves better predictive performance on real-world test response data.
Findings
Regularized Bayesian calibration outperforms marginal maximum likelihood.
Use of compactly supported priors enhances test scoring.
Predictive power improves on WD-FAB response data.
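The finding about compactly supported priors can be illustrated with expected a posteriori (EAP) scoring on a grid. This is a minimal sketch under stated assumptions, not the paper's scoring procedure: the logistic GRM form, the prior shape 1 - (theta/L)^2, the half-width L = 4, the grid size, and every function name below are hypothetical choices made for illustration. Because the prior is zero outside [-L, L], the posterior mean and standard deviation stay confined to that interval even for an all-extreme response pattern.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, Sequence[float]]  # (discrimination a, strictly increasing thresholds)


def logistic(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def grm_prob(theta: float, item: Item, x: int) -> float:
    """P(X = x | theta) for one item under an assumed logistic GRM."""
    a, thresholds = item
    if not 0 <= x <= len(thresholds):
        raise ValueError("response category out of range")
    # Cumulative terms P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return cumulative[x] - cumulative[x + 1]


def compact_prior(theta: float, half_width: float) -> float:
    """Hypothetical compactly supported prior: unnormalized density
    1 - (theta / half_width)**2 inside (-half_width, half_width), zero outside."""
    u = theta / half_width
    return max(0.0, 1.0 - u * u)


def eap_score(responses: Sequence[int], items: Sequence[Item],
              half_width: float = 4.0, n_grid: int = 801) -> Tuple[float, float]:
    """Posterior mean and standard deviation of theta by grid quadrature."""
    if len(responses) != len(items):
        raise ValueError("need exactly one response per item")
    step = 2.0 * half_width / (n_grid - 1)
    grid = [-half_width + i * step for i in range(n_grid)]
    weights: List[float] = []
    for theta in grid:
        w = compact_prior(theta, half_width)  # zero at and beyond the support edges
        for x, item in zip(responses, items):
            w *= grm_prob(theta, item, x)  # multiply in each item's likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)
```

With three items and every response in the top category, `eap_score` returns a positive mean that is still strictly inside (-4, 4). Multiplying raw probabilities can underflow on long tests; accumulating log-likelihoods would be the safer choice there.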
Abstract
Item response theory (IRT) is the statistical paradigm underlying a dominant family of generative probabilistic models for test responses, used to quantify traits in individuals relative to target populations. The graded response model (GRM) is a particular IRT model that is used for ordered polytomous test responses. Both the development and the application of the GRM and other IRT models require statistical decisions. For formulating these models (calibration), one needs to decide on methodologies for item selection, inference, and regularization. For applying these models (test scoring), one needs to make similar decisions, often prioritizing computational tractability and/or interpretability. In many applications, such as in the Work Disability Functional Assessment Battery (WD-FAB), tractability implies approximating an individual's score distribution using estimates of mean and…
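The GRM described in the abstract can be made concrete with a short sketch. Assuming Samejima's logistic form without the 1.7 scaling constant, an item with discrimination a and ordered thresholds b_1 < b_2 < ... < b_{K-1} gives P(X >= k | theta) = sigma(a(theta - b_k)), and each category probability is the difference of adjacent cumulative terms. The function names are hypothetical; this is not the authors' code.

```python
import math
from typing import List, Sequence


def logistic(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k | theta), k = 0..K-1, for one GRM item.

    Thresholds must be strictly increasing so that every category
    receives nonnegative probability.
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

For example, `grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])` returns four probabilities that sum to one and are symmetric about the middle categories, because theta sits exactly at the central threshold.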
