TL;DR
This paper evaluates Vuong's (1989) statistical tests for comparing nested and non-nested item response theory (IRT) models. Using simulations and real data, it shows that the tests are reliable and that, in the nested case, they perform as well as or sometimes better than the likelihood ratio test.
Contribution
It applies all three of Vuong's model selection tests to unidimensional and multidimensional IRT models. Before this work, only the non-nested test had been applied to IRT models.
Findings
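Vuong's approach provides formal statistical tests for both nested and non-nested IRT model comparisons.
In the non-nested case, the tests reliably distinguish the graded response model from the generalized partial credit model.
In the nested case, the tests typically perform as well as, and sometimes better than, the traditional likelihood ratio test.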
Abstract
In this paper, we apply Vuong's (1989) general approach of model selection to the comparison of nested and non-nested unidimensional and multidimensional item response theory (IRT) models. Vuong's approach of model selection is useful because it allows for formal statistical tests of both nested and non-nested models. However, only the test of non-nested models has been applied in the context of IRT models to date. After summarizing the statistical theory underlying the tests, we investigate the performance of all three distinct Vuong tests in the context of IRT models using simulation studies and real data. In the non-nested case we observed that the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, we observed that the tests typically perform as well as or sometimes better than the traditional likelihood…
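To make the non-nested test concrete, here is a minimal sketch of Vuong's (1989) likelihood ratio z-statistic for strictly non-nested models. It assumes you already have per-respondent log-likelihood contributions from two fitted models (for example, a graded response model and a generalized partial credit model). The function name `vuong_nonnested_z` is hypothetical and is not taken from the paper or from any IRT package. The sketch covers only this one test; the variance (distinguishability) test and the nested test use weighted chi-square reference distributions and are not shown. For overlapping models, Vuong recommends first testing whether the variance of the pointwise differences is zero.

```python
import math
from statistics import NormalDist


def vuong_nonnested_z(loglik_f, loglik_g):
    """Vuong's z-test for strictly non-nested models (illustrative sketch).

    loglik_f, loglik_g: per-observation log-likelihoods from two fitted
    models evaluated on the same observations (assumed inputs).
    Returns (z, two-sided p-value). Under H0 (equal fit), z is
    asymptotically standard normal; large positive z favors model f.
    """
    if len(loglik_f) != len(loglik_g):
        raise ValueError("both models must be evaluated on the same observations")
    n = len(loglik_f)
    # Pointwise log-likelihood ratios l_i = log f(y_i) - log g(y_i)
    diffs = [a - b for a, b in zip(loglik_f, loglik_g)]
    lr = sum(diffs)  # LR_n, the overall log-likelihood ratio
    mean = lr / n
    # omega^2 hat with the 1/n divisor, as in Vuong (1989)
    omega2 = sum((d - mean) ** 2 for d in diffs) / n
    if omega2 == 0:
        raise ValueError("zero variance: models are not distinguishable by this test")
    z = lr / math.sqrt(n * omega2)  # equals n^(-1/2) * LR_n / omega_hat
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided
```

Usage: `z, p = vuong_nonnested_z(ll_grm, ll_gpcm)`, where `ll_grm` and `ll_gpcm` are hypothetical lists of per-respondent log-likelihoods. Swapping the arguments flips the sign of `z` but leaves the two-sided p-value unchanged.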
