# Using item response theory as a methodology to impute categorical missing values

**Authors:** Adrienne Kline, Yuan Luo

PMC · DOI: 10.1038/s41598-025-20032-7 · Scientific Reports · 2025-11-05

## TL;DR

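This paper proposes using Item Response Theory (IRT) to impute missing categorical data. It compares IRT with kNN, MICE, and DataWig on ordinal, nominal, and binary datasets, and finds that IRT performs competitively and outperforms several of these methods in many conditions.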

## Contribution

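The novelty is applying IRT, a latent-trait framework from psychometrics, to categorical data imputation. The paper evaluates it on ordinal, nominal, and binary data while varying both the proportion of missing values and how systematic the missingness is.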
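Ordinal and nominal items need a polytomous IRT model. The sketch below assumes Samejima's graded response model for ordinal items and Bock's nominal response model for unordered categories. These are two standard choices, but the exact models and parameterizations used in the paper are not reproduced here. The code computes per-category probabilities for a respondent with latent trait θ. The parameter values and θ are hypothetical. In a real imputation, θ would be estimated from the respondent's observed items.

```python
import numpy as np

def grm_probs(theta, a, thresholds):
    """Graded response model for one ordinal item with K categories.

    thresholds must be increasing (b_1 < ... < b_{K-1}).
    Returns P(X = k | theta) for k = 0, ..., K-1.
    """
    b = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities P(X >= k | theta) for k = 1, ..., K-1.
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent terms.
    cum = np.concatenate(([1.0], cum, [0.0]))
    return cum[:-1] - cum[1:]

def nrm_probs(theta, slopes, intercepts):
    """Nominal response model: softmax over a_k * theta + c_k for each category."""
    z = np.asarray(slopes, dtype=float) * theta + np.asarray(intercepts, dtype=float)
    z -= z.max()  # subtract the max for numerical stability; softmax is unchanged
    e = np.exp(z)
    return e / e.sum()

theta = 0.3  # hypothetical latent-trait estimate for one respondent
p_ord = grm_probs(theta, a=1.5, thresholds=[-1.0, 0.0, 1.0])  # 4 ordinal categories
p_nom = nrm_probs(theta, slopes=[-1.0, 0.0, 1.0], intercepts=[0.5, 0.0, -0.5])

rng = np.random.default_rng(0)
hard_ord = int(np.argmax(p_ord))                  # most probable category
drawn_nom = int(rng.choice(len(p_nom), p=p_nom))  # or sample, to keep imputation variability
```

Taking the argmax gives a single imputed category. Sampling from the same probability vector is the usual way to produce several plausible completions for multiple imputation.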

## Key findings

- IRT-based imputation outperformed several of the compared methods (kNN, MICE, and DataWig) in many conditions.
- The method showed strong performance across ordinal, nominal, and binary datasets.
- IRT provides probabilistic category assignments for missing values.
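IRT models the probability of each response as a function of a latent trait θ, which is what makes probabilistic category assignments possible. The sketch below assumes a two-parameter logistic (2PL) model for binary items whose item parameters are already known. In practice those parameters would be estimated from the data, for example by marginal maximum likelihood. The paper's own estimation procedure is not reproduced here. The function names (`p_endorse`, `estimate_theta`, `impute_row`) and the parameter values are hypothetical.

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL item response function: P(X = 1 | theta) for items with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b, grid=np.linspace(-4.0, 4.0, 161)):
    """Grid-search MAP estimate of theta from the observed (non-NaN) responses.

    A standard normal prior keeps the estimate finite when every observed
    response is 0 or every observed response is 1.
    """
    observed = ~np.isnan(responses)
    x = responses[observed]
    # Probabilities for every (grid point, observed item) pair.
    p = p_endorse(grid[:, None], a[observed], b[observed])
    loglik = (x * np.log(p) + (1.0 - x) * np.log(1.0 - p)).sum(axis=1)
    log_post = loglik - 0.5 * grid**2
    return grid[np.argmax(log_post)]

def impute_row(responses, a, b):
    """Fill missing cells with the more probable category and also return
    P(X = 1) for each missing cell."""
    theta = estimate_theta(responses, a, b)
    missing = np.isnan(responses)
    probs = p_endorse(theta, a[missing], b[missing])
    filled = responses.copy()
    filled[missing] = (probs >= 0.5).astype(float)
    return filled, probs

# Hypothetical item parameters and one respondent with item 3 missing.
a = np.array([1.2, 0.8, 1.5, 1.0])
b = np.array([-1.0, 0.0, 0.5, 1.0])
row = np.array([1.0, 1.0, np.nan, 0.0])
filled, probs = impute_row(row, a, b)
```

Returning `probs` alongside the hard assignment is the point of this design. A downstream analysis can threshold, sample, or propagate the uncertainty instead of committing to a single imputed category.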

## Abstract

Most datasets suffer from partially or completely missing values, which limits both the models that can be tested on the data and the statistical inferences that can be drawn from it. Several imputation techniques have been designed to replace missing data with stand-in values. The various approaches have implications for calculating clinical scores, model building, and model testing. The work showcased here supports using an Item Response Theory (IRT) based approach for categorical imputation. It compares this approach against several methodologies currently used in the machine learning field, including k-nearest neighbors (kNN), multiple imputation by chained equations (MICE), and DataWig, the Amazon Web Services (AWS) deep learning method. Analyses comparing these techniques were performed on three datasets representing ordinal, nominal, and binary categories. The data were modified to vary both the proportion of missing data and how systematic the missingness was. Two assessments of performance were conducted: accuracy in reproducing the missing values, and predictive performance using the imputed data. Results demonstrated that the proposed method, Item Response Theory for categorical imputation, fared quite well compared to currently used multiple imputation methods, outperforming several of them in many conditions. The approach has a theoretical basis and uniquely generates probabilistic terms for determining category membership of missing cells. Given these properties, IRT for categorical imputation offers a viable alternative to current approaches.
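The evaluation design in the abstract hides known values under different proportions and patterns of missingness, imputes them, and then scores how many hidden values are recovered. This can be sketched as a small harness. The masking schemes below are assumptions. One is uniform MCAR masking. The other is a hypothetical value-dependent scheme that stands in for "systematic" missingness, because the paper's exact mechanisms are not reproduced here. `impute_mode` is a placeholder baseline, not one of the compared methods. An IRT, kNN, MICE, or DataWig imputer with the same signature could be swapped in.

```python
import numpy as np

def mask_mcar(X, frac, rng):
    """Missing completely at random: every cell has the same chance of being hidden."""
    return rng.random(X.shape) < frac

def mask_value_dependent(X, frac, rng):
    """Hypothetical systematic scheme: cells holding a column's highest
    category are hidden at twice the base rate."""
    p = np.full(X.shape, frac, dtype=float)
    p[X == X.max(axis=0)] = min(1.0, 2 * frac)
    return rng.random(X.shape) < p

def impute_mode(X_missing):
    """Placeholder imputer: fill each column with its most frequent observed category."""
    X = X_missing.copy()
    for j in range(X.shape[1]):
        col = X[:, j]  # a view, so assignments below modify X
        obs = col[~np.isnan(col)]
        if obs.size == 0:
            continue  # nothing observed in this column; leave it as NaN
        vals, counts = np.unique(obs, return_counts=True)
        col[np.isnan(col)] = vals[np.argmax(counts)]
    return X

def imputation_accuracy(X_true, impute, mask):
    """Hide the masked cells, impute them, and return the fraction recovered exactly."""
    X_missing = X_true.astype(float)  # astype returns a copy
    X_missing[mask] = np.nan
    X_hat = impute(X_missing)
    return float(np.mean(X_hat[mask] == X_true[mask]))

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5))  # hypothetical ordinal data, categories 0-2
for frac in (0.1, 0.3):
    for name, masker in [("MCAR", mask_mcar), ("systematic", mask_value_dependent)]:
        acc = imputation_accuracy(X, impute_mode, masker(X, frac, rng))
        print(name, frac, round(acc, 3))
```

The second assessment in the abstract, predictive performance, would reuse the same masked-and-imputed matrices but score a downstream model trained on them rather than cell-level agreement.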

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12589431/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12589431/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12589431/full.md

---
Source: https://tomesphere.com/paper/PMC12589431