Regression-based imputation of explanatory discrete missing data
Gilma Hernández-Herrera, Albert Navarro, David Moriña

TL;DR
This paper evaluates various regression models for imputing missing count data, highlighting the superior performance of the COMPoisson distribution across different dispersion scenarios through extensive simulations and real data analysis.
Contribution
It provides a comprehensive comparison of classical and generalized count models for imputation, emphasizing the effectiveness of the COMPoisson distribution in diverse dispersion contexts.
Findings
COMPoisson generally outperforms other models in various dispersion scenarios.
The Poisson model is often used but may be inappropriate for over- or under-dispersed data.
Analyzing dispersion and zero-inflation is crucial for selecting the right imputation method.
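As a rough illustration of the last finding, the sketch below computes two simple diagnostics on the observed part of a count variable: the variance-to-mean ratio (about 1 under a Poisson model) and the observed share of zeros compared with the zero probability a Poisson model with the same mean would imply. The function name `count_diagnostics` and the toy data are hypothetical, and these are generic checks, not necessarily the ones used in the paper.

```python
import math
from statistics import mean, variance

def count_diagnostics(counts):
    """Return simple dispersion and zero-inflation diagnostics for observed counts."""
    observed = [c for c in counts if c is not None]  # drop missing values
    m = mean(observed)
    v = variance(observed)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        # Near 1 under Poisson, above 1 if over-dispersed, below 1 if under-dispersed
        "dispersion_index": v / m,
        "zero_fraction": sum(1 for c in observed if c == 0) / len(observed),
        # P(Y = 0) for a Poisson variable with the same mean
        "poisson_zero_prob": math.exp(-m),
    }

# Hypothetical toy data; None marks a missing value
counts = [0, 0, 0, 0, 1, None, 2, 0, 7, 0, 3, None, 0, 9]
diag = count_diagnostics(counts)
print(diag)
```

A dispersion index well above 1 together with many more zeros than the Poisson model implies would point toward negative binomial, COMPoisson, or zero-inflated (ZIP, ZINB) imputation models rather than plain Poisson.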
Abstract
Imputation of missing values is a strategy for handling non-responses in surveys or data loss in measurement processes, which may be more effective than ignoring them. When the variable represents a count, the literature dealing with this issue is scarce. Likewise, if problems of over- or under-dispersion are observed, generalisations of the Poisson distribution are recommended for carrying out imputation. In order to assess the performance of various regression models in the imputation of a discrete variable compared to classical counting models, this work presents a comprehensive simulation study considering a variety of scenarios and real data. To do so we compared the results of estimations using only complete data, and using imputations based on the Poisson, negative binomial, Hermite, and COMPoisson distributions, and the ZIP and ZINB models for excesses of zeros. The results of…
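To make the idea of regression-based imputation of a count concrete, here is a minimal sketch under strong simplifying assumptions: a single fully observed covariate `x`, a plain Poisson regression fitted on the complete cases, and one stochastic draw per missing value. The helper names (`fit_poisson`, `draw_poisson`, `impute_counts`) and the toy data are hypothetical, and this is not the authors' implementation; the other distributions compared in the paper would replace the Poisson fit and the sampling step.

```python
import math
import random

def fit_poisson(x, y, iters=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def draw_poisson(mu, rng):
    """Draw one Poisson(mu) value with Knuth's multiplication method (small mu only)."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def impute_counts(x, y, seed=0):
    """Replace each missing y (None) with a draw from the fitted Poisson model."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    b0, b1 = fit_poisson([xi for xi, _ in obs], [yi for _, yi in obs])
    completed = [
        yi if yi is not None else draw_poisson(math.exp(b0 + b1 * xi), rng)
        for xi, yi in zip(x, y)
    ]
    return completed, (b0, b1)

# Hypothetical toy data; None marks a missing count
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [1, 0, 2, None, 3, 5, None, 9]
completed, (b0, b1) = impute_counts(x, y)
print(completed, b0, b1)
```

Drawing from the fitted distribution, rather than plugging in the predicted mean, keeps the imputed values integer-valued and preserves some of the variability of the count. In practice the draw would be repeated to produce several completed data sets.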
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse · Traffic Prediction and Management Techniques
