Inference for Error-Prone Count Data: Estimation under a Binomial Convolution Framework
Yuqiu Yang, Christina Vu, Cornelis J. Potgieter, Xinlei Wang, Akihito Kamata

TL;DR
This paper introduces a binomial convolution framework for analyzing error-prone count data, providing methods to estimate true scores and accuracy rates, with applications in oral reading fluency assessments involving both human and AI scores.
Contribution
It extends binary misclassification models to bounded count data, proposing three estimation strategies and demonstrating their effectiveness through simulations and real data applications.
Findings
Maximum likelihood estimation is most accurate but computationally intensive.
Regression is simple and stable but less precise.
GMM offers a balance but is sensitive to outliers.
Abstract
Measurement error in count data is common but underexplored in the literature, particularly in contexts where observed scores are bounded and arise from discrete scoring processes. Motivated by applications in oral reading fluency assessment, we propose a binomial convolution framework that extends binary misclassification models to settings where only the aggregate number of correct responses is observed, and errors may involve both overcounting and undercounting the number of events. The model accommodates distinct true positive and true negative accuracy rates and preserves the bounded nature of the data. Assuming the availability of both contaminated and error-free scores on a subset of items, we develop and compare three estimation strategies: maximum likelihood estimation (MLE), linear regression, and generalized method of moments (GMM). Extensive simulations show that MLE is…
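The abstract does not spell out the model's form, so the following is only a minimal sketch of one natural reading of a "binomial convolution" with distinct true positive and true negative accuracy rates. It assumes the observed count W is the sum of two independent binomials: truly correct items scored as correct, plus truly incorrect items mistakenly scored as correct. The function names and the parameters `n_items`, `true_count`, `tp_rate`, and `tn_rate` are illustrative assumptions, not notation from the paper.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Binomial(n, p) = k); returns 0 outside the support 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def observed_pmf(w, n_items, true_count, tp_rate, tn_rate):
    """P(W = w | true count) under the ASSUMED convolution form.

    W = Binomial(true_count, tp_rate)                  # correct items kept
      + Binomial(n_items - true_count, 1 - tn_rate)    # incorrect items overcounted
    """
    n_wrong = n_items - true_count
    return sum(
        binom_pmf(a, true_count, tp_rate) * binom_pmf(w - a, n_wrong, 1 - tn_rate)
        for a in range(w + 1)
    )


def observed_mean(n_items, true_count, tp_rate, tn_rate):
    """E[W | true count] implied by the assumed form above."""
    return true_count * tp_rate + (n_items - true_count) * (1 - tn_rate)


if __name__ == "__main__":
    # Hypothetical values: 10 items, 6 truly correct, imperfect scorer.
    pmf = [observed_pmf(w, 10, 6, 0.9, 0.8) for w in range(11)]
    for w, p in enumerate(pmf):
        print(w, round(p, 4))
```

Under this assumed form the observed count can fall above or below the true count but never outside 0 to `n_items`, which matches the bounded, two-directional errors the abstract describes. The implied conditional mean is also linear in the true count, with intercept `n_items * (1 - tn_rate)` and slope `tp_rate + tn_rate - 1`, which is one plausible reason a linear regression could recover the accuracy rates when both contaminated and error-free scores are available.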
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Statistical Methods and Bayesian Inference
