Inference for Error-Prone Count Data: Estimation under a Binomial Convolution Framework

Yuqiu Yang; Christina Vu; Cornelis J. Potgieter; Xinlei Wang; Akihito Kamata

arXiv:2506.20596 · stat.ME · June 26, 2025

Open Access

TL;DR

This paper introduces a binomial convolution framework for analyzing error-prone count data, providing methods to estimate true scores and accuracy rates, with applications in oral reading fluency assessments involving both human and AI scores.

Contribution

It extends binary misclassification models to bounded count data, proposing three estimation strategies and demonstrating their effectiveness through simulations and real data applications.

Findings

1. Maximum likelihood estimation is most accurate but computationally intensive.

2. Regression is simple and stable but less precise.

3. GMM offers a balance but is sensitive to outliers.

Abstract

Measurement error in count data is common but underexplored in the literature, particularly in contexts where observed scores are bounded and arise from discrete scoring processes. Motivated by applications in oral reading fluency assessment, we propose a binomial convolution framework that extends binary misclassification models to settings where only the aggregate number of correct responses is observed, and errors may involve both overcounting and undercounting the number of events. The model accommodates distinct true positive and true negative accuracy rates and preserves the bounded nature of the data. Assuming the availability of both contaminated and error-free scores on a subset of items, we develop and compare three estimation strategies: maximum likelihood estimation (MLE), linear regression, and generalized method of moments (GMM). Extensive simulations show that MLE is…
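The binomial convolution idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each of `n_items` items is scored independently, that a truly correct item is counted with probability `p_tp`, and that a truly incorrect item is wrongly counted with probability `1 - p_tn`. Under those assumptions the observed count is a sum of two binomials, and its conditional mean is linear in the true count, E[W | X] = N(1 - p_tn) + X(p_tp + p_tn - 1), which suggests a simple regression-style estimator. The function names, the parameter names, and the chosen values (50 items, rates 0.9 and 0.8) are hypothetical.

```python
import numpy as np


def simulate_observed_counts(x_true, n_items, p_tp, p_tn, rng):
    """Draw error-prone counts as a convolution of two binomials (assumed model)."""
    x_true = np.asarray(x_true)
    # Truly correct items that the scorer also counts as correct.
    kept = rng.binomial(x_true, p_tp)
    # Truly incorrect items that the scorer mistakenly counts as correct.
    spurious = rng.binomial(n_items - x_true, 1.0 - p_tn)
    return kept + spurious


def regression_estimates(x_true, w_obs, n_items):
    """Recover (p_tp, p_tn) from a least-squares line of W on X.

    Uses E[W | X] = n_items * (1 - p_tn) + X * (p_tp + p_tn - 1).
    """
    slope, intercept = np.polyfit(x_true, w_obs, 1)
    p_tn_hat = 1.0 - intercept / n_items
    p_tp_hat = slope + 1.0 - p_tn_hat
    return p_tp_hat, p_tn_hat


# Illustrative values only; none of these come from the paper.
rng = np.random.default_rng(0)
n_items = 50
x_true = rng.integers(0, n_items + 1, size=20000)  # error-free scores
w_obs = simulate_observed_counts(x_true, n_items, p_tp=0.9, p_tn=0.8, rng=rng)
p_tp_hat, p_tn_hat = regression_estimates(x_true, w_obs, n_items)
print(f"p_tp estimate: {p_tp_hat:.3f}, p_tn estimate: {p_tn_hat:.3f}")
```

Because both binomial components are bounded by the number of items they are drawn from, the simulated observed count always stays between 0 and `n_items`, which mirrors the bounded-data property the abstract emphasizes. A likelihood-based or GMM fit would use the same two-binomial structure but is not sketched here.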

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Statistical Methods and Bayesian Inference