Pearson's goodness-of-fit tests for sparse distributions
Shuhua Chang, Deli Li, Yongcheng Qi

TL;DR
This paper studies Pearson's goodness-of-fit test in sparse, high-dimensional settings. It shows that the test statistic converges to a normal distribution, proposes an alternative statistic that is more powerful in simulations, and applies the method to lottery data.
Contribution
It extends the theoretical understanding of Pearson's chi-squared test to sparse distributions and introduces a new test statistic that is more powerful than the chi-squared statistic in the authors' simulation study.
Findings
Chi-squared statistic converges to normal under general conditions
Proposed new test statistic outperforms chi-squared in simulations
Application demonstrated on lottery data
Abstract
Pearson's chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution function. When the number of sets of the categorical data, say $k$, is a fixed integer, Pearson's chi-squared test statistic converges in distribution to a chi-squared distribution with $k-1$ degrees of freedom when the sample size $n$ goes to infinity. In real applications, the number $k$ often changes with $n$ and may even be much larger than $n$. Using martingale techniques, we prove that Pearson's chi-squared test statistic converges to the normal distribution under quite general conditions. We also propose a new test statistic which is more powerful than the chi-squared test statistic based on our simulation study. A real application to lottery data is provided to illustrate our methodology.
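To make the sparse setting concrete, the sketch below computes Pearson's statistic for a uniform null with far more cells than observations and standardizes it for comparison with a normal reference. The centering by k - 1 and scaling by sqrt(2(k - 1)) are the classical chi-squared mean and standard deviation, used here as an illustrative assumption; the paper's exact normalization and its proposed new statistic are not given in this summary and are not reproduced. The function names `pearson_chi2`, `standardized_chi2`, and `normal_upper_tail` are hypothetical.

```python
import math
import random


def pearson_chi2(counts, probs):
    """Pearson's statistic: sum over cells of (O_i - E_i)^2 / E_i, with E_i = n * p_i."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def standardized_chi2(counts, probs):
    """Center by k - 1 and scale by sqrt(2 * (k - 1)).

    Assumption: these are the mean and standard deviation of a
    chi-squared(k - 1) variable, chosen for illustration only.
    """
    k = len(counts)
    return (pearson_chi2(counts, probs) - (k - 1)) / math.sqrt(2 * (k - 1))


def normal_upper_tail(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    k, n = 2000, 500                # sparse: many more cells than observations
    probs = [1.0 / k] * k           # uniform null, as in a fair lottery draw
    counts = [0] * k
    for _ in range(n):
        counts[rng.randrange(k)] += 1

    z = standardized_chi2(counts, probs)
    print("standardized statistic:", z)
    print("one-sided normal p-value:", normal_upper_tail(z))
```

With k = 2000 and n = 500, every expected count is 0.25, so the usual rule of thumb for the fixed-k chi-squared approximation (expected counts of at least about 5) fails badly. This is the regime in which a normal limit for the statistic becomes the relevant reference.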
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Sensory Analysis and Statistical Methods
