On the statistical analysis of grouped data: when Pearson $\chi^2$ and other divisible statistics are not goodness-of-fit tests
Sara Algeri, Estate V. Khmaladze

TL;DR
This paper provides a unified framework for analyzing grouped data, revealing limitations of traditional goodness-of-fit tests like Pearson's chi-squared, especially in sparse regimes, and introduces new distribution-free testing methods.
Contribution
It introduces a unifying approach to divisible statistics for grouped data, encompassing classical tests and proposing new distribution-free goodness-of-fit tests.
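As a rough illustration of what "divisible statistic" means here, the sketch below assumes the usual form: a sum over cells of a function of the observed count and the expected count. The helper names (`divisible_statistic`, `pearson_term`, `lr_term`) and the example counts are hypothetical and are not taken from the paper.

```python
import math

def divisible_statistic(counts, probs, g):
    """Sum g(observed count, expected count) over all cells."""
    n = sum(counts)
    return sum(g(nu, n * p) for nu, p in zip(counts, probs))

def pearson_term(nu, expected):
    # Per-cell contribution to Pearson's chi-squared.
    return (nu - expected) ** 2 / expected

def lr_term(nu, expected):
    # Per-cell contribution to the likelihood ratio statistic;
    # an empty cell contributes 0 (the convention 0 * log 0 = 0).
    return 2.0 * nu * math.log(nu / expected) if nu > 0 else 0.0

# Hypothetical example: 100 observations in 4 equiprobable cells.
counts = [18, 22, 20, 40]
probs = [0.25, 0.25, 0.25, 0.25]

pearson = divisible_statistic(counts, probs, pearson_term)
likelihood_ratio = divisible_statistic(counts, probs, lr_term)
print(f"Pearson chi-squared: {pearson:.2f}")
print(f"Likelihood ratio:    {likelihood_ratio:.2f}")
```

Both classical statistics differ only in the per-cell function `g`, which is the sense in which a single framework can cover them.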
Findings
In sparse regimes, all existing tests are dominated by weighted linear statistics.
The paper offers a new perspective on the limitations of Pearson's chi-squared and similar tests.
New distribution-free goodness-of-fit tests are proposed.
Abstract
Thousands of experiments are analyzed and papers are published each year involving the statistical analysis of grouped data. While this area of statistics is often perceived -- somewhat naively -- as saturated, several misconceptions still affect everyday practice, and new frontiers have so far remained unexplored. Researchers must be aware of the limitations affecting their analyses and of the new possibilities in their hands. Motivated by this need, the article introduces a unifying approach to the analysis of grouped data, which allows us to study the class of divisible statistics -- that includes Pearson's $\chi^2$ and the likelihood ratio as special cases -- with a fresh perspective. The contributions collected in this manuscript span from modeling and estimation to distribution-free goodness-of-fit tests. Perhaps the most surprising result presented here is that, in a…
Taxonomy
Topics: Advanced Statistical Methods and Models
