Testing the significance of assuming homogeneity in contingency-tables/cross-tabulations
Mark Tygert

TL;DR
This paper examines how well the Euclidean distance tests the assumption of homogeneity in contingency tables, indicating that it can be more statistically powerful than traditional statistics such as chi-square.
Contribution
It indicates that the Euclidean/Frobenius/Hilbert-Schmidt distance can be more powerful than classical statistics such as chi-square for testing homogeneity.
Findings
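The Euclidean/Frobenius/Hilbert-Schmidt distance can be more powerful than chi-square for testing homogeneity
This extends earlier findings for testing independence, where the same distance often outperformed chi-square, the log-likelihood-ratio (G), and the Freeman-Tukey/Hellinger distance
Homogeneity and independence share the same model and differ only in whether the column totals are treated as fixed, so the same statistics apply to both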
Abstract
The model for homogeneity of proportions in a two-way contingency-table/cross-tabulation is the same as the model of independence, except that the probabilistic process generating the data is viewed as fixing the column totals (but not the row totals). When gauging the consistency of observed data with the assumption of independence, recent work has illustrated that the Euclidean/Frobenius/Hilbert-Schmidt distance is often far more statistically powerful than the classical statistics such as chi-square, the log-likelihood-ratio (G), the Freeman-Tukey/Hellinger distance, and other members of the Cressie-Read power-divergence family. The present paper indicates that the Euclidean/Frobenius/Hilbert-Schmidt distance can be more powerful for gauging the consistency of observed data with the assumption of homogeneity, too.
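The abstract contrasts chi-square with the Euclidean distance and says that homogeneity differs from independence only in treating the column totals as fixed. The sketch below illustrates that setup; it is not the paper's code. Under homogeneity every column shares the pooled row proportions, so the expected count in cell (i, j) is (row total i) x (column total j) / (grand total). Chi-square divides each squared deviation by its expected count, while the Euclidean distance, taken here (an assumption; the paper may normalize differently) as the unweighted root-sum-of-squares of observed minus expected counts, does not. The p-value comes from a Monte Carlo parametric bootstrap that holds the observed column totals fixed and draws each column from the estimated row proportions; this simulation scheme and all function names are hypothetical choices for illustration.

```python
import random

def expected_counts(table):
    # Homogeneity: each column (total fixed) follows the pooled row proportions.
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    row_totals = [sum(row) for row in table]
    grand = sum(row_totals)
    return [[row_totals[i] * col_totals[j] / grand for j in range(n_cols)]
            for i in range(n_rows)]

def chi_square(table):
    # Classical statistic: squared deviations weighted by 1 / expected count.
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow) if e > 0)

def euclidean(table):
    # Assumed form: unweighted Euclidean (Frobenius) norm of observed - expected.
    exp = expected_counts(table)
    return sum((o - e) ** 2
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow)) ** 0.5

def simulate_table(row_probs, col_totals, rng):
    # Draw each column independently with its total held fixed.
    table = [[0] * len(col_totals) for _ in row_probs]
    for j, c in enumerate(col_totals):
        for i in rng.choices(range(len(row_probs)), weights=row_probs, k=c):
            table[i][j] += 1
    return table

def monte_carlo_pvalue(table, statistic, n_sims=1000, seed=0):
    # Hypothetical parametric bootstrap under the null hypothesis of homogeneity.
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = sum(col_totals)
    row_probs = [sum(row) / grand for row in table]
    observed = statistic(table)
    exceed = sum(statistic(simulate_table(row_probs, col_totals, rng)) >= observed
                 for _ in range(n_sims))
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (exceed + 1) / (n_sims + 1)

if __name__ == "__main__":
    table = [[30, 10, 12],
             [5, 20, 8],
             [5, 10, 20]]
    print("chi-square p-value:", monte_carlo_pvalue(table, chi_square))
    print("Euclidean p-value: ", monte_carlo_pvalue(table, euclidean))
```

Both tests share the same null simulation, so any difference in their p-values comes only from how each statistic weights the deviations: chi-square inflates deviations in cells with small expected counts, while the Euclidean distance treats all cells equally.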
