Tests for categorical data beyond Pearson: A distance covariance and energy distance approach
Fernando Castro-Prado, Wenceslao González-Manteiga, Javier Costas, Fernando Facal, Dominic Edelmann

TL;DR
This paper introduces new statistical tests for categorical data based on distance covariance and energy distance, overcoming limitations of classical methods like Pearson's test, with proven theoretical properties and practical effectiveness.
Contribution
It develops novel testing strategies for categorical dependence and goodness-of-fit using distance-based measures, avoiding resampling and enhancing reliability.
Findings
The proposed tests outperform classical methods in simulations.
The methodology is theoretically sound with well-calibrated null distributions.
Real data examples demonstrate practical applicability in biostatistics.
Abstract
Categorical variables are of utmost importance in biomedical research. When two of them are considered, it is often the case that one wants to test whether or not they are statistically dependent. We show weaknesses of classical methods, such as Pearson's test and the G-test, and we propose testing strategies based on distances that lack those drawbacks. We first develop this theory for classical two-dimensional contingency tables, within the context of distance covariance, an association measure that characterises general statistical independence of two variables. We then apply the same fundamental ideas to one-dimensional tables, namely to the testing for goodness of fit to a discrete distribution, for which we resort to an analogous statistic called energy distance. We prove that our methodology has desirable theoretical properties, and we show how we can calibrate the null…
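To make the two statistics named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation and it does not reproduce their null calibration. It assumes the discrete metric on categories (distance 0 for equal labels, 1 otherwise) and the plain V-statistic form of squared distance covariance. The function names (`discrete_metric_matrix`, `double_center`, `dcov_sq`, `energy_distance`) are hypothetical, chosen only for illustration.

```python
import numpy as np


def discrete_metric_matrix(labels):
    """Pairwise distances under the discrete metric (an assumption of this
    sketch): 0 if two observations share a category, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def dcov_sq(x, y):
    """Sample squared distance covariance (V-statistic form) of two
    categorical samples of equal length."""
    a = double_center(discrete_metric_matrix(x))
    b = double_center(discrete_metric_matrix(y))
    return float((a * b).mean())


def energy_distance(x, null_probs):
    """Energy distance between the empirical distribution of x and a
    hypothesised pmf, given as a dict {category: probability}.
    Assumes every observed category is a key of null_probs."""
    cats = list(null_probs)
    q = np.array([null_probs[c] for c in cats], dtype=float)
    x = np.asarray(x)
    p = np.array([(x == c).mean() for c in cats])
    d = 1.0 - np.eye(len(cats))  # discrete metric between categories
    # 2 E d(X, Y) - E d(X, X') - E d(Y, Y')
    return float(2 * p @ d @ q - p @ d @ p - q @ d @ q)


if __name__ == "__main__":
    x = ["a", "a", "b", "b", "c", "a"]
    y = ["u", "v", "u", "v", "v", "u"]
    print(dcov_sq(x, y))
    print(energy_distance(x, {"a": 0.5, "b": 0.3, "c": 0.2}))
```

Under the discrete metric, `dcov_sq` coincides with the sum over cells of the squared difference between the observed joint proportion and the product of the marginal proportions, and `energy_distance` coincides with the sum of squared differences between observed and hypothesised category probabilities. Turning either value into a p-value requires a null distribution, which this sketch leaves out.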
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference
