On the properties of distance covariance for categorical data: Robustness, sure screening, and approximate null distributions
Qingyang Zhang

TL;DR
This paper studies distance covariance for categorical data. It shows that the functional is robust, establishes its consistency for variable screening, and derives an approximate null distribution, positioning it as an alternative to the Chi-squared test for large sparse contingency tables.
Contribution
The study provides theoretical and empirical results on the robustness and screening performance of distance covariance for categorical variables, together with an approximate null distribution for the bias-corrected distance correlation.
Findings
Distance covariance is B-robust for any number of categories.
Distance covariance screening is strongly consistent under mild conditions and outperforms Chi-squared screening in simulations, especially for large sparse tables.
An effective approximate null distribution for bias-corrected distance correlation is derived.
Abstract
Pearson's Chi-squared test, though widely used for detecting association between categorical variables, exhibits low statistical power in large sparse contingency tables. To address this limitation, two novel permutation tests have recently been developed: the distance covariance permutation test and the U-statistic permutation test. Both leverage the distance covariance functional but employ different estimators. In this work, we explore key statistical properties of the distance covariance for categorical variables. First, we show that, unlike Chi-squared, the distance covariance functional is B-robust for any number of categories (fixed or diverging). Second, we establish the strong consistency of distance covariance screening under mild conditions, and simulations confirm its advantage over Chi-squared screening, especially for large sparse tables. Finally, we derive an approximate…
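To make the comparison concrete, here is a minimal sketch of the two quantities being contrasted. It assumes the discrete 0/1 metric on categories, under which the squared distance covariance functional reduces to the sum over cells of (π_ij − π_i+ π_+j)², and it uses the plain plug-in estimate rather than the bias-corrected or U-statistic estimators mentioned in the abstract. The function names `dcov_sq_plugin` and `pearson_chi2` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dcov_sq_plugin(table):
    """Plug-in squared distance covariance for a contingency table of counts.

    Assumption: categories are compared with the discrete 0/1 metric, so the
    functional is sum_ij (p_ij - p_i+ * p_+j)^2. This is the simple plug-in
    estimate, not a bias-corrected one.
    """
    counts = np.asarray(table, dtype=float)
    p = counts / counts.sum()              # joint cell proportions
    row = p.sum(axis=1, keepdims=True)     # row marginals, shape (I, 1)
    col = p.sum(axis=0, keepdims=True)     # column marginals, shape (1, J)
    return float(((p - row @ col) ** 2).sum())


def pearson_chi2(table):
    """Pearson's Chi-squared statistic, skipping cells with zero expected count."""
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    expected = (counts.sum(axis=1, keepdims=True)
                @ counts.sum(axis=0, keepdims=True)) / n
    mask = expected > 0
    return float((((counts - expected) ** 2)[mask] / expected[mask]).sum())


if __name__ == "__main__":
    independent = [[10, 20], [20, 40]]     # rows proportional: no association
    dependent = [[50, 0], [0, 50]]         # perfect association
    for name, tab in [("independent", independent), ("dependent", dependent)]:
        print(name, dcov_sq_plugin(tab), pearson_chi2(tab))
```

Note the structural difference: the Chi-squared statistic divides each squared deviation by the expected count, so cells with tiny expected counts can dominate it, whereas the distance covariance sum is unweighted. This is one informal way to read the robustness and sparse-table claims above; it is not a substitute for the paper's formal results.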
Taxonomy
Topics: Advanced Statistical Methods and Models
