Comparing Two Categorical Gini Correlations with Applications to Classification Problems
Sameera Hewage, Yongli Sang

TL;DR
This paper introduces a statistical framework for comparing predictor importance in classification tasks using categorical Gini correlation, with applications demonstrated on real datasets.
Contribution
It develops a novel inferential method for comparing categorical Gini correlations (CGCs) across competing predictor groups, accommodating predictors of arbitrary and unequal dimensions and allowing dependence between the groups.
Findings
The test statistic is asymptotically normal under both the null and alternative hypotheses, and the resulting test is consistent.
A nonparametric bootstrap procedure provides an alternative approach to inference.
Applications show the method's effectiveness in real classification problems.
Abstract
This article proposes an inferential framework for comparing predictor importance in classification problems with categorical response variables. The approach is based on the categorical Gini correlation (CGC) proposed by Dang et al. (2020), a measure of dependence between numerical predictors and categorical outcomes. Predictor importance is evaluated by testing differences in CGCs across competing predictor groups. The proposed methodology accommodates predictors of arbitrary and unequal dimensions and allows for dependence between predictor groups. Asymptotic normality of the test statistic is established under both the null and alternative hypotheses, and the resulting test is shown to be consistent. In addition to deriving the asymptotic distribution, a nonparametric bootstrap procedure is developed as an alternative approach to inference. Simulation studies, along with…
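To make the comparison of two CGCs concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the sample CGC takes the form (Δ − Σ_k p_k Δ_k)/Δ, where Δ is the Gini mean difference of the predictor (mean pairwise Euclidean distance) and Δ_k is the same quantity within class k, and it uses a generic percentile bootstrap that resamples observations jointly so that dependence between the two predictor groups is preserved. The function names `gini_mean_difference`, `sample_cgc`, and `bootstrap_cgc_difference` are hypothetical, and the paper's actual test statistic and bootstrap scheme may differ.

```python
import numpy as np


def gini_mean_difference(x):
    """Mean Euclidean distance over all distinct pairs of rows (U-statistic)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of dimension 1
    n = x.shape[0]
    if n < 2:
        return 0.0
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # The diagonal is zero, so the full sum counts each distinct pair twice.
    return dists.sum() / (n * (n - 1))


def sample_cgc(x, y):
    """Sample CGC under the assumed form (overall - weighted within) / overall."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(y)
    overall = gini_mean_difference(x)
    within = 0.0
    for label in np.unique(y):
        mask = y == label
        within += mask.sum() / n * gini_mean_difference(x[mask])
    return (overall - within) / overall


def bootstrap_cgc_difference(x1, x2, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for CGC(x1, y) - CGC(x2, y).

    Rows of x1, x2, and y are resampled together (an assumption of this
    sketch) so that dependence between predictor groups is kept intact.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y)
    n = len(y)
    observed = sample_cgc(x1, y) - sample_cgc(x2, y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        diffs[b] = sample_cgc(x1[idx], y[idx]) - sample_cgc(x2[idx], y[idx])
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return observed, (lower, upper)
```

Under these assumptions, an interval that excludes zero would suggest that one predictor group is more strongly associated with the class label than the other. The two groups may have different dimensions, since each CGC depends only on pairwise distances within its own group.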
