gcor: A Python Implementation of Categorical Gini Correlation and Its Inference
Sameera Hewage

TL;DR
This paper introduces a Python package for Categorical Gini Correlation, enabling efficient computation, confidence interval construction, and independence testing for mixed data types.
Contribution
It provides a Python implementation of CGC with efficient algorithms for computation, confidence interval construction, and independence testing, optimized through vectorization and parallelization.
Findings
Efficient algorithms are implemented for computing CGC, constructing confidence intervals, and testing independence.
Vectorization and parallelization are used to improve computational efficiency.
CGC has shown superior performance over existing methods when applied to feature screening for classification.
Abstract
Categorical Gini Correlation (CGC), introduced by Dang et al. (2020), is a dependence measure designed to quantify the association between a numerical variable and a categorical variable. Compared to existing dependence measures, it has appealing properties: for example, the correlation is zero if and only if the two variables are independent. It has also shown superior performance over existing methods when applied to feature screening for classification. This article presents a Python implementation for computing CGC, constructing confidence intervals, and performing independence tests based on it. Efficient algorithms have been implemented for all procedures, and they have been optimized using vectorization and parallelization to enhance computational efficiency.
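To make the quantity described in the abstract concrete, the following is a minimal sketch of a CGC point estimate and an independence test for a univariate numerical variable. It assumes the Gini-mean-difference form of the measure, 1 - (sum over categories of p_k * Delta_k) / Delta, where Delta is the mean absolute difference of the numerical variable and Delta_k is the same quantity within category k. The function names (gini_mean_difference, categorical_gini_correlation, permutation_pvalue) are hypothetical and are not the gcor package's API. The permutation test is a generic stand-in; the inference procedures implemented in the package may differ.

```python
import numpy as np


def gini_mean_difference(x):
    """U-statistic estimate of E|X1 - X2|, computed in O(n log n) by sorting."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 2:
        return 0.0
    # Identity: sum_{i<j} |x_i - x_j| = sum_i (2i - n - 1) * x_(i), i = 1..n
    weights = 2.0 * np.arange(1, n + 1) - n - 1
    return 2.0 * float(np.dot(weights, x)) / (n * (n - 1))


def categorical_gini_correlation(x, y):
    """Sample CGC between numerical x and categorical y (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    delta = gini_mean_difference(x)
    if delta == 0.0:
        raise ValueError("x is constant, so the correlation is undefined")
    labels, counts = np.unique(y, return_counts=True)
    # Weighted average of within-category Gini mean differences
    within = sum(
        (count / x.size) * gini_mean_difference(x[y == label])
        for label, count in zip(labels, counts)
    )
    return 1.0 - within / delta


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for H0: x and y are independent (assumed test)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = categorical_gini_correlation(x, y)
    # Shuffling the labels breaks any association while keeping both marginals
    exceed = sum(
        categorical_gini_correlation(x, rng.permutation(y)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array(["a", "a", "b", "b"])
cgc = categorical_gini_correlation(x, y)
```

Sorting replaces the quadratic loop over all pairs, which is one simple example of the kind of vectorized computation the abstract refers to. The permutation loop is embarrassingly parallel, so it could be distributed across workers if needed.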
