Proper Correlation Coefficients for Nominal Random Variables
Jan-Lukas Wermuth

TL;DR
This paper introduces new dependence measures for nominal variables that are always attainable, remain defined when the other variable is continuous, and support independence testing. These properties address two shortcomings of classical measures.
Contribution
It proposes dependence measures that are attainable for all marginal distributions and applicable when one variable is continuous. It also provides a consistent estimator and practical testing procedures.
Findings
The new dependence measures are always attainable and remain applicable when one of the variables is continuous.
The paper provides a consistent estimator for the new measures and derives its asymptotic distribution.
Applications demonstrate the usefulness of the measures in real-world data analysis.
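Applicability with a continuous variable is not automatic for classical measures. The sketch below is a minimal illustration, assuming NumPy. The function names `entropy` and `uncertainty_coefficient` and the simulated data are hypothetical choices for this example, and the estimator is the naive plug-in uncertainty coefficient, not the paper's method. Every distinct value of the continuous Y becomes its own category. Because a continuous Y almost surely has no ties, X is a function of Y within the sample, so the estimate equals 1 even though X and Y are generated independently.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution of `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def uncertainty_coefficient(x, y):
    """Plug-in Theil's U(X|Y) = (H(X) - H(X|Y)) / H(X).

    Each distinct value of y is treated as a separate nominal category,
    which is all a purely nominal measure can do with a continuous y.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    h_x = entropy(x)
    # H(X|Y) = sum over observed y-values of P(Y = y) * H(X | Y = y)
    h_x_given_y = 0.0
    for value in np.unique(y):
        mask = y == value
        h_x_given_y += mask.mean() * entropy(x[mask])
    return (h_x - h_x_given_y) / h_x

rng = np.random.default_rng(0)
n = 1_000
x = rng.integers(0, 3, size=n)   # nominal variable with 3 categories
y = rng.normal(size=n)           # continuous variable, independent of x

# No ties in y, so every "category" of y holds one observation and H(X|Y) = 0.
print(uncertainty_coefficient(x, y))          # 1.0 despite independence

# Binning y into 4 intervals gives a value near 0 instead.
y_binned = np.digitize(y, [-1.0, 0.0, 1.0])
print(uncertainty_coefficient(x, y_binned))
```

Binning restores a sensible value, but the result then depends on an arbitrary choice of bin edges.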
Abstract
This paper develops an intuitive concept of perfect dependence between two variables of which at least one has a nominal scale. Perfect dependence is attainable for all marginal distributions. It furthermore proposes a set of dependence measures that are 1 if and only if this perfect dependence is satisfied. The advantages of these dependence measures relative to classical dependence measures such as contingency coefficients, Goodman-Kruskal's lambda and tau and the so-called uncertainty coefficient are twofold. Firstly, they are defined if one of the variables exhibits continuities. Secondly, they satisfy the property of attainability. That is, they can take all values in the interval [0,1] irrespective of the marginals involved. Neither property is shared by classical dependence measures, which require two discrete marginal distributions and can in some situations yield values close to 0…
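One well-known way a classical measure can understate dependence is shown below for Goodman-Kruskal's lambda and tau. The sketch assumes NumPy. The 2×2 table and the function names are hypothetical, and the paper's proposed measures are not reproduced here. Lambda only registers a change in the modal category of X. It therefore equals 0 whenever the same row is modal in every column, even if the conditional distribution of X clearly varies with Y.

```python
import numpy as np

def goodman_kruskal_lambda(table):
    """Goodman-Kruskal lambda(X|Y), with X in rows and Y in columns.

    Proportional reduction in the error of predicting X by its modal
    category once the column (category of Y) is known.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    errors_without_y = n - t.sum(axis=1).max()  # miss everything outside the largest row
    errors_with_y = n - t.max(axis=0).sum()     # miss everything outside each column's mode
    return (errors_without_y - errors_with_y) / errors_without_y

def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau(X|Y): proportional reduction in the Gini variation of X."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row = p.sum(axis=1)                          # marginal distribution of X
    col = p.sum(axis=0)                          # marginal distribution of Y
    gini_x = 1.0 - np.sum(row ** 2)
    gini_x_given_y = 1.0 - np.sum(p ** 2 / col)  # expected conditional Gini variation
    return (gini_x - gini_x_given_y) / gini_x

# Rows: categories of X; columns: categories of Y (hypothetical counts).
table = np.array([[40, 30],
                  [10, 20]])

# X depends on Y: P(X = row 0 | Y) is 0.8 in column 0 but 0.6 in column 1.
print(table[0] / table.sum(axis=0))   # [0.8 0.6]
print(goodman_kruskal_lambda(table))  # 0.0, since row 0 is the mode in both columns
print(goodman_kruskal_tau(table))     # about 0.048
```

Tau does register the dependence here, which is one reason it is often preferred to lambda. Both measures, however, presuppose that Y is discrete.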
Taxonomy
Topics: Probability and Risk Models
