Discovering Reliable Correlations in Categorical Data

Panagiotis Mandros; Mario Boley; Jilles Vreeken

arXiv:1908.11682·cs.LG·September 2, 2019

TL;DR

This paper introduces a new non-parametric estimator for reliable correlation measurement in categorical data and provides an efficient algorithmic framework for discovering top correlated attribute sets, validated through empirical case studies.

Contribution

It proposes a corrected, consistent estimator for normalized total correlation and an effective search framework for top-k correlated sets in categorical data.

Findings

01 Estimator achieves low regret with small samples

02 Algorithms are effective for large, high-dimensional data

03 Framework successfully identifies meaningful correlations in case studies

Abstract

In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on the distribution of the data or the type of correlation, and how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This…
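To make the idea of a corrected-for-chance, normalized total correlation more concrete, the following is a minimal sketch and not the paper's estimator. It assumes a plug-in (empirical) entropy, normalizes total correlation by the generic upper bound sum(H_i) - max(H_i), and approximates the chance term by Monte Carlo column shuffling. The function names, the choice of normalization, and the parameters `n_perm` and `seed` are all illustrative assumptions; the paper's own definitions may differ.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def normalized_total_correlation(columns):
    """Plug-in total correlation divided by the upper bound sum(H_i) - max(H_i).

    Assumption: this normalization is one generic choice, used here only
    for illustration; it is not claimed to be the paper's definition.
    """
    marginals = [entropy(col) for col in columns]
    joint = entropy(list(zip(*columns)))  # entropy of the joint rows
    bound = sum(marginals) - max(marginals)
    if bound <= 0:  # all but at most one attribute are constant
        return 0.0
    return (sum(marginals) - joint) / bound


def chance_corrected(columns, n_perm=200, seed=0):
    """Observed score minus its average under independent column shuffles.

    Assumption: the shuffling average is a Monte Carlo stand-in for an
    expected value under a permutation null, not the paper's correction.
    """
    rng = random.Random(seed)
    observed = normalized_total_correlation(columns)
    total = 0.0
    for _ in range(n_perm):
        # Shuffling each column independently breaks any dependence
        # between attributes while preserving every marginal.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        total += normalized_total_correlation(shuffled)
    return observed - total / n_perm
```

In this sketch, two identical binary columns score close to 1 after correction, while two independent columns score at or slightly below 0, because the plug-in estimate on shuffled data is positive on small samples and that bias is subtracted.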

Code & Models

Repositories

pmandros/wodiscovery
Official
