An Information-Theoretic Measure of Dependency Among Variables in Large Datasets

Ali Mousavi; Richard G. Baraniuk

arXiv:1508.04073 · cs.IT · August 18, 2015

TL;DR

This paper introduces a computationally efficient approximation to the maximal information coefficient (MIC) for measuring dependence between variables, enabling scalable analysis of large datasets while maintaining detection of linear and non-linear associations.

Contribution

The paper proposes an approximation to the MIC that replaces the dynamic programming step with uniform partitioning of the data grid, which reduces the computational cost on large datasets.

Findings

01

The approximation closely matches the original MIC in detecting dependencies.

02

Experiments show the method is faster and scalable for large datasets.

03

The approach maintains high accuracy in various dependency scenarios.

Abstract

The maximal information coefficient (MIC), which measures the amount of dependence between two variables, is able to detect both linear and non-linear associations. However, its computational cost grows rapidly as a function of the dataset size. In this paper, we develop a computationally efficient approximation to the MIC that replaces its dynamic programming step with a much simpler technique based on uniform partitioning of the data grid. A variety of experiments demonstrate the quality of our approximation.
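
To make the idea of uniform partitioning concrete, the following is a minimal sketch and not the authors' implementation. It assumes equal-width bins on both axes, the grid-size budget n^0.6 that is commonly used with the MIC, and normalization by log2 of the smaller grid dimension. The names `grid_mutual_information`, `uniform_mic`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def grid_mutual_information(x, y, nx, ny):
    """Mutual information (bits) of x and y on an nx-by-ny equal-width grid."""
    counts, _, _ = np.histogram2d(x, y, bins=(nx, ny))
    p = counts / counts.sum()              # joint cell probabilities
    px = p.sum(axis=1, keepdims=True)      # marginal over x bins, shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)      # marginal over y bins, shape (1, ny)
    nz = p > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))


def uniform_mic(x, y, alpha=0.6):
    """MIC-style score using only uniform grids (assumption: equal-width bins).

    Every grid with nx * ny < n**alpha is tried, and the largest normalized
    mutual information is returned. No dynamic programming is involved.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** alpha               # assumed grid-size budget B(n)
    best = 0.0
    nx = 2
    while nx * 2 < budget:
        ny = 2
        while nx * ny < budget:
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 1000)
    print("linear:     ", uniform_mic(t, 2.0 * t + 1.0))
    print("independent:", uniform_mic(rng.random(1000), rng.random(1000)))
```

Because each candidate grid is fixed in advance, the cost per grid is a single two-dimensional histogram. Whether equal-width or equal-frequency bins are closer to the paper's partitioning is not stated in the abstract, so the binning choice here should be read as an assumption.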
