An Information-Theoretic Measure of Dependency Among Variables in Large Datasets
Ali Mousavi, Richard G. Baraniuk

TL;DR
This paper introduces a computationally efficient approximation to the maximal information coefficient (MIC), a measure of dependence between variables, enabling scalable analysis of large datasets while still detecting both linear and non-linear associations.
Contribution
It proposes a new approximation to MIC that replaces the dynamic programming step with uniform partitioning of the data grid, significantly reducing computational cost.
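For reference, here is the standard MIC definition of Reshef et al. (2011) alongside a uniform-grid substitution. The substitution is our reading of the contribution, not a formula taken from the paper. In the original definition, I*(D, x, y) is the largest mutual information achievable by any grid with x columns and y rows, and the grid budget defaults to B(n) = n^0.6. The notation G^u_{x,y} for the uniform x-by-y grid is an assumption introduced here.

```latex
M(D)_{x,y} = \frac{I^{*}(D, x, y)}{\log \min\{x, y\}}, \qquad
\mathrm{MIC}(D) = \max_{xy < B(n)} M(D)_{x,y}

% Uniform-grid approximation (assumed form): no search over grid-line placement
\widehat{M}(D)_{x,y} = \frac{I\left(D|_{G^{\mathrm{u}}_{x,y}}\right)}{\log \min\{x, y\}}, \qquad
\widehat{\mathrm{MIC}}(D) = \max_{xy < B(n)} \widehat{M}(D)_{x,y}
```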
Findings
The approximation closely matches the original MIC in detecting dependencies.
Experiments show the method is faster than the original MIC and scales to large datasets.
The approach maintains high accuracy in various dependency scenarios.
Abstract
The maximal information coefficient (MIC), which measures the amount of dependence between two variables, is able to detect both linear and non-linear associations. However, its computational cost grows rapidly with the size of the dataset. In this paper, we develop a computationally efficient approximation to the MIC that replaces its dynamic programming step with a much simpler technique based on uniform partitioning of the data grid. A variety of experiments demonstrate the quality of our approximation.
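The abstract describes replacing the dynamic programming search with uniform partitioning. The following is a minimal sketch of that idea under several assumptions: each variable is split into equal-width bins over its range, the grid budget is B(n) = n^0.6 as in the original MIC, and logarithms are base 2. The function names approx_mic and grid_mutual_information are hypothetical, and the paper may use a different grid type (for example, equal-frequency bins) or a different budget. The sketch evaluates every uniform grid with nx * ny < B(n) and returns the largest normalized mutual information.

```python
import math
from collections import Counter


def _bin_index(v, lo, hi, k):
    """Index of the equal-width bin (out of k) that contains v."""
    if hi == lo:  # constant variable: everything falls into one bin
        return 0
    # Scale v into [0, k) and clamp the maximum value into the last bin.
    return min(int((v - lo) / (hi - lo) * k), k - 1)


def grid_mutual_information(xs, ys, nx, ny):
    """Mutual information (bits) of the empirical distribution induced by an
    equal-width nx-by-ny grid laid over the bounding box of the data."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = Counter(
        (_bin_index(x, x_lo, x_hi, nx), _bin_index(y, y_lo, y_hi, ny))
        for x, y in zip(xs, ys)
    )
    # Marginal counts for the x (column) and y (row) bins.
    x_counts, y_counts = Counter(), Counter()
    for (i, j), c in cells.items():
        x_counts[i] += c
        y_counts[j] += c
    mi = 0.0
    for (i, j), c in cells.items():
        p_ij = c / n
        mi += p_ij * math.log2(p_ij / ((x_counts[i] / n) * (y_counts[j] / n)))
    return mi


def approx_mic(xs, ys, alpha=0.6):
    """Hypothetical uniform-grid MIC (assumption: equal-width bins,
    B(n) = n**alpha): max normalized MI over grids with nx * ny < B(n)."""
    budget = len(xs) ** alpha
    best = 0.0
    for nx in range(2, int(budget) + 1):
        for ny in range(2, int(budget) + 1):
            if nx * ny >= budget:
                break
            score = grid_mutual_information(xs, ys, nx, ny) / math.log2(min(nx, ny))
            best = max(best, min(score, 1.0))  # clamp float round-off above 1
    return best


if __name__ == "__main__":
    xs = [float(i) for i in range(100)]
    print(approx_mic(xs, [2 * x + 1 for x in xs]))  # noiseless linear relation
```

Each candidate grid costs a single O(n) pass, and no grid-line positions are optimized. The work therefore scales with the number of candidate grids rather than with a dynamic program over cut positions. The trade-off is that fixed equal-width bins can miss structure that optimally placed cuts would capture. With the default alpha, fewer than about 11 points leave no admissible grid, and the sketch returns 0.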
