Comment on "Detecting Novel Associations in Large Data Sets" by Reshef et al., Science, Dec 16, 2011
Noah Simon, Robert Tibshirani

TL;DR
This paper critically evaluates the Maximal Information Coefficient (MIC) proposed by Reshef et al. In simulations, MIC generally has lower statistical power than distance correlation for detecting dependence, and lower power than Pearson correlation for linear relationships, especially in noisy data.
Contribution
Through simulation studies, the authors compare the power of MIC with that of Pearson correlation and distance correlation, and show where MIC falls short.
Findings
MIC has lower power than distance correlation in most cases.
MIC is less effective than Pearson correlation for linear relationships.
MIC performs poorly with noisy data and certain non-linear relationships.
Abstract
The proposal of Reshef et al. (2011) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in statistics: tests which strive to have high power against all alternatives can have low power in many important situations. To investigate this, we ran simulations to compare the power of MIC to that of standard Pearson correlation and distance correlation (dcor). We simulated pairs of variables with different relationships (most of which were considered by Reshef et al.), but with varying levels of noise added. To determine proper cutoffs for testing the independence hypothesis, we simulated independent data with the appropriate…
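The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it makes several assumptions of its own: x is uniform on [0, 1], the noise is Gaussian, the null data are produced by redrawing x independently of y, and the cutoff is the empirical 95th percentile of the null statistics. The function names (`pearson_abs`, `dcor`, `estimate_power`) and the two example relationships are hypothetical. MIC is left out because it needs a third-party implementation; only Pearson correlation and a plain sample distance correlation are compared.

```python
import math
import random
import statistics


def pearson_abs(x, y):
    """Absolute value of the sample Pearson correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(vi - vj) for vj in v] for vi in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def dcor(x, y):
    """Sample distance correlation (the simple, biased V-statistic version)."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def estimate_power(stat, relationship, noise_sd, n=50, n_sims=200, alpha=0.05, seed=0):
    """Fraction of dependent data sets whose statistic exceeds the null cutoff."""
    rng = random.Random(seed)
    alt_stats, null_stats = [], []
    for _ in range(n_sims):
        x = [rng.random() for _ in range(n)]
        y = [relationship(xi) + noise_sd * rng.gauss(0.0, 1.0) for xi in x]
        alt_stats.append(stat(x, y))
        # Assumed null scheme: redraw x so that it is independent of y.
        x_null = [rng.random() for _ in range(n)]
        null_stats.append(stat(x_null, y))
    null_stats.sort()
    # Empirical (1 - alpha) quantile of the null statistics serves as the cutoff.
    cutoff = null_stats[min(n_sims - 1, round((1 - alpha) * n_sims))]
    return sum(s > cutoff for s in alt_stats) / n_sims


if __name__ == "__main__":
    relationships = {
        "linear": lambda x: x,
        "parabola": lambda x: 4.0 * (x - 0.5) ** 2,
    }
    for name, f in relationships.items():
        for stat in (pearson_abs, dcor):
            power = estimate_power(stat, f, noise_sd=1.0, n_sims=100)
            print(f"{name:9s} {stat.__name__:12s} power = {power:.2f}")
```

Raising `noise_sd` over a grid and plotting the estimated power for each statistic gives curves in the spirit of the comparison described above. Adding MIC would mean passing a third statistic with the same `(x, y)` signature.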
Taxonomy
Topics: Data-Driven Disease Surveillance
