A Framework to Adjust Dependency Measure Estimates for Chance
Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor

TL;DR
This paper introduces a general framework for adjusting dependency measure estimates computed on finite samples, improving their interpretability and ranking accuracy in applications such as MIC-based exploration and variable selection in random forests.
Contribution
It proposes a simple, general adjustment method for dependency measures that improves their interpretability and ranking accuracy across various applications.
Findings
Improves MIC interpretability as a noise proxy
Enhances variable ranking accuracy in random forests
Applicable to any dependency measure
Abstract
Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example, non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC), and categorical variables that are dependent on the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, interpreting their values and ranking dependencies accurately become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample sizes, and are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple…
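The finite-sample effects described in the abstract can be illustrated with a small sketch. It assumes one common form of adjustment for chance: subtract the mean of the estimate under independence (approximated here by permuting one variable) and rescale by an upper bound. The abstract is truncated, so this is not necessarily the authors' exact procedure; the helper names `gini`, `gini_gain`, and `adjusted_for_chance` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(x, y):
    """Reduction in Gini impurity of y after splitting on categorical x."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(y) - weighted

def adjusted_for_chance(measure, x, y, upper_bound, n_perm=200, seed=0):
    """Assumed adjustment: (raw - null mean) / (upper bound - null mean).

    The null mean is a Monte Carlo estimate obtained by shuffling x,
    which breaks any dependency between x and y.
    """
    rng = random.Random(seed)
    raw = measure(x, y)
    x_perm = list(x)
    null = []
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        null.append(measure(x_perm, y))
    null_mean = sum(null) / n_perm
    return (raw - null_mean) / (upper_bound - null_mean)

# Two variables that are both independent of the binary target y,
# one with 2 categories and one with 40.
rng = random.Random(1)
n = 200
y = [rng.randrange(2) for _ in range(n)]
x_few = [rng.randrange(2) for _ in range(n)]
x_many = [rng.randrange(40) for _ in range(n)]

raw_few, raw_many = gini_gain(x_few, y), gini_gain(x_many, y)
# gini(y) is the largest Gini gain any split of y can achieve.
adj_few = adjusted_for_chance(gini_gain, x_few, y, gini(y))
adj_many = adjusted_for_chance(gini_gain, x_many, y, gini(y))

print("raw gain:     ", raw_few, raw_many)
print("adjusted gain:", adj_few, adj_many)
```

In this setup the raw Gini gain of the 40-category variable should come out noticeably larger than that of the 2-category variable even though neither carries information about `y`, while both adjusted values should sit near 0. Permutation is only one way to estimate the null mean; an analytical expectation, where available, would avoid the Monte Carlo noise.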
Taxonomy
Methods, Interpretability
