A Framework to Adjust Dependency Measure Estimates for Chance

Simone Romano; Nguyen Xuan Vinh; James Bailey; Karin Verspoor

arXiv:1510.07786 · stat.ML · January 21, 2016 · SDM

TL;DR

This paper introduces a general framework to adjust dependency measure estimates computed on finite samples. The adjustment improves their interpretability and ranking accuracy in tasks such as exploring dependencies with MIC and selecting variables with Gini gain in random forests.

Contribution

It proposes simple, general adjustments for dependency measure estimates that improve both their interpretability and the accuracy of rankings based on them, across a range of applications.
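
As an illustration of what "adjusting for chance" means, a common form (used, for example, by the adjusted Rand index) subtracts the expected value of an estimate under independence and rescales by its maximum. The formula below is a sketch of that idea, assumed here for illustration; the paper's exact definitions, including its adjustments aimed specifically at ranking, may differ. Here \hat{D} is a dependency estimate between X and Y on n samples, \mathbb{E}_0 denotes expectation under independence with the same sample size and marginals, and \pi_b are random permutations of Y used to approximate that expectation.

```latex
\mathcal{A}(\hat{D}) \;=\; \frac{\hat{D} - \mathbb{E}_0[\hat{D}]}{\max \hat{D} - \mathbb{E}_0[\hat{D}]},
\qquad
\mathbb{E}_0[\hat{D}] \;\approx\; \frac{1}{B} \sum_{b=1}^{B} \hat{D}\big(X, \pi_b(Y)\big)
```

Because the baseline is computed at the same sample size and with the same number of categories as the estimate itself, the numerator has expectation 0 under independence regardless of either.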

Findings

01. Improves MIC interpretability as a noise proxy

02. Enhances variable ranking accuracy in random forests

03. Applicable to any dependency measure

Abstract

Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example: non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC); and categorical variables that are dependent on the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, the interpretability of their quantification and the accuracy when ranking dependencies become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample sizes, and are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple…
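
To make the chance inflation described in the abstract concrete, the sketch below computes Gini gain between a binary class and three variables: two independent noise variables with 2 and 50 categories, and one variable that agrees with the class about 80% of the time. It then subtracts a permutation estimate of the expected gain under independence and rescales. This is a minimal sketch under stated assumptions (a Monte Carlo permutation baseline, and the class's own Gini impurity as the upper bound); `gini`, `gini_gain` and `adjusted_gini_gain` are hypothetical helpers, not the paper's implementation, and the paper's exact estimator may differ.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def gini_gain(x: np.ndarray, y: np.ndarray) -> float:
    """Drop in Gini impurity of y after splitting on every category of x."""
    n = len(y)
    weighted = 0.0
    for value in np.unique(x):
        mask = x == value
        weighted += mask.sum() / n * gini(y[mask])
    return gini(y) - weighted


def adjusted_gini_gain(x, y, n_perm=200, seed=None) -> float:
    """Hypothetical helper: (D - E0[D]) / (max D - E0[D]).

    Assumptions: E0[D] is estimated by permuting y, which keeps the sample
    size and both marginals fixed but breaks any dependency on x; max D is
    taken to be gini(y), the gain of a split that fully separates the classes.
    """
    rng = np.random.default_rng(seed)
    observed = gini_gain(x, y)
    baseline = np.mean([gini_gain(x, rng.permutation(y)) for _ in range(n_perm)])
    upper = gini(y)
    return float((observed - baseline) / (upper - baseline))


rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                   # binary class
x_2cat = rng.integers(0, 2, size=n)              # noise, 2 categories
x_50cat = rng.integers(0, 50, size=n)            # noise, 50 categories
x_dep = np.where(rng.random(n) < 0.8, y, 1 - y)  # matches y ~80% of the time

results = {}
for name, x in [("noise_2cat", x_2cat), ("noise_50cat", x_50cat), ("dependent", x_dep)]:
    raw = gini_gain(x, y)
    adj = adjusted_gini_gain(x, y, seed=1)
    results[name] = (raw, adj)
    print(f"{name:12s} raw={raw:.3f} adjusted={adj:.3f}")
```

With these settings the raw gain of the 50-category noise variable is typically far above that of the 2-category one, even though neither is related to the class, while both adjusted values sit near 0 and the dependent variable stays clearly positive. Permuting the class labels preserves the sample size and both marginal distributions, which is why the baseline automatically accounts for the number of categories.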

Taxonomy

Methods, Interpretability