# Discovering Reliable Approximate Functional Dependencies

**Authors:** Panagiotis Mandros, Mario Boley, Jilles Vreeken

arXiv: 1705.09391 · 2017-06-20

## TL;DR

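This paper introduces an information-theoretic method for discovering approximate functional dependencies of a target attribute on sets of other attributes. It combines a bias-corrected score, designed to stay reliable regardless of sample size or dimensionality, with an optimistic estimator that makes the search for the optimal or $\alpha$-approximate top-$k$ dependencies efficient.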

## Contribution

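The paper presents a bias-corrected dependency score that can be computed efficiently, together with an optimistic estimator of that score. According to the authors, this estimator makes it possible, for the first time, to mine approximate functional dependencies with guarantees of optimality.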
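To illustrate how an optimistic estimator can support top-$k$ search with optimality guarantees, here is a minimal, generic branch-and-bound sketch. It is not the authors' algorithm: the function `top_k_branch_and_bound`, the toy `score`, and the toy `bound` are all hypothetical names and definitions chosen for illustration. The only assumption carried over is the general idea that `bound(S)` never underestimates the score of any superset of `S`, so a branch can be skipped once its bound cannot beat the current $k$-th best result.

```python
import heapq

def top_k_branch_and_bound(attributes, score, bound, k=1, alpha=1.0):
    """Return the k best attribute subsets as (score, subset) pairs, best first.

    Assumes bound(S) >= score(T) for every superset T of S.
    alpha = 1.0 searches exactly; alpha < 1.0 prunes more aggressively
    (meaningful for non-negative scores only).
    """
    top = []            # min-heap holding the k best (score, subset) pairs so far
    stack = [((), 0)]   # (current subset, index of the next attribute to try)
    while stack:
        subset, start = stack.pop()
        # Only add attributes with a larger index, so each subset appears once.
        for i in range(start, len(attributes)):
            child = subset + (attributes[i],)
            s = score(child)
            if len(top) < k:
                heapq.heappush(top, (s, child))
            elif s > top[0][0]:
                heapq.heapreplace(top, (s, child))
            # Expand only if some superset could still enter the top k.
            if len(top) < k or alpha * bound(child) > top[0][0]:
                stack.append((child, i + 1))
    return sorted(top, reverse=True)

# Hypothetical toy problem: a capped gain minus a penalty that grows with subset size.
GAINS = {"a": 0.5, "b": 0.4, "c": 0.05, "d": 0.0}

def score(subset):
    return min(1.0, sum(GAINS[a] for a in subset)) - 0.1 * len(subset)

def bound(subset):
    # The gain is at most 1 and the penalty only grows for supersets.
    return 1.0 - 0.1 * len(subset)

result = top_k_branch_and_bound(list(GAINS), score, bound, k=2)
```

In this toy setup the bound has the shape "maximum attainable value minus a penalty that is monotone in the subset", which keeps it admissible by construction. Any real use would need a bound proven admissible for the actual score.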

## Key findings

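- In empirical evaluation, the derived score achieves a good bias-variance trade-off.
- The score can be used within an efficient discovery algorithm, which finds meaningful dependencies.
- Most importantly, the score remains reliable when data is sparse.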

## Abstract

Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And, how can we efficiently discover the optimal or $\alpha$-approximate top-$k$ dependencies? These are exactly the questions we answer in this paper.

As we want to be agnostic on the form of the dependence, we adopt an information-theoretic approach, and construct a reliable, bias correcting score that can be efficiently computed. Moreover, we give an effective optimistic estimator of this score, by which for the first time we can mine the approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias for variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most important, it remains reliable in the face of data sparsity.
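As a rough illustration of why an information-theoretic dependency score needs bias correction, the sketch below computes a plug-in score, $(H(Y) - H(Y \mid X)) / H(Y)$, and subtracts its average value under random permutations of the target. This summary does not specify how the paper computes its correction, so the Monte Carlo permutation baseline, the function names, and the parameters `n_perm` and `seed` are assumptions made for this example only.

```python
import math
import random
from collections import Counter

def entropy(values):
    """Plug-in (empirical) Shannon entropy in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def fraction_of_information(x, y):
    """Empirical (H(Y) - H(Y|X)) / H(Y); 1.0 means Y is a function of X in the sample."""
    h_y = entropy(y)
    if h_y == 0.0:
        return 0.0  # a constant target leaves nothing to explain
    h_y_given_x = entropy(list(zip(x, y))) - entropy(x)  # H(Y|X) = H(X,Y) - H(X)
    return (h_y - h_y_given_x) / h_y

def corrected_score(x, y, n_perm=200, seed=0):
    """Plug-in score minus its mean under random permutations of y.

    Assumption: a Monte Carlo permutation baseline stands in for the bias term.
    """
    rng = random.Random(seed)
    observed = fraction_of_information(x, y)
    y_perm = list(y)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any real dependence between x and y
        total += fraction_of_information(x, y_perm)
    return observed - total / n_perm

# A key-like attribute (all values unique) "determines" any target in the sample.
ids = list(range(16))
target = [0, 1] * 8
spurious_raw = fraction_of_information(ids, target)   # plug-in score is maximal
spurious_corrected = corrected_score(ids, target)     # the correction removes it

# A genuine dependency on a low-cardinality attribute keeps most of its score.
genuine_corrected = corrected_score(target, target)
```

The unique-valued attribute shows the sparsity problem in its most extreme form: every permutation of the target is still perfectly "explained", so the raw score carries no evidence of a real dependency, and the corrected score drops to zero.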

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.09391/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1705.09391/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1705.09391/full.md

---
Source: https://tomesphere.com/paper/1705.09391