An Algorithm for the Discovery of Independence from Data

Miika Hannula; Bor-Kuan Song; Sebastian Link

arXiv:2101.02502·cs.DB·January 8, 2021·1 cites

Open Access

TL;DR

This paper introduces the first algorithm for discovering the independence statements that hold on a data table, establishes the computational complexity of the discovery problem, and extends the approach to approximate independence.

Contribution

The paper presents the first algorithm for discovering all independence statements that hold on a given table, extends it to approximate independence, and characterizes the complexity of the underlying decision problem.

Findings

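01. In experiments on benchmark data, the algorithm performs well despite the NP-completeness of the underlying problem.

02. The algorithm is extended to approximate independence, with trade-offs.

03. Deciding whether some independence statement holds on a given table is NP-complete, and $W[3]$-complete when parameterized by arity.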

Abstract

For years, independence has been considered an important concept in many disciplines. Nevertheless, we present the first research that investigates the discovery problem of independence in data. In its arguably simplest form, independence is a statement between two sets of columns expressing that for every two rows in a table there is also a row in the table that coincides with the first row on the first set of columns and with the second row on the second set of columns. We show that the problem of deciding whether there is an independence statement that holds on a given table is not only NP-complete but $W[3]$-complete in its arguably most natural parameter, namely its arity. We establish the first algorithm to discover all independence statements that hold on a given table. We illustrate in experiments with benchmark data that our algorithm performs well within the limits…
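The definition of an independence statement given in the abstract can be checked directly on a small table. The sketch below is only an illustration of that definition, not the paper's discovery algorithm: it tests a single given statement rather than searching for all of them. The function names `project` and `holds_independence`, and the representation of a table as a list of tuples with column indices, are assumptions made for this example. It relies on the observation that the statement holds exactly when every combination of an observed X-value with an observed Y-value occurs together in some row.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Row = Sequence[Hashable]


def project(row: Row, cols: Sequence[int]) -> Tuple[Hashable, ...]:
    """Restrict a row to the given column indices (hypothetical helper)."""
    return tuple(row[c] for c in cols)


def holds_independence(table: Iterable[Row],
                       xs: Sequence[int],
                       ys: Sequence[int]) -> bool:
    """Check whether the independence statement between column sets xs and ys
    holds on the table: for every two rows r1, r2 there must be a row r that
    agrees with r1 on xs and with r2 on ys.

    Illustrative check of the definition only, not the paper's algorithm.
    """
    rows = list(table)
    x_vals = {project(r, xs) for r in rows}
    y_vals = {project(r, ys) for r in rows}
    # Pairs (X-value, Y-value) that actually occur together in some row.
    xy_vals = {(project(r, xs), project(r, ys)) for r in rows}
    # xy_vals is always a subset of x_vals x y_vals, so the two sets are
    # equal exactly when their sizes match.
    return len(xy_vals) == len(x_vals) * len(y_vals)


# Every combination of column 0 and column 1 values appears: statement holds.
full = [(0, "a"), (0, "b"), (1, "a"), (1, "b")]
print(holds_independence(full, [0], [1]))      # True

# The combination (1, "b") is missing: statement fails.
partial = [(0, "a"), (0, "b"), (1, "a")]
print(holds_independence(partial, [0], [1]))   # False
```

Checking one statement this way takes a single pass over the rows. The hardness results stated in the abstract concern deciding whether any independence statement holds on the table, where the candidate column sets are not given in advance.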

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Management and Algorithms · Data Mining Algorithms and Applications