An Algorithm for the Discovery of Independence from Data
Miika Hannula, Bor-Kuan Song, Sebastian Link

TL;DR
This paper introduces the first algorithm for discovering all independence statements that hold on a data table, analyzes the computational complexity of the discovery problem, extends the approach to approximate independence, and evaluates the algorithm on benchmark data.
Contribution
The paper presents the first algorithm for discovering all independence statements that hold on a given table, establishes the computational complexity of deciding whether any such statement holds, and explores approximate independence.
Findings
Algorithm performs well on benchmark data, given that the underlying decision problem is NP-complete
Extended to approximate independence with trade-offs
Provides insights into computational complexity of independence discovery
Abstract
For years, independence has been considered an important concept in many disciplines. Nevertheless, we present the first research that investigates the discovery problem of independence in data. In its arguably simplest form, independence is a statement between two sets of columns expressing that for every two rows in a table there is also a row in the table that coincides with the first row on the first set of columns and with the second row on the second set of columns. We show that the problem of deciding whether there is an independence statement that holds on a given table is not only NP-complete but W[3]-complete in its arguably most natural parameter, namely its arity. We establish the first algorithm to discover all independence statements that hold on a given table. We illustrate in experiments with benchmark data that our algorithm performs well within the limits…
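The definition of independence given in the abstract can be checked directly on a small table. The following is a minimal sketch of that check only, not the paper's discovery algorithm: it tests a single candidate statement between two given column sets. The names `project` and `independence_holds`, and the representation of a table as a list of dictionaries, are assumptions made for illustration.

```python
from itertools import product


def project(row, columns):
    """Return the values of `row` on `columns` as a tuple."""
    return tuple(row[c] for c in columns)


def independence_holds(rows, xs, ys):
    """Check whether the column sets `xs` and `ys` are independent on `rows`.

    Follows the definition in the abstract: for every two rows r1, r2 there
    must be a row that agrees with r1 on `xs` and with r2 on `ys`.
    """
    # Every (xs-projection, ys-projection) pair that occurs in some row.
    seen_pairs = {(project(r, xs), project(r, ys)) for r in rows}
    x_values = {x for x, _ in seen_pairs}
    y_values = {y for _, y in seen_pairs}
    # Each combination of an occurring xs-value and an occurring ys-value
    # must itself occur together in at least one row.
    return all((x, y) in seen_pairs for x, y in product(x_values, y_values))


# Hypothetical example table with columns a, b, c.
table = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 1, "c": 1},
    {"a": 2, "b": 2, "c": 0},
]

print(independence_holds(table, ["a"], ["b"]))      # True
print(independence_holds(table[:3], ["a"], ["b"]))  # False: no row with a=2, b=2
```

Checking one candidate statement this way is cheap; the hardness discussed in the abstract concerns deciding whether any independence statement holds on the table, which involves searching over pairs of column sets.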
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Management and Algorithms · Data Mining Algorithms and Applications
