Mining Approximate Acyclic Schemes from Relations
Batya Kenig, Pranay Mundra, Guna Prasad, Babak Salimi, Dan Suciu

TL;DR
This paper introduces Maimon, a system that discovers approximate acyclic schemes and multivalued dependencies from data, using information theory to handle noise and data imperfections, with scalable performance on large datasets.
Contribution
The paper presents a novel, principled approach to discovering approximate acyclic schemes and MVDs from data, addressing noise sensitivity and scalability issues.
Findings
Maimon successfully discovers approximate MVDs in real-world datasets.
The system scales to datasets with up to 1 million rows and 30 columns.
Experimental results demonstrate the effectiveness of the approach.
Abstract
Acyclic schemes have numerous applications in databases and in machine learning, such as improved design, more efficient storage, and increased performance for queries and machine learning algorithms. Multivalued dependencies (MVDs) are the building blocks of acyclic schemes. Discovering MVDs and acyclic schemes from data is more challenging than discovering other forms of data dependencies, such as Functional Dependencies, because these dependencies do not hold on subsets of data, and because they are very sensitive to noise in the data; for example, a single wrong or missing tuple may invalidate the schema. In this paper we present Maimon, a system for discovering approximate acyclic schemes and MVDs from data. We give a principled definition of approximation, by using notions from information theory, then describe the two components of Maimon: mining for approximate MVDs, then…
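The noise sensitivity described above, and why an information-theoretic measure helps, can be illustrated with a small sketch. It relies on a standard result: an MVD X ↠ Y | Z holds exactly in a relation R if and only if the conditional mutual information I(Y; Z | X) is zero under the uniform distribution over R's tuples. The sketch assumes that "approximate" means this quantity is small (at most some threshold), which is one natural reading of the abstract rather than a confirmed description of Maimon's internals. The function names and the toy relation are hypothetical and are not Maimon's API.

```python
import math
from collections import Counter


def proj(row, cols):
    """Project a row (dict: column -> value) onto a tuple of column names."""
    return tuple(row[c] for c in cols)


def mvd_holds_exactly(rows, X, Y, Z):
    """Exact test: R equals the join of its projections onto XY and XZ."""
    r = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in rows}
    xy = {(x, y) for x, y, _ in r}
    xz = {(x, z) for x, _, z in r}
    joined = {(x, y, z) for (x, y) in xy for (x2, z) in xz if x == x2}
    return joined == r


def conditional_mutual_information(rows, X, Y, Z):
    """Empirical I(Y; Z | X) in bits, each (distinct) row weighted 1/n."""
    n = len(rows)
    c_xyz = Counter((proj(r, X), proj(r, Y), proj(r, Z)) for r in rows)
    c_xy = Counter((proj(r, X), proj(r, Y)) for r in rows)
    c_xz = Counter((proj(r, X), proj(r, Z)) for r in rows)
    c_x = Counter(proj(r, X) for r in rows)
    total = 0.0
    for (x, y, z), k in c_xyz.items():
        # p(x,y,z) * log2[ p(x,y,z) p(x) / (p(x,y) p(x,z)) ], with the 1/n factors cancelled
        total += (k / n) * math.log2(k * c_x[x] / (c_xy[(x, y)] * c_xz[(x, z)]))
    return total


# Toy relation over columns A, B, C in which A ->> B | C holds exactly.
full = [{"A": "x", "B": b, "C": c} for b in range(10) for c in range(10)]
noisy = full[1:]  # drop a single tuple, ("x", 0, 0)

X, Y, Z = ("A",), ("B",), ("C",)
print(mvd_holds_exactly(full, X, Y, Z), conditional_mutual_information(full, X, Y, Z))
print(mvd_holds_exactly(noisy, X, Y, Z), conditional_mutual_information(noisy, X, Y, Z))
```

Removing one tuple out of 100 makes the exact MVD test fail, while the conditional mutual information moves only slightly away from zero. A threshold on that quantity therefore tolerates the kind of isolated error the abstract describes.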
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
