Rapidash: Efficient Constraint Discovery via Rapid Verification
Zifan Liu, Shaleen Deep, Anna Fariha, Fotis Psallidas, Ashish Tiwari, Avrilia Floratou

TL;DR
Rapidash is a framework for verifying and discovering denial constraints on large datasets. It reduces verification time and reports discovered constraints incrementally instead of only at the end of a long run.
Contribution
The paper presents a near-linear-time denial constraint (DC) verification algorithm and an anytime DC discovery method, improving efficiency and usability over existing quadratic-time approaches.
Findings
The verification algorithm is up to 40 times faster than previous methods.
The discovery process returns constraints incrementally, without first requiring a lengthy data-structure-building phase.
The framework handles large-scale production datasets.
Abstract
Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast verification and discovery of exact DCs within the database community. Despite the significant advancements in the field, prior work exhibits notable limitations when confronted with large-scale datasets. The current state-of-the-art exact DC verification algorithm demonstrates a quadratic (worst-case) time complexity relative to the dataset's number of rows. In the context of DC discovery, existing methodologies rely on a two-step algorithm that commences with an expensive data structure-building phase, often requiring hours to complete even for datasets containing only a few…
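To make the verification problem concrete, here is a minimal Python sketch of checking a single DC on a toy table. It is an illustration only, not the Rapidash algorithm: the rows, the column names `zip` and `city`, and both function names are hypothetical. The DC encodes a functional dependency, "no two rows may agree on `zip` and disagree on `city`". The first function compares every pair of rows, which is quadratic in the number of rows. The second uses a hash map and a single pass, which works for this equality-only case; DCs with inequality predicates (such as ordering constraints) need other techniques that this sketch does not cover.

```python
from itertools import combinations

# Hypothetical toy rows; the column names are illustrative only.
rows = [
    {"zip": "98101", "city": "Seattle"},
    {"zip": "98101", "city": "Seattle"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},  # conflicts with the row above
]

def violates_fd_quadratic(rows, lhs, rhs):
    """Return True if some pair (t, s) has t[lhs] == s[lhs] and t[rhs] != s[rhs].

    Compares every pair of rows, so the cost grows quadratically with len(rows).
    """
    return any(
        t[lhs] == s[lhs] and t[rhs] != s[rhs]
        for t, s in combinations(rows, 2)
    )

def violates_fd_hashed(rows, lhs, rhs):
    """Same check in one pass: remember the first rhs value seen for each lhs value."""
    first_seen = {}
    for row in rows:
        expected = first_seen.setdefault(row[lhs], row[rhs])
        if expected != row[rhs]:
            return True  # two rows share lhs but differ on rhs
    return False
```

Calling either function as `violates_fd_quadratic(rows, "zip", "city")` or `violates_fd_hashed(rows, "zip", "city")` reports whether the toy table violates the constraint. The two functions agree on every input; only their running time differs.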
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Semantic Web and Ontologies
