Fast Discovery of Nested Dependencies on JSON Data
Michael J. Mior

TL;DR
This paper introduces new methods for efficiently discovering meaningful nested data dependencies in JSON data, extending traditional dependency mining techniques to better handle complex, real-world datasets.
Contribution
It proposes two algorithms for mining nested dependencies in JSON data, including an optimized approach that significantly reduces runtime and handles incomplete data.
Findings
The adapted algorithms can process JSON data more efficiently than traditional methods.
The second strategy reduces runtime by multiple orders of magnitude on real datasets.
The methods effectively handle incomplete or invalid data in dependency discovery.
Abstract
Functional and inclusion dependencies are the most widely used classes of data dependencies in data profiling due to their ability to identify relationships in data such as primary and foreign keys. These relationships are equally important when dealing with nested data formats such as JSON. However, the definition of functional and inclusion dependencies makes use of a flat, unnested relational model which misses many useful types of dependencies on data which involve nested data models. In this work, we identify types of dependencies which are not captured by traditional functional and inclusion dependencies but which nevertheless capture meaningful relationships among nested data. We also demonstrate how algorithms for mining these traditional dependencies can be adapted to also mine nested dependencies. The first strategy simply flattens the input data and feeds into unmodified…
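The flattening strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names `flatten` and `holds_fd`, the dotted-path and `[*]` naming of flattened columns, and the choice to skip rows with missing fields are all assumptions made for illustration. The sketch unnests each JSON document into flat rows and then runs a naive check of one candidate functional dependency on those rows.

```python
from itertools import product


def flatten(doc, prefix=""):
    """Return a list of flat rows (dicts keyed by path) for one JSON value.

    Hypothetical helper: arrays are unnested, so each element yields its own
    row, and sibling fields are combined by cross product.
    """
    if isinstance(doc, dict):
        # Flatten every field, then combine one alternative from each field.
        parts = [
            flatten(value, f"{prefix}.{key}" if prefix else key)
            for key, value in doc.items()
        ]
        rows = []
        for combo in product(*parts):
            row = {}
            for piece in combo:
                row.update(piece)
            rows.append(row)
        return rows
    if isinstance(doc, list):
        rows = []
        for item in doc:
            rows.extend(flatten(item, prefix + "[*]"))
        # An empty array contributes a single row with no columns.
        return rows or [{}]
    # Scalar leaf: one row with one column.
    return [{prefix: doc}]


def holds_fd(rows, lhs, rhs):
    """Naively check whether the columns in lhs determine the column rhs.

    Assumption: rows missing any involved column are skipped.
    """
    seen = {}
    for row in rows:
        if rhs not in row or any(col not in row for col in lhs):
            continue
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Made-up example documents, not data from the paper.
docs = [
    {"order": 1, "customer": {"id": 7, "name": "Ada"},
     "items": [{"sku": "a", "price": 5}, {"sku": "b", "price": 3}]},
    {"order": 2, "customer": {"id": 7, "name": "Ada"},
     "items": [{"sku": "a", "price": 5}]},
]
rows = [row for doc in docs for row in flatten(doc)]

fd_customer = holds_fd(rows, ["customer.id"], "customer.name")
fd_item = holds_fd(rows, ["items[*].sku"], "items[*].price")
fd_order = holds_fd(rows, ["customer.id"], "order")
```

In this sketch the flattened table has one row per array element, so a document with many or deeply nested arrays produces many rows. A real dependency miner would search over candidate column sets rather than test one dependency at a time.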
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
