Extracting JSON Schemas with Tagged Unions
Stefan Klessinger, Meike Klettke, Uta Störl, Stefanie Scherzinger

TL;DR
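This paper presents a method for automatically discovering tagged unions in JSON data, a schema pattern in which the value of a tag property conditionally determines the subschemas of its sibling properties. The method combines conditional functional dependencies with heuristics and is demonstrated on real-world GeoJSON and TopoJSON datasets.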
Contribution
It formalizes tagged unions as conditional functional dependencies, captures them with the JSON Schema operators if-then-else, and motivates heuristics that avoid overfitting.
Findings
Successfully detects tagged unions in real-world datasets
Prototype implementation shows promising results
Extends understanding of schema inference in schema-free data stores
Abstract
With data lakes and schema-free NoSQL document stores, extracting a descriptive schema from JSON data collections is an acute challenge. In this paper, we target the discovery of tagged unions, a JSON Schema design pattern where the value of one property of an object (the tag) conditionally implies subschemas for sibling properties. We formalize these implications as conditional functional dependencies and capture them using the JSON Schema operators if-then-else. We further motivate our heuristics to avoid overfitting. Experiments with our prototype implementation are promising, and show that this form of tagged unions can successfully be detected in real-world GeoJSON and TopoJSON datasets. In discussing future work, we outline how our approach can be extended further.
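The idea of a tag value implying a subschema for a sibling property can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not reproduce the paper's heuristics against overfitting: it only checks whether every observed tag value maps to exactly one structural schema of a chosen sibling property and, if so, emits the corresponding if-then rules. The function names (`schema_of`, `infer_tagged_union`) and the simplified structural schema are assumptions made for illustration.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a value decoded from JSON to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def schema_of(value):
    """Simplified structural schema (hypothetical helper): type plus array nesting."""
    kind = json_type(value)
    if kind != "array":
        return {"type": kind}
    item_schemas = {json.dumps(schema_of(v), sort_keys=True) for v in value}
    # Keep an item schema only if all elements agree; otherwise allow anything.
    items = json.loads(item_schemas.pop()) if len(item_schemas) == 1 else {}
    return {"type": "array", "items": items}


def infer_tagged_union(objects, tag, dependent):
    """Return if-then rules if the tag value determines the dependent's schema.

    Returns None when the dependency does not hold or is not conditional.
    """
    schemas_by_tag = defaultdict(set)
    for obj in objects:
        if isinstance(obj.get(tag), str) and dependent in obj:
            encoded = json.dumps(schema_of(obj[dependent]), sort_keys=True)
            schemas_by_tag[obj[tag]].add(encoded)

    # The dependency holds only if each tag value implies exactly one schema.
    if any(len(s) != 1 for s in schemas_by_tag.values()):
        return None
    implied = {value: next(iter(s)) for value, s in schemas_by_tag.items()}
    # With fewer than two distinct subschemas there is nothing conditional.
    if len(set(implied.values())) < 2:
        return None

    return {
        "allOf": [
            {
                "if": {"properties": {tag: {"const": value}}, "required": [tag]},
                "then": {"properties": {dependent: json.loads(encoded)}},
            }
            for value, encoded in sorted(implied.items())
        ]
    }


if __name__ == "__main__":
    geometries = [
        {"type": "Point", "coordinates": [1.0, 2.0]},
        {"type": "Point", "coordinates": [3.0, 4.0]},
        {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
    ]
    print(json.dumps(infer_tagged_union(geometries, "type", "coordinates"), indent=2))
```

For the GeoJSON-style input in the usage example, the tag `type` separates `Point` (an array of numbers) from `LineString` (an array of arrays of numbers), so one if-then rule is produced per tag value. A realistic extractor would also need to decide which properties are candidate tags and when a rule has enough support, which this sketch deliberately leaves out.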
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Database Systems and Queries
