JEDI: These aren't the JSON documents you're looking for... (Extended Version)
Thomas Hütter, Nikolaus Augsten, Christoph M. Kirsch, Michael J. Carey, Chen Li

TL;DR
This paper introduces JEDI, a novel edit-distance measure for JSON documents, along with efficient algorithms and indexing techniques that enable scalable similarity searches in large JSON databases.
Contribution
It proposes the JSON tree representation, the JEDI distance measure, and the QuickJEDI algorithm, which together significantly improve the performance and scalability of JSON similarity queries.
Findings
QuickJEDI outperforms the baseline by an order of magnitude in runtime.
The JSIM index and the upper-bound technique greatly enhance query efficiency.
The solution scales to millions of documents with large JSON trees.
Abstract
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold τ, retrieve all JSON documents that are within τ from the query document. Due to its recursive definition, JSON data are naturally represented as trees. Different from other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document. This feature poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON documents. We develop an algorithm, called QuickJEDI, for computing JEDI by leveraging a new technique…
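The tree model and the lookup query described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the four node kinds ("object", "array", "key", "literal"), and the helper names `to_tree`, `size`, and `similarity_lookup` are assumptions made for illustration. The sketch only shows how a single tree can mix unordered sibling collections (object members) with ordered ones (array elements), and how a threshold query filters documents once some distance function is available. The distance function is passed in as a parameter; computing JEDI itself is not attempted here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; kind is 'object', 'array', 'key', or 'literal'."""
    kind: str
    label: object = None
    children: list = field(default_factory=list)

    @property
    def ordered(self):
        # Assumption: only array nodes have ordered children;
        # the members of an object form an unordered collection.
        return self.kind == "array"


def to_tree(value):
    """Convert a parsed JSON value into a tree of Node objects."""
    if isinstance(value, dict):
        # Each member becomes a key node with the member's value as its only child.
        members = [Node("key", k, [to_tree(v)]) for k, v in value.items()]
        return Node("object", children=members)
    if isinstance(value, list):
        return Node("array", children=[to_tree(v) for v in value])
    return Node("literal", value)


def size(node):
    """Number of nodes in the tree rooted at node."""
    return 1 + sum(size(child) for child in node.children)


def similarity_lookup(query, documents, tau, distance):
    """Return all documents whose distance to the query is at most tau.

    distance is any function over two trees; an implementation of JEDI
    would be plugged in here.
    """
    return [doc for doc in documents if distance(query, doc) <= tau]


tree = to_tree(json.loads('{"name": "Leia", "tags": ["a", "b"]}'))
```

In this sketch the root object has two key children whose relative order carries no meaning, while the two literals under the array node are ordered. A linear scan such as `similarity_lookup` evaluates the distance for every document, which is the cost that indexing and bounding techniques aim to avoid.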
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Algorithms and Data Compression
