Validating Streaming JSON Documents with Learned VPAs
Véronique Bruyère, Guillermo A. Perez, Gaëtan Staquet

TL;DR
This paper introduces a streaming algorithm that validates JSON documents against a JSON schema by learning a visibly pushdown automaton (VPA) for the schema and running it over the document as it is read.
Contribution
It proves that every JSON schema has a VPA accepting the same set of JSON documents, and builds on this result to give a streaming validation method based on a learned VPA, which is evaluated empirically against the classical validation algorithm.
Findings
The learned VPA accurately models JSON schema constraints.
The streaming algorithm outperforms classical validation methods in speed.
Validation accuracy is maintained across diverse JSON documents.
Abstract
We present a new streaming algorithm to validate JSON documents against a set of constraints given as a JSON schema. Among the possible values a JSON document can hold, objects are unordered collections of key-value pairs while arrays are ordered collections of values. We prove that there always exists a visibly pushdown automaton (VPA) that accepts the same set of JSON documents as a JSON schema. Leveraging this result, our approach relies on learning a VPA for the provided schema. As the learned VPA assumes a fixed order on the key-value pairs of the objects, we abstract its transitions in a special kind of graph, and propose an efficient streaming algorithm using the VPA and its graph to decide whether a JSON document is valid for the schema. We evaluate the implementation of our algorithm on a number of random JSON documents, and compare it to the classical validation algorithm.
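To make the role of the VPA more concrete, the following is a minimal sketch of a hand-written VPA that consumes a stream of JSON tokens one symbol at a time. It is not the paper's learned automaton and does not include the key-order graph mentioned above. The class name `ToyVPA`, the token names (`k:tags`, `str`), the omission of commas, and the toy schema (an object with a single key `tags` holding an array of strings) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# The alphabet is partitioned in advance: this is what makes the automaton
# "visibly" pushdown. The symbol alone decides whether to push or pop.
CALL_SYMBOLS: Set[str] = {"{", "["}
RETURN_SYMBOLS: Set[str] = {"}", "]"}


class ToyVPA:
    """Illustrative VPA; all names here are hypothetical, not from the paper."""

    def __init__(
        self,
        initial: str,
        accepting: Set[str],
        calls: Dict[Tuple[str, str], Tuple[str, str]],
        returns: Dict[Tuple[str, str, str], str],
        internals: Dict[Tuple[str, str], str],
    ) -> None:
        self.initial = initial
        self.accepting = accepting
        self.calls = calls          # (state, symbol) -> (next state, pushed symbol)
        self.returns = returns      # (state, symbol, stack top) -> next state
        self.internals = internals  # (state, symbol) -> next state

    def accepts(self, symbols: Iterable[str]) -> bool:
        """Read symbols one at a time; memory grows only with nesting depth."""
        state = self.initial
        stack: List[str] = []
        for sym in symbols:
            if sym in CALL_SYMBOLS:
                if (state, sym) not in self.calls:
                    return False
                state, pushed = self.calls[(state, sym)]
                stack.append(pushed)
            elif sym in RETURN_SYMBOLS:
                if not stack or (state, sym, stack[-1]) not in self.returns:
                    return False
                top = stack.pop()
                state = self.returns[(state, sym, top)]
            else:
                if (state, sym) not in self.internals:
                    return False
                state = self.internals[(state, sym)]
        # Accept only in an accepting state with every call matched.
        return state in self.accepting and not stack


# Toy schema (assumed): {"tags": [<string>, ...]}
TOY = ToyVPA(
    initial="q0",
    accepting={"q5"},
    calls={("q0", "{"): ("q1", "O"), ("q2", "["): ("q3", "A")},
    returns={("q3", "]", "A"): "q4", ("q4", "}", "O"): "q5"},
    internals={("q1", "k:tags"): "q2", ("q3", "str"): "q3"},
)

if __name__ == "__main__":
    print(TOY.accepts(["{", "k:tags", "[", "str", "str", "]", "}"]))
    print(TOY.accepts(["{", "k:tags", "[", "str", "}"]))
```

Because `accepts` takes any iterable, the tokens can come from a generator that reads the document incrementally. A real schema with several keys would also need to handle the fact that object keys may arrive in any order, which is the problem the abstract says the authors address with a graph abstraction of the VPA's transitions.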
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Advanced Database Systems and Queries
