On-Demand JSON: A Better Way to Parse Documents?
John Keiser, Daniel Lemire

TL;DR
This paper introduces On-Demand JSON parsing, a lazy approach that looks like a conventional DOM interface but materializes values only when they are accessed, and reports superior performance in multiple benchmarks on recent commodity processors.
Contribution
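The paper presents On-Demand, a novel lazy JSON parsing interface that appears to the programmer like a DOM-based API while the implementation iterates through the input and materializes objects, arrays, strings, and numbers lazily. Several systems already use it.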
Findings
Superior performance in multiple benchmarks on recent commodity processors
Open source implementation available for reproducibility
Adopted by systems like Apache Doris and Node.js
Abstract
JSON is a popular standard for data interchange on the Internet. Ingesting JSON documents can be a performance bottleneck. A popular parsing strategy consists in converting the input text into a tree-based data structure -- sometimes called a Document Object Model or DOM. We designed and implemented a novel JSON parsing interface -- called On-Demand -- that appears to the programmer like a conventional DOM-based approach. However, the underlying implementation is a pointer iterating through the content, only materializing the results (objects, arrays, strings, numbers) lazily. On recent commodity processors, an implementation of our approach provides superior performance in multiple benchmarks. To ensure reproducibility, our work is freely available as open source software. Several systems use On-Demand: e.g., Apache Doris, the Node.js JavaScript runtime, Milvus, and Velox.
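To make the lazy idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation and does not reflect their library's API: the function `find_field` and its helpers are hypothetical names invented for this illustration. It assumes well-formed input and keys without escape sequences, and it does no validation of the values it skips. A cursor walks the text, steps over unwanted values without building objects for them, and materializes only the requested value.

```python
import json

_WS = " \t\r\n"
_DECODER = json.JSONDecoder()


def _skip_ws(text, i):
    # Move the cursor past any JSON whitespace.
    while i < len(text) and text[i] in _WS:
        i += 1
    return i


def _skip_string(text, i):
    # text[i] is the opening quote; return the index just past the closing quote.
    i += 1
    while text[i] != '"':
        i += 2 if text[i] == "\\" else 1  # an escaped character is stepped over whole
    return i + 1


def _skip_value(text, i):
    # Advance past one JSON value without building any Python object for it.
    if text[i] == '"':
        return _skip_string(text, i)
    if text[i] in "{[":
        depth = 0
        while True:
            c = text[i]
            if c == '"':
                i = _skip_string(text, i)  # brackets inside strings must not count
                continue
            if c in "{[":
                depth += 1
            elif c in "}]":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    # Scalar (number, true, false, null): runs until the next delimiter.
    while i < len(text) and text[i] not in ",}]" + _WS:
        i += 1
    return i


def find_field(text, key):
    """Return the value of `key` in the top-level object, materializing only that value."""
    i = _skip_ws(text, 0)
    if text[i] != "{":
        raise ValueError("expected a JSON object")
    i = _skip_ws(text, i + 1)
    while text[i] != "}":
        end = _skip_string(text, i)
        name = text[i + 1:end - 1]  # raw key text (assumption: no escapes in keys)
        i = _skip_ws(text, end)
        if text[i] != ":":
            raise ValueError("expected ':' after key")
        i = _skip_ws(text, i + 1)
        if name == key:
            # Only here is a Python object actually built.
            value, _ = _DECODER.raw_decode(text, i)
            return value
        i = _skip_ws(text, _skip_value(text, i))
        if text[i] == ",":
            i = _skip_ws(text, i + 1)
    raise KeyError(key)


doc = '{"statuses": [{"id": 1, "text": "a \\"quoted\\" ] brace }"}], "search_metadata": {"count": 100}}'
count = find_field(doc, "search_metadata")["count"]
```

In this sketch the `statuses` array is never turned into Python lists or dictionaries; the cursor simply moves past it. A DOM-style call such as `json.loads(doc)` would instead build the whole tree before any field could be read.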
Taxonomy
Topics: Advanced Database Systems and Queries · Service-Oriented Architecture and Web Services · Web Data Mining and Analysis
