Large Language Models for JSON Schema Discovery

Michael J. Mior

arXiv:2407.03286 · cs.DB · July 4, 2024

TL;DR

This paper introduces a method using large language models to enhance automatically discovered JSON schemas by adding semantic information, improving their interpretability and usefulness.

Contribution

The paper presents an approach that combines large language models with a corpus of manually authored JSON Schema documents. It uses them to generate natural language descriptions for schema elements and meaningful names for reusable definitions.
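To make the description-generation target concrete, the sketch below walks a small discovered schema and lists every element that still lacks a "description" keyword. Each listed element is a place where a generated description could be attached. The schema is hypothetical, standing in for the syntax-only output of a discovery tool. The helper names (missing_descriptions, _escape) are also hypothetical, as is the choice to traverse only "properties", "items" and "$defs". This is an illustration under those assumptions, not the paper's implementation.

```python
from typing import Any, Iterator


def _escape(token: str) -> str:
    # JSON Pointer (RFC 6901) escaping: "~" becomes "~0", "/" becomes "~1".
    return token.replace("~", "~0").replace("/", "~1")


def missing_descriptions(schema: dict[str, Any], path: str = "#") -> Iterator[str]:
    """Yield JSON Pointer fragments for subschemas that lack a "description".

    Sketch only: walks "properties", single-schema "items", and "$defs";
    other keywords (e.g. "anyOf") are ignored, and every subschema is
    assumed to be a dict rather than a boolean schema.
    """
    if "description" not in schema:
        yield path
    for name, sub in schema.get("properties", {}).items():
        yield from missing_descriptions(sub, f"{path}/properties/{_escape(name)}")
    items = schema.get("items")
    if isinstance(items, dict):
        yield from missing_descriptions(items, f"{path}/items")
    for name, sub in schema.get("$defs", {}).items():
        yield from missing_descriptions(sub, f"{path}/$defs/{_escape(name)}")


# Hypothetical discovery-tool output: types only, no prose,
# and a generically named reusable definition ("defn0").
discovered = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "owner": {"$ref": "#/$defs/defn0"},
    },
    "$defs": {
        "defn0": {"type": "object", "properties": {"login": {"type": "string"}}},
    },
}

for pointer in missing_descriptions(discovered):
    print(pointer)
```

Every element of this hypothetical schema is printed, starting with "#" for the root, because nothing in it carries a description yet.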

Findings

01. Effective generation of natural language descriptions for schema elements
02. Improved schema interpretability and usefulness
03. Good performance on text generation metrics that correlate with human judgment

Abstract

Semi-structured data formats such as JSON have proved to be useful data models for applications that require flexibility in the format of data stored. However, JSON data often come without the schemas that are typically available with relational data. This has resulted in a number of tools for discovering schemas from a collection of data. Although such tools can be useful, existing approaches focus on the syntax of documents and ignore semantic information. In this work, we explore the automatic addition of meaningful semantic information to discovered schemas similar to information that is added by human schema authors. We leverage large language models and a corpus of manually authored JSON Schema documents to generate natural language descriptions of schema elements and meaningful names for reusable definitions, and to identify which discovered properties are most useful and which can…
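The abstract lists meaningful names for reusable definitions among the information to be generated. Once a name has been chosen, whether by a model as the paper proposes or by hand, applying it is a mechanical refactoring. The entry under "$defs" is renamed, and every local "$ref" pointing to it is rewritten. The sketch below assumes the "$defs" keyword of JSON Schema draft 2019-09 and later; older drafts use "definitions" instead. The function name rename_definition, the generic name defn0 and the replacement name user are all hypothetical.

```python
import copy
from typing import Any


def rename_definition(schema: dict[str, Any], old: str, new: str) -> dict[str, Any]:
    """Return a copy of schema with the $defs entry `old` renamed to `new`.

    Local references "#/$defs/<old>" (and pointers into it, such as
    "#/$defs/<old>/properties/x") are rewritten to use the new name.
    Sketch assumptions: plain names needing no JSON Pointer escaping, and
    no attempt to tell "$ref" keywords apart from data inside "const"/"enum".
    """
    defs = schema.get("$defs", {})
    if old not in defs:
        raise KeyError(old)
    if new in defs:
        raise ValueError(f"definition {new!r} already exists")

    result = copy.deepcopy(schema)
    # Rebuild $defs so the renamed entry keeps its original position.
    result["$defs"] = {(new if key == old else key): value
                       for key, value in result["$defs"].items()}

    old_ref, new_ref = f"#/$defs/{old}", f"#/$defs/{new}"

    def rewrite(node: Any) -> None:
        # Walk every dict and list, fixing "$ref" strings that target `old`.
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and (ref == old_ref or ref.startswith(old_ref + "/")):
                node["$ref"] = new_ref + ref[len(old_ref):]
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(result)
    return result


# Hypothetical discovered schema with a generically named definition.
schema = {
    "type": "object",
    "properties": {
        "owner": {"$ref": "#/$defs/defn0"},
        "reviewers": {"type": "array", "items": {"$ref": "#/$defs/defn0"}},
    },
    "$defs": {"defn0": {"type": "object", "properties": {"login": {"type": "string"}}}},
}

# "user" stands in for a name a model might suggest; here it is chosen by hand.
renamed = rename_definition(schema, "defn0", "user")
print(renamed["properties"]["owner"])  # {'$ref': '#/$defs/user'}
```

Working on a deep copy leaves the original schema unchanged. That makes it easy to compare the discovered and renamed versions side by side.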

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling