# Schema validation and evaluation framework for extracted schemas in JSON databases

**Authors:** Saad Belefqih, Mohammed Barchane, Ahmed Zellou, El Habib Benlahmar

PMC · DOI: 10.1038/s41598-026-45554-6 · Scientific Reports · 2026-03-27

## TL;DR

This paper introduces a framework for evaluating the quality of schemas extracted from schemaless JSON databases, giving schema extraction methods a common, method-independent basis for comparison.

## Contribution

The main contribution is the Schema Validation and Evaluation Framework (SVEF), which defines standardized, method-independent criteria for assessing extracted schemas across six dimensions, each backed by a formal, data-driven metric.

## Key findings

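- SVEF evaluates schemas along six dimensions: Data Type Accuracy, Required and Optional Fields, Multiple Type Support, Collection Structure Consistency, Entity Relationships, and Temporal Evolution Detection.
- The three schema extraction approaches evaluated perform strongly in basic type reconstruction but differ substantially in modelling conditional fields, complex collection structures, and schema evolution over time.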
- SVEF offers a consistent basis for comparing schema extraction strategies in dynamic data environments.

## Abstract

The increasing use of schemaless data systems has intensified the need for reliable methods to assess the quality of extracted schemas intended for downstream tasks such as data integration, query optimisation, and interoperability. Although numerous schema inference techniques have been proposed, the field still lacks standardised and method-independent criteria for evaluating the validity and accuracy of inferred schemas. This paper introduces the Schema Validation and Evaluation Framework (SVEF), a systematic evaluation model for assessing extracted schemas across six complementary dimensions that capture essential structural and semantic properties: Data Type Accuracy, Required and Optional Fields, Multiple Type Support, Collection Structure Consistency, Entity Relationships, and Temporal Evolution Detection. Each dimension is defined through formal, data-driven metrics that quantify the degree to which an inferred schema reflects characteristics observed in the underlying dataset. In the present study, the framework is instantiated and evaluated for schemaless document-oriented data represented in JSON or JSON-like form. SVEF is evaluated using controlled benchmark datasets with curated ground-truth schemas and is applied to three representative schema extraction approaches. The results show that, while existing methods achieve strong performance in basic type reconstruction, substantial differences remain in modelling conditional fields, complex collection structures, and schema evolution over time. SVEF provides a consistent and interpretable basis for comparing schema extraction strategies and supports more rigorous empirical analysis of their behaviour in dynamic document-oriented data environments.
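The abstract does not reproduce the formal metric definitions, so the following is only a minimal sketch of what data-driven scores for two of the dimensions (Data Type Accuracy, and Required and Optional Fields) might look like when an inferred schema is compared against a curated ground truth. The function names, the flat field-to-type schema representation, and the choice of a plain match ratio and an F1 score are all assumptions made for illustration; they are not the formulas defined in the paper.

```python
def json_type(value):
    """Map a Python value decoded from JSON to a JSON type name."""
    # bool must be tested before int because bool is a subclass of int.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def type_accuracy(inferred_types, true_types):
    """Hypothetical score: share of ground-truth fields whose inferred type matches."""
    if not true_types:
        return 1.0
    hits = sum(1 for field, t in true_types.items() if inferred_types.get(field) == t)
    return hits / len(true_types)


def observed_required(docs):
    """Fields present in every document of the collection."""
    if not docs:
        return set()
    required = set(docs[0])
    for doc in docs[1:]:
        required &= set(doc)
    return required


def required_f1(inferred_required, true_required):
    """Hypothetical score: F1 between inferred and ground-truth required-field sets."""
    if not inferred_required and not true_required:
        return 1.0
    tp = len(inferred_required & true_required)
    if tp == 0:
        return 0.0
    precision = tp / len(inferred_required)
    recall = tp / len(true_required)
    return 2 * precision * recall / (precision + recall)


# Toy collection: "tags" is optional because the second document omits it.
docs = [{"id": 1, "name": "a", "tags": ["x"]}, {"id": 2, "name": "b"}]
true_types = {"id": "integer", "name": "string", "tags": "array"}
inferred_types = {"id": "integer", "name": "string", "tags": "string"}

acc = type_accuracy(inferred_types, true_types)
f1 = required_f1({"id", "name", "tags"}, observed_required(docs))
```

In this toy case the inferred schema mislabels `tags` and wrongly marks it as required, so both scores fall below 1. A real instantiation would also need to handle nested paths and union types, which this flat representation ignores.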

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13039129/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13039129/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC13039129/full.md

---
Source: https://tomesphere.com/paper/PMC13039129