A Semantic Schema for Data Quality Management in a Multi-Tenant Data Platform
Ning Zhou, Sandra Garcia Esparza, Lars Marius Garshol

TL;DR
This paper presents a semantic schema management system that enhances data quality and supports schema evolution in a multi-tenant data platform, enabling scalable data processing for a global marketplace.
Contribution
It introduces a semantic schema management approach with versioning, testing, and transformation capabilities tailored for multi-tenant data platforms.
Findings
System processes over one billion events daily
Supports schema evolution with versioning and testing
Operates successfully in a production environment
Abstract
Schibsted Media Group is a global marketplace company with a presence in more than 20 countries. It is undergoing a digital transformation to convert its data silos into a multi-tenant system built on a common data platform. Good data quality, grounded in a common schema at the semantic level, is essential for building successful data-driven products across marketplaces. To address this challenge, we developed data quality tooling based on a semantic schema management system that supports schema evolution through versioning, testing, and transformation. The tooling monitors the data quality requirements of different applications and handles incoming data that spans multiple schema versions. Today the system operates in production and processes over one billion events per day for over 100 applications.
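The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea of handling events that arrive in multiple schema versions: validate each event against its declared version, then apply upgrade transformations until it reaches the latest version. Every name here (`schema_version`, `user`, `actor_id`, `REQUIRED_FIELDS`, `UPGRADES`, `normalize`) is a hypothetical illustration, not the authors' schema or API.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

# Hypothetical required fields per schema version (illustrative only).
REQUIRED_FIELDS = {
    1: {"schema_version", "user", "type"},
    2: {"schema_version", "actor_id", "type"},
}
LATEST_VERSION = 2


def upgrade_v1_to_v2(event: Event) -> Event:
    """Assumed change between versions: field 'user' was renamed to 'actor_id'."""
    upgraded = {key: value for key, value in event.items() if key != "user"}
    upgraded["actor_id"] = event["user"]
    upgraded["schema_version"] = 2
    return upgraded


# Maps a source version to the transformation that produces the next version.
UPGRADES: Dict[int, Callable[[Event], Event]] = {1: upgrade_v1_to_v2}


def validate(event: Event) -> None:
    """Raise ValueError if the event does not satisfy its declared version."""
    version = event.get("schema_version")
    if version not in REQUIRED_FIELDS:
        raise ValueError(f"unknown schema version: {version!r}")
    missing = REQUIRED_FIELDS[version] - set(event)
    if missing:
        raise ValueError(f"missing fields for v{version}: {sorted(missing)}")


def normalize(event: Event) -> Event:
    """Validate an event, then upgrade it step by step to the latest version."""
    validate(event)
    while event["schema_version"] < LATEST_VERSION:
        event = UPGRADES[event["schema_version"]](event)
        validate(event)  # re-check after every transformation
    return event


if __name__ == "__main__":
    old_event = {"schema_version": 1, "user": "u1", "type": "view"}
    print(normalize(old_event))
```

Chaining single-step upgrades keeps each transformation small and independently testable, so downstream consumers in this sketch only ever see the latest version. A production system would more likely rely on a schema language and registry than on hand-written field sets.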
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Web Data Mining and Analysis
