Representing Dataset Quality Metadata using Multi-Dimensional Views
Jeremy Debattista, Christoph Lange, Sören Auer

TL;DR
This paper introduces the Dataset Quality Ontology (daQ), a framework for representing and visualizing data quality metadata as multi-dimensional statistical observations to aid data consumers and publishers.
Contribution
The paper presents daQ, a core vocabulary for encoding dataset quality metrics as multi-dimensional data cube observations, enabling better analysis and visualization of data quality.
Findings
daQ supports embedding quality metadata into linked datasets.
daQ facilitates comparing the quality of data versions and browsing datasets by quality.
Data cube visualisation tools can be applied to daQ quality metadata to support quality assessment.
Abstract
Data quality is commonly defined as fitness for use. The problem of identifying quality of data is faced by many data consumers. Data publishers often do not have the means to identify quality problems in their data. To make the task for both stakeholders easier, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ by custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation…
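To make the multi-dimensional view concrete, the following is a minimal Python sketch of the idea rather than the daQ vocabulary itself. It assumes each quality observation records one measured value for a combination of dataset, metric, and version. The class name, field names, dataset identifiers, metric labels, and values are all hypothetical placeholders, not terms or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityObservation:
    """One quality measurement, addressed by its dimension values (hypothetical model)."""
    dataset: str   # hypothetical dataset identifier
    metric: str    # hypothetical metric label
    version: str   # hypothetical version label
    value: float   # measured value for this (dataset, metric, version)


def slice_by(observations, **fixed):
    """Keep only observations whose fields equal every fixed dimension value."""
    return [
        o for o in observations
        if all(getattr(o, name) == wanted for name, wanted in fixed.items())
    ]


def rank_datasets(observations, metric, version):
    """Order dataset identifiers by descending value for one metric and version."""
    hits = slice_by(observations, metric=metric, version=version)
    return [o.dataset for o in sorted(hits, key=lambda o: o.value, reverse=True)]


# Made-up illustrative values, not results reported in the paper.
observations = [
    QualityObservation("dataset-a", "metric-x", "v1", 0.60),
    QualityObservation("dataset-a", "metric-x", "v2", 0.75),
    QualityObservation("dataset-b", "metric-x", "v2", 0.90),
    QualityObservation("dataset-b", "metric-y", "v2", 0.40),
]

# Analysing data versions: fix dataset and metric, let the version vary.
history = slice_by(observations, dataset="dataset-a", metric="metric-x")

# Browsing datasets by quality: fix metric and version, rank the datasets.
ranking = rank_datasets(observations, "metric-x", "v2")
```

Fixing some dimensions and letting the others vary is what allows the same set of observations to serve both the version-analysis and the browse-by-quality use cases named in the abstract.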
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
