Representing Dataset Quality Metadata using Multi-Dimensional Views

Jeremy Debattista; Christoph Lange; Sören Auer

arXiv:1408.2468·cs.DB·August 12, 2014·2 cites


TL;DR

This paper introduces the Dataset Quality Ontology (daQ), a framework for representing and visualizing data quality metadata as multi-dimensional statistical observations to aid data consumers and publishers.

Contribution

The paper presents daQ, a core vocabulary for encoding dataset quality metrics as multi-dimensional data cube observations, enabling better analysis and visualization of data quality.

Findings

01 daQ supports embedding quality metadata into linked datasets.

02 It supports analysing data versions and browsing datasets by quality.

03 Data cube visualisation tools can be applied to quality assessment.

Abstract

Data quality is commonly defined as fitness for use. The problem of identifying quality of data is faced by many data consumers. Data publishers often do not have the means to identify quality problems in their data. To make the task for both stakeholders easier, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ by custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation…
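
As a rough illustration of what "multi-dimensional observations" means here, the sketch below models each quality measurement as a record with several dimensions and one measured value, then pivots the records into a two-dimensional view. It is not daQ itself and uses no RDF: the class `QualityObservation`, the function `pivot`, the dataset and metric names, and all numeric values are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityObservation:
    """One hypothetical quality measurement (not the daQ vocabulary)."""
    dataset: str      # dimension: the dataset or dataset version assessed
    metric: str       # dimension: the quality metric that was computed
    computed_on: str  # dimension: date of the assessment (ISO format)
    value: float      # measure: the result of the metric


def pivot(observations, row_dim, col_dim):
    """Arrange observation values in a nested dict keyed by two dimensions."""
    table = defaultdict(dict)
    for obs in observations:
        table[getattr(obs, row_dim)][getattr(obs, col_dim)] = obs.value
    return {row: dict(cols) for row, cols in table.items()}


# Made-up values for two versions of an imaginary dataset.
observations = [
    QualityObservation("example-v1", "metric_a", "2014-01-01", 0.50),
    QualityObservation("example-v1", "metric_b", "2014-01-01", 0.25),
    QualityObservation("example-v2", "metric_a", "2014-06-01", 0.75),
    QualityObservation("example-v2", "metric_b", "2014-06-01", 0.25),
]

# View: one row per dataset version, one column per metric.
by_version = pivot(observations, "dataset", "metric")

# Same data sliced the other way: one row per metric, one column per version.
by_metric = pivot(observations, "metric", "dataset")
```

Slicing the same records along different dimensions is what allows use cases such as comparing data versions or browsing datasets by quality to share one underlying representation.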


Taxonomy

Topics: Data Quality and Management · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies