Detecting Quality Problems in Research Data: A Model-Driven Approach
Arno Kesper, Viola Wenz, Gabriele Taentzer

TL;DR
This paper introduces a model-driven, technology-agnostic approach for detecting quality issues in research data, with a focus on cultural heritage datasets, using analysis patterns and tool support for XML databases.
Contribution
It presents a novel, abstract analysis pattern framework that can be adapted to various database technologies for quality problem detection in research data.
Findings
Effective in identifying quality problems in cultural heritage data
Achieved good performance and expressiveness in XML database analysis
Validated through a qualitative study with domain experts
Abstract
As scientific progress depends heavily on the quality of research data, the scientific community imposes strict requirements on data quality. A major challenge in data quality assurance is to localise quality problems inherent in the data. Because digitalisation is advancing rapidly in certain scientific fields, especially the humanities, different database technologies and data formats may be adopted in quick succession to gain experience. We present a model-driven approach to analyse the quality of research data. It allows abstracting from the underlying database technology. Based on the observation that many quality problems show anti-patterns, a data engineer formulates analysis patterns that are generic with respect to the database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a…
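The division of labour described above (a generic pattern, a technology-specific adaptation, a domain-specific concretisation) can be illustrated with a minimal sketch. It assumes two hypothetical generic patterns, `MissingProperty` and `FormatViolation`, and a hypothetical `XmlAdapter` that maps them onto XML, both as XPath query text and as a local check using Python's `xml.etree.ElementTree`. None of these names, nor the sample data, come from the paper; the authors' actual pattern language and XML tool support may look quite different.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class MissingProperty:
    """Hypothetical generic pattern: an entity lacks a required property."""
    entity: str  # placeholder for an entity type, filled in by the domain expert
    prop: str    # placeholder for the required property


@dataclass
class FormatViolation:
    """Hypothetical generic pattern: a property value has an unexpected format."""
    entity: str
    prop: str
    regex: str   # expected format; must match the whole value


class XmlAdapter:
    """Hypothetical adapter that maps generic patterns onto XML data."""

    def to_xpath(self, pattern) -> str:
        # Query text for an XML database; matches() needs XPath 2.0 or later.
        # Single quotes inside the regex are not escaped in this sketch.
        if isinstance(pattern, MissingProperty):
            return f"//{pattern.entity}[not({pattern.prop})]"
        if isinstance(pattern, FormatViolation):
            return (f"//{pattern.entity}[{pattern.prop}"
                    f"[not(matches(., '^{pattern.regex}$'))]]")
        raise TypeError(f"unsupported pattern: {pattern!r}")

    def evaluate(self, pattern, root: ET.Element) -> List[str]:
        """Check the pattern locally; return the ids of offending entities."""
        hits = []
        for el in root.iter(pattern.entity):
            # Text of all direct children named like the property.
            values = [c.text or "" for c in el.findall(pattern.prop)]
            if isinstance(pattern, MissingProperty):
                bad = not values
            elif isinstance(pattern, FormatViolation):
                bad = any(re.fullmatch(pattern.regex, v) is None for v in values)
            else:
                raise TypeError(f"unsupported pattern: {pattern!r}")
            if bad:
                hits.append(el.get("id", "?"))
        return hits


# Hypothetical sample collection, for illustration only.
SAMPLE = """
<collection>
  <object id="o1"><title>Vase</title><date>1890-05-01</date></object>
  <object id="o2"><date>1901-02-03</date></object>
  <object id="o3"><title>Bowl</title><date>c. 1900</date></object>
</collection>
"""

root = ET.fromstring(SAMPLE.strip())
adapter = XmlAdapter()

# The domain expert concretises the generic patterns with names from the data.
missing_title = MissingProperty(entity="object", prop="title")
bad_date = FormatViolation(entity="object", prop="date", regex=r"\d{4}-\d{2}-\d{2}")

print(adapter.to_xpath(missing_title))        # //object[not(title)]
print(adapter.evaluate(missing_title, root))  # ['o2']
print(adapter.evaluate(bad_date, root))       # ['o3']
```

Keeping the patterns as plain data and putting all technology-specific logic in the adapter is what would let a second adapter (for example, one emitting SQL) reuse the same concretised patterns unchanged.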
