# A Framework for Assessing Achievability of Data-Quality Constraints

**Authors:** Rada Chirkova, Jon Doyle, and Juan L. Reutter

arXiv: 1703.09141 · 2017-03-28

## TL;DR

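This paper introduces a framework for deciding whether a set of data-processing tools can be combined to bring data to a desired quality. Tools are treated as black boxes that expose only some properties: their applicability requirements, the parts of the data they modify, and the conditions the data satisfy once they have been applied.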

## Contribution

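The paper develops a framework for assessing whether data-quality constraints are achievable, modeling data-processing tools as black-box procedures that expose only limited properties. As a proof of concept, it studies the framework for procedures described by standard relational constraints.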

## Key findings

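- The framework encapsulates common tasks such as data cleaning and data migration.
- The paper studies basic properties of the framework for procedures described by standard relational constraints.
- Reasoning in the framework may be computationally infeasible in general, but there are well-behaved special cases with potential practical applications.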

## Abstract

Assessing and improving the quality of data are fundamental challenges for data-intensive systems that have given rise to applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data migration are now reasonably well understood in isolation, not much attention has been given to the interplay between the tools addressing issues in these areas. We focus on the problem of determining whether the available data-processing procedures can be used together to bring about the desired quality of the given data. For instance, consider an organization introducing new data-analysis tasks. Depending on the tasks, it may be a priority to determine whether the data can be processed and transformed using the available data-processing tools to satisfy certain properties or quality assurances needed for the success of the task. Here, while the organization may control some of its tools, some other tools may be external or proprietary, with only basic information available on how they process data. The problem is then, how to decide which tools to apply, and in which order, to make the data ready for the new tasks?

Toward addressing this problem, we develop a new framework that abstracts data-processing tools as black-box procedures with only some of the properties exposed, such as the applicability requirements, the parts of the data that the procedure modifies, and the conditions that the data satisfy once the procedure has been applied. We show how common tasks such as data cleaning and data migration are encapsulated into our framework and, as a proof of concept, we study basic properties of the framework for the case of procedures described by standard relational constraints. While reasoning in this framework may be computationally infeasible in general, we show that there exist well-behaved special cases with potential practical applications.
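
To make the "which tools, in which order" question concrete, here is a toy sketch in Python. It is not the paper's formalization: the paper describes procedures with relational constraints, whereas this sketch assumes each property is an opaque label and searches exhaustively up to a bounded number of steps. The `Procedure` class, the `find_plan` function, and the tool and property names are all hypothetical, introduced only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Procedure:
    """A black-box tool described only by its exposed properties (hypothetical model)."""
    name: str
    requires: FrozenSet[str]                # applicability requirements
    ensures: FrozenSet[str]                 # conditions guaranteed after the tool runs
    may_break: FrozenSet[str] = frozenset() # properties of the modified data that may no longer hold


def find_plan(initial: Iterable[str], goal: Iterable[str],
              procedures: List[Procedure], max_steps: int = 5) -> Optional[List[str]]:
    """Breadth-first search for a shortest tool sequence that reaches the goal properties."""
    start, target = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if target <= state:
            return plan
        if len(plan) >= max_steps:
            continue
        for proc in procedures:
            if proc.requires <= state:
                # Drop what the tool may invalidate, then add what it guarantees.
                nxt = (state - proc.may_break) | proc.ensures
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [proc.name]))
    return None  # no sequence found within max_steps


# Hypothetical tools: the order matters because each one has different requirements.
PROCEDURES = [
    Procedure("dedupe", frozenset({"schema_v2"}), frozenset({"no_duplicates"})),
    Procedure("migrate", frozenset({"schema_v1"}), frozenset({"schema_v2"}),
              may_break=frozenset({"schema_v1"})),
    Procedure("fill_nulls", frozenset({"schema_v1"}), frozenset({"no_nulls"})),
]

plan = find_plan({"schema_v1"}, {"schema_v2", "no_duplicates", "no_nulls"}, PROCEDURES)
print(plan)  # ['fill_nulls', 'migrate', 'dedupe']
```

In this example `fill_nulls` only applies to the old schema and `dedupe` only to the new one, so the single valid ordering is forced. The search space here is tiny because properties are plain labels; with constraints over relational data, as in the paper, the corresponding reasoning may be infeasible in general.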

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.09141/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1703.09141/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1703.09141/full.md

---
Source: https://tomesphere.com/paper/1703.09141