Framework to Automatically Determine the Quality of Open Data Catalogs
Jorge Martinez-Gil

TL;DR
This paper introduces a comprehensive framework for automatically assessing the quality of open data catalogs, enabling organizations to ensure data trustworthiness and improve data asset management.
Contribution
The paper presents a novel, automated framework that evaluates both core and non-core quality dimensions of open data catalogs, and it provides implementation details and source code.
Findings
Framework effectively assesses accuracy, completeness, and consistency.
Supports compatibility and similarity analysis across catalogs.
Provides tools for evaluating provenance, readability, and licensing.
Abstract
Data catalogs play a crucial role in modern data-driven organizations by facilitating the discovery, understanding, and utilization of diverse data assets. However, ensuring their quality and reliability is complex, especially in open and large-scale data environments. This paper proposes a framework to automatically determine the quality of open data catalogs, addressing the need for efficient and reliable quality assessment mechanisms. Our framework analyzes core quality dimensions such as accuracy, completeness, consistency, scalability, and timeliness; offers several alternatives for assessing compatibility and similarity across catalogs; and implements a set of non-core quality dimensions such as provenance, readability, and licensing. The goal is to empower data-driven organizations to make informed decisions based on trustworthy and…
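To make two of the core dimensions named above concrete, the following is a minimal sketch of how completeness and timeliness might be scored for a catalog. It is not the paper's implementation. The field list `REQUIRED_FIELDS`, the one-year decay window `max_age_days`, the dict-based catalog layout, and all function names are assumptions chosen for illustration only.

```python
from datetime import date

# Hypothetical set of required metadata fields; the paper's actual
# field list is not given in this summary.
REQUIRED_FIELDS = ("title", "description", "license", "modified")


def completeness(dataset: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for field in REQUIRED_FIELDS
        if str(dataset.get(field) or "").strip()
    )
    return filled / len(REQUIRED_FIELDS)


def timeliness(dataset: dict, today: date, max_age_days: int = 365) -> float:
    """Linear decay from 1.0 (modified today) to 0.0 (max_age_days old or older).

    A missing modification date scores 0.0. The decay window is an
    illustrative assumption, not a value taken from the paper.
    """
    modified = dataset.get("modified")
    if not modified:
        return 0.0
    age_days = (today - date.fromisoformat(modified)).days
    return max(0.0, 1.0 - max(age_days, 0) / max_age_days)


def catalog_scores(catalog: list[dict], today: date) -> dict:
    """Average each dimension over all dataset records in the catalog."""
    n = len(catalog)
    if n == 0:
        return {"completeness": 0.0, "timeliness": 0.0}
    return {
        "completeness": sum(completeness(d) for d in catalog) / n,
        "timeliness": sum(timeliness(d, today) for d in catalog) / n,
    }
```

Calling `catalog_scores(records, date.today())` on a list of metadata dicts returns one averaged score per dimension in the range 0 to 1. A real assessment would likely read records from a catalog format such as DCAT rather than plain dicts, and would weight fields differently depending on the use case.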
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Research Data Management Practices
