Automating the Identification of High-Value Datasets in Open Government Data Portals

Alfonso Quarati; Anastasija Nikiforova

arXiv:2406.10541 · cs.CY · June 18, 2024

Open Access

TL;DR

This paper presents an automated, data-driven method to identify high-value datasets in open government portals by analyzing user interest metrics, aiding transparency and data utilization.

Contribution

It introduces a quantitative approach for automatically detecting high-value datasets based on usage statistics, reducing reliance on manual identification.

Findings

1. Effective identification of HVDs in US city portals
2. Insights into dataset usage trends and citizen preferences
3. Potential to improve open data management and policy

Abstract

Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVDs) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVDs is a resource-intensive and complex challenge due to the nuanced nature of data value. Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest derived from data usage statistics, thereby minimizing the need for human intervention. The proposed method involves extracting download data, analyzing metrics to identify high-value categories, and comparing HVDs across different portals. This automated process provides valuable insights into trends in dataset usage, reflecting…
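The three steps named in the abstract (extract download data, find high-value categories, compare across portals) can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the portal and dataset names, the numbers, and the cumulative-share cutoff used to flag candidates are all assumptions made for illustration, since the excerpt does not state which metrics the paper uses.

```python
from collections import defaultdict

# Hypothetical usage records: (portal, dataset_id, category, downloads).
# Every name and number here is invented for illustration.
RECORDS = [
    ("portal_a", "a1", "transport", 900),
    ("portal_a", "a2", "health", 50),
    ("portal_a", "a3", "transport", 30),
    ("portal_a", "a4", "budget", 20),
    ("portal_b", "b1", "transport", 400),
    ("portal_b", "b2", "budget", 500),
    ("portal_b", "b3", "health", 100),
]


def by_portal(records):
    """Group the usage records by portal name."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[0]].append(record)
    return dict(grouped)


def top_categories(records, k=2):
    """Return the k categories with the most downloads (ties broken by name)."""
    totals = defaultdict(int)
    for _portal, _dataset, category, downloads in records:
        totals[category] += downloads
    return sorted(totals, key=lambda c: (-totals[c], c))[:k]


def candidate_hvds(records, share=0.8):
    """Return the most-downloaded datasets that together reach `share` of downloads.

    The cumulative-share cutoff is an assumed heuristic, not the paper's metric.
    """
    ranked = sorted(records, key=lambda r: r[3], reverse=True)
    threshold = share * sum(r[3] for r in ranked)
    picked, running = [], 0
    for _portal, dataset, _category, downloads in ranked:
        if running >= threshold:
            break
        picked.append(dataset)
        running += downloads
    return picked


portals = by_portal(RECORDS)
top = {name: top_categories(rows) for name, rows in portals.items()}
# Categories that rank highly on every portal are cross-portal candidates.
shared = set.intersection(*(set(cats) for cats in top.values()))
```

In practice the records would come from each portal's usage statistics rather than a hard-coded list, and the value of `k` and `share` would need to be justified against the portal's download distribution.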

Taxonomy

Topics: Data Quality and Management · Privacy-Preserving Technologies in Data