Automating the Identification of High-Value Datasets in Open Government Data Portals
Alfonso Quarati, Anastasija Nikiforova

TL;DR
This paper presents an automated, data-driven method for identifying high-value datasets (HVDs) in open government data portals. It analyzes user-interest metrics derived from usage statistics such as download counts, which supports transparency and wider use of the data.
Contribution
It introduces a quantitative approach for automatically detecting high-value datasets based on usage statistics, reducing reliance on manual identification.
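The core idea is to flag datasets from usage statistics rather than by hand. The minimal Python sketch below illustrates it. It assumes each dataset comes with a download count and treats the top fraction of a portal's datasets by downloads as high value. The `DatasetUsage` record, the `flag_high_value` function, the `top_fraction` cutoff, and the sample values are all hypothetical and are not taken from the paper, whose actual metrics and thresholds may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetUsage:
    # Hypothetical per-dataset record; field names are illustrative only.
    dataset_id: str
    category: str
    downloads: int


def flag_high_value(datasets: List[DatasetUsage], top_fraction: float = 0.1) -> List[str]:
    """Return the IDs of the most-downloaded datasets, highest first.

    Assumption: "high value" means being in the top `top_fraction` of the
    portal by download count. This cutoff is a placeholder, not the
    paper's criterion.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(datasets, key=lambda d: d.downloads, reverse=True)
    # Floor the cutoff, but keep at least one dataset so small portals still yield a result.
    k = max(1, int(len(ranked) * top_fraction))
    return [d.dataset_id for d in ranked[:k]]


# Made-up toy values for illustration.
portal = [
    DatasetUsage("crime-incidents", "Public Safety", 12000),
    DatasetUsage("street-trees", "Environment", 800),
    DatasetUsage("building-permits", "Housing", 5400),
    DatasetUsage("bike-counts", "Transportation", 300),
]
print(flag_high_value(portal, top_fraction=0.5))  # ['crime-incidents', 'building-permits']
```

Because `sorted` is stable, datasets with equal download counts keep their input order. The sketch does no normalization, for example by how long a dataset has been published, which a real analysis might need.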
Findings
Effective identification of HVDs in US city portals
Insights into dataset usage trends and citizen preferences
Potential to improve open data management and policy
Abstract
Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVD) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVDs is a complex, resource-intensive task due to the nuanced nature of data value. Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest derived from data usage statistics, thereby minimizing the need for human intervention. The proposed method involves extracting download data, analyzing metrics to identify high-value categories, and comparing HVDs across different portals. This automated process provides valuable insights into trends in dataset usage, reflecting…
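The later steps named in the abstract are grouping HVDs into categories and comparing portals. They could look roughly like the sketch below. It assumes each portal's flagged HVDs are already available as a list of category labels. Counting HVDs per category and measuring cross-portal agreement with Jaccard similarity are illustrative choices, not the authors' documented method. `top_categories`, `category_overlap`, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_categories(hvd_categories: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count flagged HVDs per category and return the n most frequent."""
    return Counter(hvd_categories).most_common(n)


def category_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two category sets: 0.0 if disjoint, 1.0 if identical."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Made-up category labels of the HVDs flagged on two hypothetical portals.
portal_a = ["Public Safety", "Public Safety", "Housing",
            "Transportation", "Public Safety", "Housing"]
portal_b = ["Transportation", "Public Safety", "Transportation",
            "Public Safety", "Environment", "Transportation"]

top_a = top_categories(portal_a, n=2)
top_b = top_categories(portal_b, n=2)
overlap = category_overlap({c for c, _ in top_a}, {c for c, _ in top_b})

print(top_a)              # [('Public Safety', 3), ('Housing', 2)]
print(top_b)              # [('Transportation', 3), ('Public Safety', 2)]
print(round(overlap, 3))  # 0.333
```

Jaccard similarity ignores how many HVDs each category holds. A comparison that weights categories by their HVD counts would need a different measure.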
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data
