Knowledge Graph for Microdata of Statistics Netherlands
Chang Sun

TL;DR
This paper presents a knowledge graph that harmonizes and links CBS microdata metadata, enabling efficient querying and exploration of datasets for researchers, thereby enhancing data accessibility and usability.
Contribution
The project creates a comprehensive, multilingual knowledge graph of CBS microdata metadata using text mining and semantic web technologies, improving data discovery and integration.
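As a rough illustration of what multilingual, linked metadata can look like, the sketch below stores a few subject-predicate-object triples with language-tagged titles and runs two simple lookups over them. It is a minimal sketch in plain Python, not the project's actual schema or tooling: the dataset and variable identifiers, the example titles, and the `ex:hasVariable` predicate are all hypothetical, and a real deployment would more likely use an RDF store queried with SPARQL.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A text value with a language tag, e.g. "nl" or "en"."""
    text: str
    lang: str


# Hypothetical triples; none of these identifiers or titles come from CBS.
TRIPLES = [
    ("ex:dataset/A", "dct:title", Literal("Voorbeeld persoonsgegevens", "nl")),
    ("ex:dataset/A", "dct:title", Literal("Example person records", "en")),
    ("ex:dataset/A", "ex:hasVariable", "ex:variable/person_id"),
    ("ex:dataset/B", "dct:title", Literal("Voorbeeld huishoudens", "nl")),
    ("ex:dataset/B", "dct:title", Literal("Example households", "en")),
    ("ex:dataset/B", "ex:hasVariable", "ex:variable/person_id"),
    ("ex:dataset/B", "ex:hasVariable", "ex:variable/household_id"),
]


def match(s=None, p=None, o=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in TRIPLES
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def title(dataset: str, lang: str) -> Optional[str]:
    """Look up a dataset title in the requested language, if present."""
    for _, _, obj in match(s=dataset, p="dct:title"):
        if isinstance(obj, Literal) and obj.lang == lang:
            return obj.text
    return None


def datasets_sharing_variable(variable: str) -> list:
    """List datasets that declare the given variable, sorted by identifier."""
    return sorted({s for s, _, _ in match(p="ex:hasVariable", o=variable)})


if __name__ == "__main__":
    print(title("ex:dataset/A", "en"))
    print(datasets_sharing_variable("ex:variable/person_id"))
```

Finding datasets that share a variable is one simple way to surface relations between datasets; the same pattern-matching idea carries over to a SPARQL query against a real triple store.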
Findings
Knowledge graph enables easy metadata querying.
Researchers can explore dataset relations efficiently.
Data discovery time and costs are significantly reduced.
Abstract
Statistics Netherlands (CBS) hosts a large amount of data, not only at the statistical level but also at the individual level. As data science technologies develop, more and more researchers ask to conduct their research using high-quality individual-level data from CBS (called CBS Microdata), alone or combined with other data sources. Making good use of these data for research and scientific purposes can greatly benefit society as a whole. However, CBS Microdata has been collected and maintained in different ways by different departments inside and outside CBS. The representation, quality, and metadata of the datasets are not sufficiently harmonized. The project converts the descriptions of all CBS microdata sets into one knowledge graph with comprehensive metadata in Dutch and English, using text mining and semantic web technologies. Researchers can easily query the metadata,…
Taxonomy
Topics: Data Quality and Management · Geographic Information Systems Studies · Big Data Technologies and Applications
