Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web
Kate Lin, Tarfah Alrashed, Natasha Noy

TL;DR
This paper analyzes relationships between datasets on the Web: it proposes a taxonomy of those relationships, develops machine learning methods to identify them, and highlights gaps in semantic markup that limit dataset discovery and understanding.
Contribution
The paper introduces a comprehensive taxonomy of dataset relationships, develops machine learning methods that identify these relationships with a reported 90% accuracy, and discusses gaps in semantic markup that limit dataset linking.
Findings
Machine learning methods achieve 90% accuracy in classifying dataset relationships.
A comprehensive taxonomy of dataset relationships is proposed.
Gaps in semantic markup hinder the identification of dataset relationships.
Abstract
The Web today has millions of datasets, and the number of datasets continues to grow at a rapid pace. These datasets are not standalone entities; rather, they are intricately connected through complex relationships. Semantic relationships between datasets provide critical insights for research and decision-making processes. In this paper, we study dataset relationships from the perspective of users who discover, use, and share datasets on the Web: what relationships are important for different tasks? What contextual information might users want to know? We first present a comprehensive taxonomy of relationships between datasets on the Web and map these relationships to user tasks performed during dataset discovery. We develop a series of methods to identify these relationships and compare their performance on a large corpus of datasets generated from Web pages with schema.org markup. We…
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Semantic Web and Ontologies
