Identifying And Weighting Integration Hypotheses On Open Data Platforms
Julian Eberius, Katrin Braunschweig, Maik Thiele, Wolfgang Lehner

TL;DR
This paper addresses data integration challenges on open data platforms by proposing a method to identify and rank integration hypotheses, with the aim of improving data standardization and interoperability. The method is evaluated on one of the largest open data platforms.
Contribution
It introduces an approach for identifying and weighting integration hypotheses tailored to open data platforms. These hypotheses can supply the specific questions or tasks that crowd-based data integration techniques require as input.
Findings
Effective identification of integration hypotheses
Improved ranking of hypotheses based on relevance
Successful evaluation on a large open data platform
Abstract
Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards, and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as a new way of dealing with these problems. However, these methods still require input in the form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data platforms and proposes a method for identifying and ranking integration hypotheses in this context. We evaluate our findings through a comprehensive evaluation on one of the largest Open Data platforms.
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data
