Identifying And Weighting Integration Hypotheses On Open Data Platforms

Julian Eberius; Katrin Braunschweig; Maik Thiele; Wolfgang Lehner

arXiv:1205.2465·cs.DB·May 14, 2012

Open Access

TL;DR

This paper addresses data integration challenges on open data platforms by proposing a method to identify and rank integration hypotheses, with an evaluation on one of the largest open data platforms.

Contribution

It introduces a novel approach for identifying and weighting integration hypotheses specifically tailored for open data platforms, enhancing crowd-based data integration techniques.

Findings

01 Effective identification of integration hypotheses

02 Improved ranking of hypotheses based on relevance

03 Successful evaluation on a large open data platform

Abstract

Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards, and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as a new way of dealing with these problems. However, these methods still require input in the form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data platforms and proposes a method for identifying and ranking integration hypotheses in this context. We evaluate our findings through a comprehensive evaluation on one of the largest Open Data platforms.
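
The abstract does not spell out how hypotheses are identified or weighted, so the following is only an illustrative sketch and not the authors' method. It assumes that a hypothesis is a guess that two column labels from different datasets denote the same attribute, and that its weight is a plain string similarity between the normalized labels. The dataset names, column labels, the `rank_hypotheses` helper, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(label):
    """Lowercase a column label and unify separators (assumed preprocessing)."""
    return " ".join(label.lower().replace("_", " ").replace("-", " ").split())


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for whatever weighting the paper uses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank_hypotheses(datasets, threshold=0.6):
    """Return (weight, column_a, column_b) tuples, highest weight first.

    Each tuple is a hypothesis that the two columns describe the same
    attribute; only cross-dataset pairs at or above the threshold are kept.
    """
    hypotheses = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(datasets.items(), 2):
        for col_a in cols_a:
            for col_b in cols_b:
                weight = similarity(col_a, col_b)
                if weight >= threshold:
                    hypotheses.append(
                        (weight, f"{name_a}.{col_a}", f"{name_b}.{col_b}")
                    )
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)


# Hypothetical datasets with inconsistently labeled columns.
datasets = {
    "crime_reports": ["Zip Code", "Incident Date", "Offense"],
    "school_list": ["zip_code", "School Name"],
    "permits": ["ZIP", "Issue Date"],
}

ranked = rank_hypotheses(datasets)
for weight, col_a, col_b in ranked:
    print(f"{weight:.2f}  {col_a} <-> {col_b}")
```

In a crowd-based setting, each ranked pair could be turned into a yes/no question for contributors, with the highest-weighted hypotheses asked first.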

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Privacy-Preserving Technologies in Data