Fusing Data with Correlations

Ravali Pochampally; Anish Das Sarma; Xin Luna Dong; Alexandra Meliou; Divesh Srivastava

arXiv:1503.00306 · cs.DB · March 3, 2015 · 1 cites

Open Access

TL;DR

This paper introduces new methods to model and utilize correlations among data sources to improve the accuracy of truth finding in noisy, conflicting web data, addressing limitations of naive voting strategies.

Contribution

It presents novel techniques for modeling various types of source correlations and applying them to enhance truth discovery in integrated web data.

Findings

01. Improved accuracy in identifying correct data over naive voting.

02. Effective handling of both positive and negative source correlations.

03. Enhanced data cleaning in noisy, conflicting information environments.

Abstract

Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data from erroneous data in order to create a cleaner set of integrated data. Previous work has shown that a naïve voting strategy that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive…
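To make the baseline concrete, here is a minimal Python sketch of the naïve voting strategy the abstract describes: accept a data item when a majority of sources, or at least a fixed number of them, provide it. It is an illustration only and not the paper's method. The function name `naive_vote`, the `min_sources` parameter, and the toy claims are all assumptions introduced for this example.

```python
from collections import defaultdict


def naive_vote(claims, min_sources=None):
    """Return the set of items accepted by naive voting.

    claims: iterable of (source, item) pairs (hypothetical input format).
    min_sources: accept an item when at least this many distinct sources
        provide it; when None, require a strict majority of all sources.
    """
    sources = set()
    supporters = defaultdict(set)
    for source, item in claims:
        sources.add(source)
        supporters[item].add(source)

    # Default threshold: strict majority of the distinct sources seen.
    threshold = min_sources if min_sources is not None else len(sources) // 2 + 1
    return {item for item, who in supporters.items() if len(who) >= threshold}


# Toy data (made up): sources B and C are assumed to copy A's wrong claim,
# while D independently provides the correct one.
claims = [
    ("A", "capital(Australia)=Sydney"),
    ("B", "capital(Australia)=Sydney"),
    ("C", "capital(Australia)=Sydney"),
    ("D", "capital(Australia)=Canberra"),
]

accepted = naive_vote(claims)
print(accepted)
```

With four sources the majority threshold is three, so the copied wrong value is accepted and the correct one is rejected. This is the failure mode under copying that the abstract attributes to naïve voting. Counting votes this way treats every source as independent, which is the assumption that correlation-aware fusion drops.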

Taxonomy

Topics: Data Quality and Management · Web Data Mining and Analysis · Semantic Web and Ontologies