Fusing Data with Correlations
Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava

TL;DR
This paper introduces new methods to model and utilize correlations among data sources to improve the accuracy of truth finding in noisy, conflicting web data, addressing limitations of naive voting strategies.
Contribution
It presents novel techniques for modeling various types of source correlations and applying them to enhance truth discovery in integrated web data.
Findings
Improved accuracy in identifying correct data over naive voting.
Effective handling of both positive and negative source correlations.
Enhanced data cleaning in noisy, conflicting information environments.
Abstract
Many applications rely on Web data and extraction systems to accomplish knowledge-driven tasks. Web information is not curated, so many sources provide inaccurate or conflicting information. Moreover, extraction systems introduce additional noise to the data. We wish to automatically distinguish correct data and erroneous data for creating a cleaner set of integrated data. Previous work has shown that a naïve voting strategy that trusts data provided by the majority or at least a certain number of sources may not work well in the presence of copying between the sources. However, correlation between sources can be much broader than copying: sources may provide data from complementary domains (negative correlation), extractors may focus on different types of information (negative correlation), and extractors may apply common rules in extraction (positive…
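The naïve voting baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's method: the function name `naive_vote`, the `min_sources` threshold parameter, and the toy claims (including which sources copy from which) are all hypothetical, chosen to show why counting sources independently can mislead when sources are correlated.

```python
from collections import defaultdict


def naive_vote(claims, min_sources=2):
    """Accept every (item, value) pair asserted by at least `min_sources` sources.

    `claims` is an iterable of (source, item, value) triples.
    Hypothetical helper: it treats every source as independent,
    which is exactly the assumption that breaks under copying.
    """
    supporters = defaultdict(set)
    for source, item, value in claims:
        supporters[(item, value)].add(source)
    return {key for key, sources in supporters.items() if len(sources) >= min_sources}


# Toy data (invented for illustration): suppose "Foo" is the correct value,
# and sources C and D simply copied B's erroneous value "Bar".
claims = [
    ("A", "capital_of_X", "Foo"),
    ("B", "capital_of_X", "Bar"),
    ("C", "capital_of_X", "Bar"),  # assumed copy of B
    ("D", "capital_of_X", "Bar"),  # assumed copy of B
]

accepted = naive_vote(claims, min_sources=2)
print(accepted)
```

In this toy case the vote accepts the copied value and rejects the one asserted by the single independent source, because three correlated sources are counted as three independent confirmations.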
Taxonomy
Topics: Data Quality and Management · Web Data Mining and Analysis · Semantic Web and Ontologies
