Data Fusion: Resolving Conflicts from Multiple Sources
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava

TL;DR
This paper introduces a scalable data fusion method that resolves conflicting information from many sources, including sources that copy from one another, to identify true values in a range of data management applications.
Contribution
It presents a novel, scalable algorithm for truth discovery that accounts for copying among sources and improves the accuracy of the discovered values.
Findings
Algorithm significantly improves truth discovery accuracy
Method is scalable with many data sources
Effective in real-world data integration scenarios
Abstract
Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values, and different sources often provide conflicting values. To present quality data to users, it is critical to resolve conflicts and discover values that reflect the real world; this task is called data fusion. This paper describes a novel approach that finds true values from conflicting information when there are a large number of sources, some of which may copy from others. We present a case study on real-world data showing that the described algorithm can significantly improve the accuracy of truth discovery and is scalable when there are a large number of data sources.
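To make the truth-discovery idea concrete, here is a minimal Python sketch of iterative accuracy-weighted voting, assuming each source gives at most one value per object. It is not the paper's algorithm: it omits copy detection entirely, so a group of copiers would still outvote an independent source. The function name discover_truth, the log-odds vote weight ln(n*A/(1-A)), the assumed number n_false of false values per object, the starting accuracy, and the clamping bounds are all illustrative assumptions.

```python
import math
from collections import defaultdict


def discover_truth(claims, n_false=10, init_acc=0.8, iters=20):
    """Hypothetical sketch: claims maps source -> {object: value}.

    Alternates between (1) scoring each candidate value by the summed
    log-odds votes of the sources that provide it and (2) re-estimating
    each source's accuracy as the mean probability of its values.
    No copy detection is performed.
    """
    accuracy = {s: init_acc for s in claims}

    # Group claims: object -> value -> list of sources providing that value.
    by_object = defaultdict(lambda: defaultdict(list))
    for source, facts in claims.items():
        for obj, val in facts.items():
            by_object[obj][val].append(source)

    probs = {}
    for _ in range(iters):
        # Step 1: value probabilities from accuracy-weighted votes.
        probs = {}
        for obj, values in by_object.items():
            scores = {
                val: sum(math.log(n_false * accuracy[s] / (1 - accuracy[s]))
                         for s in sources)
                for val, sources in values.items()
            }
            top = max(scores.values())  # subtract max for numerical stability
            z = sum(math.exp(c - top) for c in scores.values())
            probs[obj] = {v: math.exp(c - top) / z for v, c in scores.items()}

        # Step 2: source accuracy = mean probability of the values it provides,
        # clamped away from 0 and 1 so the log-odds weight stays finite.
        for source, facts in claims.items():
            if not facts:
                continue
            acc = sum(probs[o][v] for o, v in facts.items()) / len(facts)
            accuracy[source] = min(max(acc, 0.01), 0.99)

    truths = {obj: max(p, key=p.get) for obj, p in probs.items()}
    return truths, accuracy


# Toy usage with three hypothetical sources and four objects.
claims = {
    "s1": {"o1": "a", "o2": "b", "o3": "c", "o4": "d"},
    "s2": {"o1": "a", "o2": "b", "o3": "c", "o4": "x"},
    "s3": {"o1": "y", "o2": "b", "o3": "z", "o4": "d"},
}
truths, accuracy = discover_truth(claims)
print(truths)
print(accuracy)
```

Alternating the two steps lets sources that agree with the emerging consensus earn more weight, so their votes count for more on objects where the sources disagree. Handling copying, as the paper does, would additionally require detecting dependent sources and discounting their shared votes.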
