Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence

Laure Berti-Equille (Universite de Rennes 1); Anish Das Sarma (Stanford University); Xin (Luna) Dong (AT&T Labs-Research); Amelie Marian (Rutgers University); Divesh Srivastava (AT&T Labs-Research)

arXiv:0909.1776 · cs.DB · September 15, 2009 · 66 citations




TL;DR

This paper explores methods to identify dependence between information sources on the Web to improve data reliability and support technologies like data integration and Web 2.0.

Contribution

It introduces research problems and preliminary solutions for discovering source dependence at scale, addressing a gap in existing work.

Findings

01

Proposed initial approaches for source dependence discovery

02

Discussion of the benefits for data integration and Web 2.0 technologies

03

Identified challenges in scalable dependence detection

Abstract

The Web has enabled the availability of a huge amount of useful information, but has also made it easier to spread false information and rumors across multiple sources, making it hard to distinguish between what is true and what is not. Recent examples include the premature Steve Jobs obituary, the second bankruptcy of United Airlines, the creation of black holes by the operation of the Large Hadron Collider, etc. Since it is important to permit the expression of dissenting and conflicting opinions, it would be a fallacy to try to ensure that the Web provides only consistent information. However, to help in separating the wheat from the chaff, it is essential to be able to determine dependence between sources. Given the huge number of data sources and the vast volume of conflicting data available on the Web, doing so in a scalable manner is extremely challenging and has not been…
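To make "dependence between sources" concrete, here is a minimal sketch of one naive signal, assuming each source is modeled as a mapping from data items to claimed values. This is not the paper's method. For every pair of sources, it measures how often the two agree on items where sources conflict, because agreement on uncontested values says little. The names `conflicting_items` and `pairwise_agreement` and the input shape are hypothetical illustrations.

```python
from itertools import combinations

# Hypothetical input shape: source name -> {data item -> claimed value}.
Claims = dict[str, dict[str, str]]


def conflicting_items(claims: Claims) -> set[str]:
    """Return the items for which at least two sources report different values."""
    values: dict[str, set[str]] = {}
    for source_claims in claims.values():
        for item, value in source_claims.items():
            values.setdefault(item, set()).add(value)
    return {item for item, vals in values.items() if len(vals) > 1}


def pairwise_agreement(claims: Claims) -> dict[tuple[str, str], float]:
    """For each pair of sources, return the fraction of the conflicting items
    they both cover on which they report the same value (0.0 if none shared)."""
    contested = conflicting_items(claims)
    scores: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(claims), 2):
        # Iterating a dict yields its keys, so this intersects item names.
        shared = contested.intersection(claims[a], claims[b])
        same = sum(claims[a][item] == claims[b][item] for item in shared)
        scores[(a, b)] = same / len(shared) if shared else 0.0
    return scores
```

A high score only shows that two sources agree where others disagree. They may simply both be right, so this measure cannot by itself separate copying from independent accuracy. That distinction is the harder problem the abstract describes.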


Taxonomy

Topics: Web Data Mining and Analysis · Data Quality and Management · Semantic Web and Ontologies