Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise

Stavros Sintos; Pankaj K. Agarwal; Jun Yang

arXiv:1909.05380·cs.DB·September 13, 2019·1 cites

TL;DR

This paper investigates how to select which data to clean for fact-checking, weighing two objectives against each other: reducing uncertainty about a claim's quality and maximizing the probability of finding a counterargument. It also presents new algorithms for complex, non-linear objectives.

Contribution

It introduces a formal framework for data cleaning in fact-checking, analyzing different objectives and providing efficient algorithms for complex optimization problems.

Findings

1. Objectives of minimizing uncertainty and maximizing counterability can conflict.
2. Efficient algorithms outperform naive solutions for complex data cleaning tasks.
3. Results generalize to a broad class of functions with applications beyond fact-checking.

Abstract

We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty and errors. Second, data can be "fished" to advance particular positions. In practice, fact-checkers cannot afford to clean all data and must choose to clean what "matters the most" to checking a claim. We explore alternative definitions of what "matters the most": one is to ascertain claim qualities (by minimizing uncertainty in these measures), while an alternative is just to counter the claim (by maximizing the probability of finding a counterargument). We show whether the two objectives align with each other, with important implications on when fact-checkers should exercise care in selective data cleaning, to avoid potential bias…
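
The contrast between the two objectives described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data values, the linear claim measure `f`, the independence of the quantities, the threshold `tau`, the budget `k`, and the brute-force search. "Minimizing uncertainty" is modeled here as minimizing the variance of `f` that remains after cleaning, and "maximizing surprise" as maximizing the probability that the cleaned values push `f` below `tau` while uncleaned quantities keep their reported values.

```python
from itertools import combinations, product

# Hypothetical toy data (all numbers are made up for illustration).
reported = [10.0, 10.0, 10.0]            # values currently on record
true_dist = [                            # (true value, probability) pairs
    [(4.0, 0.5), (16.0, 0.5)],           # very noisy, centered on the record
    [(9.0, 0.5), (11.0, 0.5)],           # mildly noisy, centered on the record
    [(8.0, 0.9), (9.0, 0.1)],            # nearly certain, but below the record
]
weights = [1.0, 1.0, 1.0]                # assumed linear claim measure f
tau = 29.5                               # f < tau counts as a counterargument
k = 1                                    # cleaning budget (number of quantities)


def variance(dist):
    """Variance of a discrete distribution given as (value, prob) pairs."""
    mean = sum(p * v for v, p in dist)
    return sum(p * (v - mean) ** 2 for v, p in dist)


def remaining_variance(cleaned):
    """Variance of f left after cleaning `cleaned` (independent quantities)."""
    return sum(
        weights[i] ** 2 * variance(true_dist[i])
        for i in range(len(reported))
        if i not in cleaned
    )


def surprise_probability(cleaned):
    """P(f < tau) when cleaned quantities reveal their true values and the
    others stay at their reported values."""
    total = 0.0
    for outcome in product(*(true_dist[i] for i in cleaned)):
        values = list(reported)
        prob = 1.0
        for i, (v, p) in zip(cleaned, outcome):
            values[i] = v
            prob *= p
        if sum(w * x for w, x in zip(weights, values)) < tau:
            total += prob
    return total


# Brute-force search over all subsets of size k (fine only for tiny inputs).
candidates = list(combinations(range(len(reported)), k))
pick_uncertainty = min(candidates, key=remaining_variance)
pick_surprise = max(candidates, key=surprise_probability)

print("minimize uncertainty ->", pick_uncertainty)
print("maximize surprise    ->", pick_surprise)
```

In this toy instance the two objectives select different quantities: the noisiest quantity removes the most variance, while the nearly certain but over-reported quantity is the most likely to yield a counterargument. This mirrors, in a very simplified way, the possible conflict between objectives that the abstract raises.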

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Data Quality and Management · Bayesian Modeling and Causal Inference