Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise
Stavros Sintos, Pankaj K. Agarwal, Jun Yang

TL;DR
This paper investigates how to select which data to clean when fact-checking a claim, comparing two objectives (reducing uncertainty versus maximizing the chance of countering the claim), and gives new algorithms for complex, non-linear objectives.
Contribution
It introduces a formal framework for data cleaning in fact-checking, analyzing different objectives and providing efficient algorithms for complex optimization problems.
Findings
Objectives of minimizing uncertainty and maximizing counterability can conflict.
Efficient algorithms outperform naive solutions for complex data cleaning tasks.
Results generalize to a broad class of functions with applications beyond fact-checking.
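To make the contrast between efficient and naive selection concrete, here is a toy case. It is a hedged illustration, not the paper's algorithm. It assumes the claim is a linear function f = sum_i w_i * x_i of independent uncertain values and that the budget is a fixed number k of values to clean. Under those assumptions the variance removed by cleaning adds up item by item, so taking the k largest w_i^2 * Var(x_i) is exact, while the naive search scores all C(n, k) subsets. The function names, weights, and variances are hypothetical.

```python
from itertools import combinations


def removed_variance(items, weights, variances):
    """Variance of f = sum_i w_i * x_i eliminated by cleaning `items` (x_i independent)."""
    return sum(weights[i] ** 2 * variances[i] for i in items)


def top_k_clean(weights, variances, k):
    """O(n log n): contributions add up, so the k largest w_i^2 * Var(x_i) are optimal."""
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] ** 2 * variances[i], reverse=True)
    return sorted(order[:k])


def exhaustive_clean(weights, variances, k):
    """Naive baseline: score every k-subset, i.e. C(n, k) candidates."""
    best = max(combinations(range(len(weights)), k),
               key=lambda s: removed_variance(s, weights, variances))
    return sorted(best)


weights = [1.0, 2.0, 0.5, 1.5]    # hypothetical claim coefficients w_i
variances = [9.0, 1.0, 4.0, 2.0]  # hypothetical error variances Var(x_i)
print(top_k_clean(weights, variances, 2))       # [0, 3]
print(exhaustive_clean(weights, variances, 2))  # [0, 3]
```

The shortcut only works because this objective decomposes. With per-item cleaning costs, even the linear case becomes a knapsack problem. For the complex, non-linear claim functions the paper targets, per-item scores no longer capture the objective, which is where more involved algorithms become necessary.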
Abstract
We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty and errors. Second, data can be "fished" to advance particular positions. In practice, fact-checkers cannot afford to clean all data and must choose to clean what "matters the most" to checking a claim. We explore alternative definitions of what "matters the most": one is to ascertain claim qualities (by minimizing uncertainty in these measures), while an alternative is just to counter the claim (by maximizing the probability of finding a counterargument). We show whether the two objectives align with each other, with important implications on when fact-checkers should exercise care in selective data cleaning, to avoid potential bias…
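The two notions of what "matters the most" can be made concrete in a toy model. The sketch below is an illustration under stated assumptions, not the authors' formulation. It assumes a linear claim f(x) = sum_i w_i * x_i checked against a hypothetical threshold TAU (the claim is f >= TAU). Each value's true value is assumed to follow an independent discrete distribution, and any value left uncleaned keeps its recorded value. The budget is counted as a number of values. It scores every subset exhaustively (a naive baseline) under both objectives: the expected variance of f left after cleaning, and the probability that the cleaned data refute the claim. All names and numbers are hypothetical.

```python
from itertools import combinations, product

# Hypothetical data: recorded value plus a discrete distribution of the true value,
# given as (value, probability) pairs. Both distributions have mean 10.
DATA = {
    "A": {"recorded": 10.0, "dist": [(0.0, 0.5), (20.0, 0.5)]},
    "B": {"recorded": 10.0, "dist": [(8.0, 0.9), (28.0, 0.1)]},
}
WEIGHTS = {"A": 1.0, "B": 1.0}  # hypothetical linear claim f(x) = sum_i w_i * x_i
TAU = 19.0                      # hypothetical claim under check: f(x) >= TAU


def variance(dist):
    mean = sum(v * p for v, p in dist)
    return sum(p * (v - mean) ** 2 for v, p in dist)


def remaining_variance(cleaned):
    """Expected variance of f left after cleaning `cleaned` (independent values, linear f)."""
    return sum(WEIGHTS[k] ** 2 * variance(DATA[k]["dist"])
               for k in DATA if k not in cleaned)


def counter_probability(cleaned):
    """P(f < TAU) when cleaned items take their true values and the rest keep recorded ones."""
    base = sum(WEIGHTS[k] * DATA[k]["recorded"] for k in DATA if k not in cleaned)
    total = 0.0
    # Enumerate every joint outcome of the cleaned items (one empty outcome if none).
    for outcome in product(*(DATA[k]["dist"] for k in cleaned)):
        value = base + sum(WEIGHTS[k] * v for k, (v, _) in zip(cleaned, outcome))
        prob = 1.0
        for _, p in outcome:
            prob *= p
        if value < TAU:
            total += prob
    return total


def best_subset(budget, score, maximize):
    """Naive exhaustive search over all subsets of at most `budget` items."""
    candidates = [c for r in range(budget + 1) for c in combinations(sorted(DATA), r)]
    chooser = max if maximize else min
    return chooser(candidates, key=score)


print(best_subset(1, remaining_variance, maximize=False))  # ('A',)
print(best_subset(1, counter_probability, maximize=True))  # ('B',)
print(best_subset(2, counter_probability, maximize=True))  # ('B',)
```

In this toy instance the objectives pick different values. Variance reduction favors A, whose error is large but symmetric, while the counter objective favors B, which is likely to be lower than recorded. With a budget of two, cleaning both drives the remaining variance to zero yet lowers the refutation probability to 0.45. This is one way an objective of countering a claim could steer which data get cleaned.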
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Data Quality and Management · Bayesian Modeling and Causal Inference
