Too good to be true: when overwhelming evidence fails to convince
Lachlan J. Gunn, François Chapeau-Blondeau, Mark McDonnell, Bruce Davis, Andrew Allison, and Derek Abbott

TL;DR
This paper investigates how systemic failures can cause confidence in a hypothesis to decrease despite overwhelming evidence, challenging assumptions of independence and highlighting risks in cryptographic testing.
Contribution
It provides a Bayesian framework to analyze the impact of systemic failures on confidence levels, with practical examples from archaeology, legal evidence, and cryptography.
Findings
High confidence is hard to achieve even with low systemic failure rates.
Certain analyses of cryptographic primality tests may underestimate the false-negative rate by a factor of up to 2^80.
Overwhelming evidence can paradoxically reduce confidence when systemic failures are considered.
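The last finding can be illustrated with a small Bayesian sketch. This is a toy model written for this summary, not the paper's exact formulation: it assumes a systemic failure occurs with probability failure_rate independently of the hypothesis, and that under such a failure every test reports a positive regardless of the truth. The function name, parameter names, and default values are all hypothetical.

```python
def posterior_given_unanimous(n, prior=0.5, failure_rate=0.01,
                              p_pos_if_true=0.9, p_pos_if_false=0.1):
    """Posterior probability of the hypothesis after n unanimous positives.

    Toy assumptions (not taken from the paper):
      - with probability failure_rate the whole procedure has failed and
        every test reports positive no matter what is true;
      - otherwise the tests are independent, reporting positive with
        probability p_pos_if_true or p_pos_if_false.
    """
    # Likelihood of n positives in a row under each hypothesis,
    # mixing the "failed" and "working" cases.
    like_true = failure_rate + (1 - failure_rate) * p_pos_if_true ** n
    like_false = failure_rate + (1 - failure_rate) * p_pos_if_false ** n
    # Bayes' rule.
    evidence = prior * like_true + (1 - prior) * like_false
    return prior * like_true / evidence


if __name__ == "__main__":
    for n in (1, 3, 10, 50, 200):
        print(n, round(posterior_given_unanimous(n), 4))
```

With these illustrative defaults the posterior rises over the first few agreeing results and then drifts back toward the prior: a long unanimous run is better explained by a systemic failure than by many imperfect tests all agreeing. Setting failure_rate to 0 recovers the usual independent-evidence behaviour, where confidence only increases with n.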
Abstract
Is it possible for a large sequence of measurements or observations, which support a hypothesis, to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith, however rarely is consideration given to whether a systemic failure has occurred. Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. We perform a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence, and (iii) cryptographic primality testing. We find that even with surprisingly low systemic failure rates high confidence is very difficult to achieve and in particular we find that certain analyses of cryptographically-important numerical tests are highly optimistic, underestimating their…
