Stable but Wrong: When More Data Degrades Scientific Conclusions
Zhipeng Zhang, Kai Li

TL;DR
This paper reveals that in certain regimes, increasing observational data can lead to more confident but systematically incorrect scientific conclusions, exposing fundamental limits of data-driven inference.
Contribution
It identifies a structural regime where standard inference converges to wrong conclusions despite passing diagnostics, highlighting intrinsic limits of data reliance in science.
Findings
Additional data can amplify errors instead of correcting them.
Residual diagnostics can be misleadingly normal despite incorrect conclusions.
Beyond an intrinsic limit, the validity of inference is determined by data quality rather than data quantity.
Abstract
Modern science increasingly relies on ever-growing observational datasets and automated inference pipelines, under the implicit belief that accumulating more data makes scientific conclusions more reliable. Here we show that this belief can fail in a fundamental and irreversible way. We identify a structural regime in which standard inference procedures converge smoothly, remain well calibrated, and pass conventional diagnostic checks, yet systematically converge to incorrect conclusions. This failure arises when the reliability of observations degrades in a manner that is intrinsically unobservable to the inference process itself. Using minimal synthetic experiments, we demonstrate that in this regime additional data do not correct error but instead amplify it, while residual-based and goodness-of-fit diagnostics remain misleadingly normal. These results reveal an intrinsic limit of…
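The regime described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experiment; it is a minimal toy under stated assumptions. A constant `theta` is estimated by the sample mean, each observation carries a hidden bias that grows with acquisition index (`drift`, `n_max`, and the function name `run_experiment` are hypothetical choices), and the analyst pools the data without access to the acquisition order. Under those assumptions the absolute error grows with sample size, the reported standard error shrinks, and pooled residual moments stay close to those of a normal distribution.

```python
import numpy as np


def run_experiment(n, theta=1.0, sigma=1.0, drift=0.5, n_max=100_000, seed=0):
    """Estimate a constant theta from n pooled observations.

    Assumption (illustrative only): observation i carries a hidden bias
    drift * i / n_max that the analyst cannot observe or model.
    """
    rng = np.random.default_rng(seed)
    index = np.arange(n)
    hidden_bias = drift * index / n_max  # reliability degrades as data accumulate
    y = theta + hidden_bias + rng.normal(0.0, sigma, size=n)

    # Standard inference: sample mean with its usual standard error.
    estimate = y.mean()
    std_err = y.std(ddof=1) / np.sqrt(n)

    # Pooled residual diagnostics: skewness and excess kurtosis.
    resid = y - estimate
    m2 = np.mean(resid**2)
    skew = np.mean(resid**3) / m2**1.5
    excess_kurtosis = np.mean(resid**4) / m2**2 - 3.0

    abs_error = abs(estimate - theta)
    return {
        "n": n,
        "estimate": estimate,
        "abs_error": abs_error,
        "std_err": std_err,
        "z_error": abs_error / std_err,  # error in units of reported uncertainty
        "skew": skew,
        "excess_kurtosis": excess_kurtosis,
    }


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        r = run_experiment(n)
        print(
            f"n={r['n']:>7}  est={r['estimate']:.4f}  "
            f"abs_err={r['abs_error']:.4f}  se={r['std_err']:.4f}  "
            f"z={r['z_error']:.1f}  skew={r['skew']:+.3f}  "
            f"kurt={r['excess_kurtosis']:+.3f}"
        )
```

In this toy setting a plot of residuals against acquisition index would expose the drift. The sketch assumes that index is unavailable to the inference pipeline, which is one simple way to mimic degradation that is unobservable to the procedure itself.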
Taxonomy
Topics: Philosophy and History of Science · Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management
