
TL;DR
This paper argues that excluding negative results from scientific publishing degrades the utility of large language models, and that a failure-inclusive publishing culture is needed to improve both research and AI training.
Contribution
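The paper highlights the literature's bias toward positive results, analyzes three ways in which LLMs have changed the value of failure data, outlines experimental protocols to validate its claims, and discusses the structural conditions under which a failure-inclusive publishing culture could emerge.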
Findings
The literature's bias against reporting failures affects LLM training and evaluation.
Including failure data could improve LLM utility and research quality.
Structural reforms are needed to foster failure-inclusive publishing.
Abstract
Scientific publishing systematically filters out negative results. We argue that this long-standing asymmetry has become an urgent problem in the era of large language models, which inherit the positive bias of the literature they are trained on, face an impending shortage of high-quality training data, and are increasingly deployed as both research tools and peer reviewers. We analyze three ways in which LLMs have changed the value of failure data and show that the systematic absence of such data degrades their utility as research tools, training data consumers, and peer reviewers alike. We outline experimental protocols to validate these claims and discuss the structural conditions under which a failure-inclusive publishing culture could emerge.
