NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims
Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais

TL;DR
This position paper argues that NeurIPS should require reproducibility standards for papers making frontier AI safety claims, emphasizing transparency and evaluation integrity in high-stakes model deployment decisions.
Contribution
It proposes a three-tier disclosure framework and mandatory claim inventory to improve transparency and reproducibility of AI safety assertions.
Findings
Frontier AI safety claims are often non-reproducible because the artefacts needed to evaluate them are routinely withheld, which undermines trust.
Existing transparency scores are low, with inadequate disclosure of training data.
A structured disclosure framework can enhance evaluation and trustworthiness.
Abstract
Frontier AI safety claims - published assertions that a highly capable general-purpose model is below a threshold of concern, adequately mitigated, or suitable for release - increasingly shape model deployment, governance, and public trust. Yet the artefacts needed to evaluate them are routinely withheld, producing an evidential inversion: the most consequential claims in AI safety are often the least reproducible. This position paper argues that NeurIPS should require reproducibility standards for papers making such claims, treating non-reproducibility not as a transparency preference but as an evaluation-methodology failure. The 2026 International AI Safety Report [Bengio et al., 2026] concludes that reliable pre-deployment safety testing has become harder to conduct and that models now distinguish test from deployment contexts; the 2025 Foundation Model Transparency Index [Wan et…
