Extinction Risks from AI: Invisible to Science?
Vojtech Kovarik, Christian van Merwijk, Ida Mattsson

TL;DR
This paper examines which formal models are suitable for investigating extinction risks from AI. It identifies conditions that a model must satisfy to be informative and suggests that, because of the complexity these conditions add, such risks may be difficult to evaluate scientifically.
Contribution
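It identifies a set of conditions that are necessary for a model to be informative when evaluating specific arguments for Extinction-level Goodhart's Law, and it notes that the complexity these conditions add may make the hypothesis exceedingly difficult to evaluate formally.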
Findings
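The paper outlines a set of necessary conditions for a model to be informative about Extinction-level Goodhart's Law.
Each condition appears to add significantly to model complexity, which may make formal evaluation of the hypothesis exceedingly difficult.
Whether or not the risk of extinction from AI is real, it might be hard to detect with current science.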
Abstract
In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law holds or not. As our key contribution, we identify a set of conditions that are necessary for a model that aims to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of the conditions seems to significantly contribute to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult. This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
