
TL;DR
This paper argues that large neural networks are intrinsically unreliable and must therefore be enclosed in provably safe guardrails. It advocates world models that cover social and mental domains as well as the physical world as the basis for such guardrails.
Contribution
The paper identifies the limitations of current neural nets, discusses the scope and architecture required of world models, and emphasizes that modeling human social understanding is important for AI reliability.
Findings
A neural net's performance cannot be reliably validated, nor extrapolated from a limited number of test cases to an unlimited set of use cases.
World models should include physical, social, and mental domains.
AI systems need to represent a common ground with users.
Abstract
While large neural nets perform impressively on specific tasks, they are unreliable and unsafe, as is shown by the persistent hallucinations of large language models. This paper shows that large neural nets are intrinsically unreliable, because it is not possible to make or validate a tractable theory of how a neural net works. There is no reliable way to extrapolate its performance from a limited number of test cases to an unlimited set of use cases. To have confidence in the performance of a neural net, it is necessary to enclose it in a guardrail which is provably safe, so that whatever the neural net does, there cannot be harmful consequences. World models have been proposed as a way to do this. This paper discusses the scope and architecture required of world models. World models are often conceived as models of the physical and natural world, using established theories of natural…
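The guardrail idea in the abstract can be illustrated with a minimal sketch. This is not the paper's method. It assumes a hypothetical setting in which the net proposes an action with a single numeric `speed`, and in which "safe" means that speed lies in a fixed range. The names `Action`, `is_safe`, `guarded`, `untrusted_policy`, and `MAX_SAFE_SPEED` are invented for illustration. The point is only that the safety argument rests on the small, inspectable check and not on the net itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    speed: float  # hypothetical quantity the guardrail constrains


# Assumed for illustration: a fallback that is safe in every state.
SAFE_FALLBACK = Action("stop", 0.0)
MAX_SAFE_SPEED = 1.0


def is_safe(action: Action) -> bool:
    """Small, inspectable safety predicate (stand-in for a world-model check)."""
    # NaN fails both comparisons, so it is rejected as unsafe.
    return 0.0 <= action.speed <= MAX_SAFE_SPEED


def guarded(policy: Callable[[str], Action]) -> Callable[[str], Action]:
    """Wrap an untrusted policy so that only actions passing is_safe get out."""
    def wrapper(observation: str) -> Action:
        try:
            proposed = policy(observation)
        except Exception:
            # Any failure inside the untrusted component maps to the fallback.
            return SAFE_FALLBACK
        return proposed if is_safe(proposed) else SAFE_FALLBACK
    return wrapper


def untrusted_policy(observation: str) -> Action:
    """Stand-in for a neural net whose behavior cannot be validated."""
    if "clear" in observation:
        return Action("go", 5.0)  # exceeds the assumed safe range
    return Action("go", 0.5)


safe_policy = guarded(untrusted_policy)
print(safe_policy("road clear"))
print(safe_policy("fog"))
```

Under these assumptions, whatever `untrusted_policy` returns, `safe_policy` can only emit an action that satisfies `is_safe`, so the guarantee depends on the correctness of the predicate and the fallback alone.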
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Language and cultural evolution · Topic Modeling
