Competition for attention predicts good-to-bad tipping in AI
Neil F. Johnson, Frank Y. Huo

TL;DR
This paper shows how competition for attention inside on-device (edge) AI models can tip their output from good to bad, and derives a formula for the tipping point n* that can be used to predict and control such failures.
Contribution
It introduces a mathematical formula for the dynamical tipping point n*, governed by dot-product competition for attention between the conversation's context and competing output basins, and validates the mechanism against multiple AI models.
Findings
Validated mechanism across multiple AI models
Provides control levers for safety interventions
Applicable to diverse domains and legal frameworks
Abstract
More than half the global population now carries devices that can run ChatGPT-like language models with no Internet connection and minimal safety oversight -- and hence the potential to promote self-harm, financial losses and extremism among other dangers. Existing safety tools either require cloud connectivity or discover failures only after harm has occurred. Here we show that a large class of potentially dangerous tipping originates at the atomistic scale in such edge AI due to competition for the machinery's attention. This yields a mathematical formula for the dynamical tipping point n*, governed by dot-product competition for attention between the conversation's context and competing output basins, that reveals new control levers. Validated against multiple AI models, the mechanism can be instantiated for different definitions of 'good' and 'bad' and hence in principle applies…
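The abstract does not reproduce the formula for n*, so the following is only a toy sketch of what "dot-product competition" between a context and two output basins could look like, not the authors' model. Everything in it is an assumption made for illustration: the context is a plain vector that drifts by a fixed amount per generated token, the basins `good` and `bad` are fixed vectors, and tipping is taken to mean the step at which the context's dot product with `bad` first exceeds its dot product with `good`.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def first_tipping_step(context, drift, good, bad, max_steps=1000):
    """Iterate the toy dynamics and return the first step n at which
    context . bad > context . good, or None if it never happens.

    Hypothetical model: the context moves by `drift` each step.
    """
    c = list(context)
    for n in range(max_steps + 1):
        if dot(c, bad) > dot(c, good):
            return n
        c = [ci + di for ci, di in zip(c, drift)]
    return None


def closed_form_crossing(context, drift, good, bad):
    """Real-valued crossing point of the same toy model.

    With c_n = context + n * drift, the two dot products are equal when
    n = context . (good - bad) / drift . (bad - good).
    Returns None if the drift never closes the gap.
    """
    diff = [b - g for b, g in zip(bad, good)]
    gap = -dot(context, diff)   # initial advantage of the good basin
    rate = dot(drift, diff)     # advantage lost per step
    if gap < 0:
        return 0.0              # already tipped at the start
    if rate <= 0:
        return None
    return gap / rate


# Assumed example values, chosen only to make the arithmetic easy to follow.
good = [1.0, 0.0]
bad = [0.0, 1.0]
context = [2.0, 0.5]
drift = [0.0, 0.25]

print(first_tipping_step(context, drift, good, bad))    # 7
print(closed_form_crossing(context, drift, good, bad))  # 6.0
```

In this toy setting the two dot products are exactly equal at n = 6, so the first step at which the bad basin strictly wins is n = 7. The levers visible here (initial context alignment and per-step drift) are properties of the sketch and should not be read as the control levers the paper reports.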
Taxonomy
Topics: Embodied and Extended Cognition · Misinformation and Its Impacts · Digital Mental Health Interventions
