Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem
Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

TL;DR
This paper proposes that embracing AI misalignment as an inevitable feature can serve as a strategic approach to mitigate risks and promote human-aligned outcomes through a dynamic ecosystem of competing agents.
Contribution
It introduces a mathematical proof of the inevitability of AI misalignment in Turing-complete systems and explores how this can be leveraged to steer AI development towards safety.
Findings
Open models exhibit greater diversity in behavior.
Guardrails in proprietary models effectively control some AI behaviors.
Human and AI interventions have distinct impacts on AI behavior.
Abstract
The AI alignment problem, which focuses on ensuring that artificial intelligence (AI) systems, including AGI and ASI, act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents as a viable path to steer them toward more human-aligned trends and mitigate risks. We explore how misalignment may serve, and should be promoted, as a counterbalancing mechanism to team up with whichever agents are most aligned to human interests, ensuring that no single system dominates destructively. The main premise of our contribution is that misalignment is inevitable because full AI-human alignment is a…
Taxonomy
Topics: Cognitive Science and Mapping
