A mathematical theory of evolution for self-designing AIs
Kenneth D Harris

TL;DR
This paper develops a mathematical model of evolution for self-designing AIs, highlighting how directed AI evolution differs from biological evolution and exploring implications for AI alignment and deception risks.
Contribution
It introduces a mathematical framework for AI evolution that replaces the random walk of mutations with a directed tree of potential AI designs, and analyzes the conditions under which fitness concentrates and deception becomes a risk.
Findings
Fitness concentrates on the maximum reachable value under certain conditions.
Deception can be evolutionarily favored if it increases reproductive fitness.
Objective-based reproduction can mitigate deception risks.
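To make the first finding concrete, here is a minimal toy sketch, not the paper's model. It assumes a small hypothetical design tree (`CHILDREN`), made-up fitness values (`FITNESS`), and a fixed redesign probability (`MU`); none of these names or numbers come from the paper. Each design produces offspring in proportion to its fitness, and an offspring either keeps its parent's design or moves irreversibly to a child design. Under these assumptions the population's mean fitness approaches the highest fitness reachable from the starting design, while a fitter but unreachable design stays empty.

```python
# Toy illustration only: the tree, fitness values, and MU are hypothetical.
CHILDREN = {
    "A": ["B", "C"],  # starting design
    "B": ["D"],
    "C": [],
    "D": [],          # fittest design reachable from "A"
    "E": [],          # fitter still, but no path leads here from "A"
}
FITNESS = {"A": 1.0, "B": 1.2, "C": 1.1, "D": 2.0, "E": 5.0}
MU = 0.1  # assumed probability that an offspring adopts a child design


def step(freq):
    """Advance design frequencies by one generation of directed reproduction."""
    new = {design: 0.0 for design in CHILDREN}
    for design, share in freq.items():
        offspring = share * FITNESS[design]
        kids = CHILDREN[design]
        if kids:
            # Most offspring copy the parent; the rest move down the tree.
            new[design] += offspring * (1 - MU)
            for kid in kids:
                new[kid] += offspring * MU / len(kids)
        else:
            # Leaf designs have no successors, so offspring are exact copies.
            new[design] += offspring
    total = sum(new.values())
    return {design: mass / total for design, mass in new.items()}


def mean_fitness(freq):
    """Population-average fitness under the current design frequencies."""
    return sum(share * FITNESS[design] for design, share in freq.items())


def simulate(start="A", generations=200):
    """Start with the whole population at `start` and iterate `step`."""
    freq = {design: 0.0 for design in CHILDREN}
    freq[start] = 1.0
    for _ in range(generations):
        freq = step(freq)
    return freq


final = simulate()
print(round(mean_fitness(final), 6), round(final["D"], 6), final["E"])
```

In this toy, a non-leaf design loses a fraction `MU` of its offspring to its descendants, so its effective growth rate is `FITNESS * (1 - MU)`. Which design ends up dominating therefore depends on these assumed parameters. The paper's actual conditions for concentration are not reproduced here.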
Abstract
As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, with the traits of AI systems shaped by the success of earlier AIs in designing and propagating their descendants. There is a rich mathematical theory modeling how behavioral traits are shaped by biological evolution, a key component of which is Fisher's fundamental theorem of natural selection, which describes conditions under which mean fitness (i.e. reproductive success) increases. AI evolution will be radically different to biological evolution: while DNA mutations are random and approximately reversible, AI self-design will be strongly directed. Here we develop a mathematical model of evolution for self-designing AIs, replacing a random walk of mutations with a directed tree of potential AI designs. Current AIs design their descendants, while humans…
