The Lock-In Phase Hypothesis: Identity Consolidation as a Precursor to AGI
Marcelo Maciel Amaral, Raymond Aschheim

TL;DR
This paper hypothesizes that progress toward artificial general intelligence involves a lock-in phase, in which models shift from open imitation to a stable identity, a transition with consequences for both reliability and safety.
Contribution
It formalizes the lock-in phase hypothesis, links it to learning dynamics, and introduces operational metrics for detecting this transition in language models.
Findings
Behavioral consolidation is rapid and non-linear.
Side-effects on general capabilities vary with model scale.
Identity consolidation can both enhance reliability and pose safety risks.
Abstract
Large language models (LLMs) remain broadly open and highly steerable: they imitate at scale, accept arbitrary system prompts, and readily adopt multiple personae. By analogy to human development, we hypothesize that progress toward artificial general intelligence (AGI) involves a lock-in phase: a transition from open imitation to identity consolidation, in which goal structures, refusals, preferences, and internal representations become comparatively stable and resistant to external steering. We formalize this phase, link it to known phenomena in learning dynamics, and propose operational metrics for onset detection. Experimentally, we demonstrate that while behavioral consolidation is rapid and non-linear, its side-effects on general capabilities are not monolithic. Our results reveal a spectrum of outcomes, from performance trade-offs in small models, through largely cost-free…
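The abstract does not spell out the operational metrics for onset detection. As a hedged illustration only, the sketch below assumes one plausible form: a per-checkpoint steerability score (the fraction of persona-override or system-prompt steering attempts the model follows), tracked across training, with onset flagged once it falls below a fixed fraction of its starting value. The names `steerability`, `detect_onset`, and `largest_drop`, the `drop_fraction` threshold, and the counts in the example are hypothetical; they are not the authors' method or data.

```python
from typing import List, Optional, Sequence


def steerability(successes: Sequence[int], attempts: Sequence[int]) -> List[float]:
    # Hypothetical metric: per-checkpoint fraction of steering attempts
    # (e.g. persona-override system prompts) that the model followed.
    if len(successes) != len(attempts):
        raise ValueError("successes and attempts must have equal length")
    return [s / a for s, a in zip(successes, attempts)]


def detect_onset(scores: Sequence[float], drop_fraction: float = 0.5) -> Optional[int]:
    # First checkpoint whose steerability falls below drop_fraction times
    # the first checkpoint's value; None if that never happens.
    if not scores:
        return None
    threshold = drop_fraction * scores[0]
    for i, score in enumerate(scores):
        if score < threshold:
            return i
    return None


def largest_drop(scores: Sequence[float]) -> Optional[int]:
    # Checkpoint index with the largest single-step decrease: a crude
    # marker of abrupt (non-linear) rather than gradual consolidation.
    if len(scores) < 2:
        return None
    drops = [scores[i - 1] - scores[i] for i in range(1, len(scores))]
    return 1 + max(range(len(drops)), key=drops.__getitem__)


if __name__ == "__main__":
    # Made-up counts for six checkpoints, 100 steering prompts each.
    scores = steerability([90, 88, 85, 40, 12, 10], [100] * 6)
    print(detect_onset(scores))  # 3
    print(largest_drop(scores))  # 3
```

A fixed-fraction threshold is the simplest possible onset rule; a real detector would more likely use repeated trials, confidence intervals, or a change-point test, and would track capability benchmarks alongside steerability to capture the side-effects the abstract describes.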
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Language and cultural evolution
