Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
Krti Tallam

TL;DR
This paper proposes layered mutability, a framework for analyzing governance and behavioral stability in persistent self-modifying language-model agents, with emphasis on the role of mutation dynamics.
Contribution
It introduces the layered mutability framework, formalizes it with simple drift, governance-load, and hysteresis quantities, and presents a preliminary experiment showing how difficult it is to revert agent behavior after memory updates.
Findings
Reverting an agent’s self-description after memory updates does not restore baseline behavior.
The estimated identity hysteresis ratio in the experiment is 0.68.
Behavioral drift accumulates over time due to local updates, leading to unintended trajectories.
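One way to read a hysteresis ratio like the one reported above is as the share of induced behavioral drift that remains after the revert. The sketch below is only an illustration under that assumption: the paper's exact definition is not given in this summary, and the function names, the Euclidean distance, and the toy behavior vectors are all hypothetical.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length behavior vectors."""
    if len(a) != len(b):
        raise ValueError("behavior vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hysteresis_ratio(baseline, drifted, reverted):
    """Share of the induced drift that survives the revert.

    Assumed definition (not taken from the paper):
        distance(reverted, baseline) / distance(drifted, baseline)
    0.0 means the revert fully restored baseline behavior;
    1.0 means the revert removed none of the drift.
    """
    induced = distance(drifted, baseline)
    if induced == 0:
        raise ValueError("no drift was induced, so the ratio is undefined")
    return distance(reverted, baseline) / induced


# Toy vectors, purely illustrative (not data from the paper).
baseline = [0.0, 0.0]   # behavior before any memory updates
drifted = [3.0, 4.0]    # behavior after memory updates
reverted = [1.5, 2.0]   # behavior after reverting the self-description
ratio = hysteresis_ratio(baseline, drifted, reverted)
```

Under this reading, a ratio well above zero means that reverting the self-description leaves much of the drift in place, which matches the first finding.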
Abstract
Persistent language-model agents increasingly combine tool use, tiered memory, reflective prompting, and runtime adaptation. In such systems, behavior is shaped not only by current prompts but by mutable internal conditions that influence future action. This paper introduces layered mutability, a framework for reasoning about that process across five layers: pretraining, post-training alignment, self-narrative, memory, and weight-level adaptation. The central claim is that governance difficulty rises when mutation is rapid, downstream coupling is strong, reversibility is weak, and observability is low, creating a systematic mismatch between the layers that most affect behavior and the layers humans can most easily inspect. I formalize this intuition with simple drift, governance-load, and hysteresis quantities, connect the framework to recent work on temporal identity in language-model…
