The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
Federico Pigozzi, Michael Levin

TL;DR
This paper investigates whether causal emergence in neural-network reinforcement learning agents predicts final reward and tracks learning progress across multiple environments, algorithms, and agent architectures.
Contribution
The paper introduces the Causally Emergent Alignment Hypothesis, which links causal emergence to successful learning and to the prediction of final reward in artificial agents.
Findings
Causal emergence predicts final reward early in training.
Representational dynamics align with reward improvement.
Causal emergence may serve as an axis of neural reorganization.
Abstract
A hallmark of life on Earth is the ability of agents to exert causal power and be drivers of subsequent events. This is key to cognition at all scales. Causal emergence, measuring the degree to which an agent exerts unique predictive power on its future, is one consequence of causal power. Indeed, recent discoveries have shown that biological agents, even minimal ones, increase their causal emergence after learning new memories. However, there is a major knowledge gap regarding how causally emergent artificial agents are. We focused on Reinforcement Learning (RL) of neural-network agents across an array of environmental conditions, encompassing different algorithms, agent architectures, and six environments arranged on a complexity spectrum. For consistency, we computed the causal emergence of their latent-space representations over their lifetimes. We used the recently proposed…
