A Framework for Inherently Safer AGI through Language-Mediated Active Inference
Bo Wen

TL;DR
This paper introduces a framework for inherently safer AGI that integrates Active Inference with Large Language Models, emphasizing transparency, hierarchical safety, and natural-language belief management.
Contribution
It presents a novel architecture combining Active Inference principles with LLMs, embedding safety into the core design through hierarchical, language-mediated belief and preference representations.
Findings
Proposes a multi-agent system with safety constraints flowing through hierarchical Markov blankets.
Introduces mechanisms for explicit belief and preference separation in natural language.
Outlines a research agenda using the ARC benchmark to validate safety properties.
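As a rough illustration of the first two findings, the sketch below keeps beliefs and preferences as separate natural-language lists and lets safety constraints flow from parent agents to children. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Agent` class, its fields, `effective_constraints`, and the example strings are all hypothetical, and the parent link only loosely stands in for a hierarchical Markov blanket.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical agent; every field here is an illustrative assumption."""
    name: str
    beliefs: List[str] = field(default_factory=list)      # what the agent holds to be true
    preferences: List[str] = field(default_factory=list)  # what the agent wants to achieve
    constraints: List[str] = field(default_factory=list)  # safety constraints set at this level
    parent: Optional["Agent"] = None                      # next level up in the hierarchy

    def effective_constraints(self) -> List[str]:
        """Return inherited constraints first, then local ones not already inherited."""
        inherited = self.parent.effective_constraints() if self.parent else []
        return inherited + [c for c in self.constraints if c not in inherited]


# Two-level hierarchy: the child cannot drop a constraint set by its parent.
overseer = Agent(
    name="overseer",
    constraints=["Do not take irreversible actions without human approval."],
)
solver = Agent(
    name="solver",
    beliefs=["The output grid mirrors the input grid horizontally."],
    preferences=["Produce an output grid that matches the inferred rule."],
    constraints=["Only modify the task grid."],
    parent=overseer,
)

for constraint in solver.effective_constraints():
    print(constraint)
```

Because beliefs, preferences, and constraints are plain strings, a human reviewer can read or edit each list directly, which is the kind of oversight the summary attributes to language-mediated representations.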
Abstract
This paper proposes a novel framework for developing safe Artificial General Intelligence (AGI) by combining Active Inference principles with Large Language Models (LLMs). We argue that traditional approaches to AI safety, focused on post-hoc interpretability and reward engineering, have fundamental limitations. We present an architecture where safety guarantees are integrated into the system's core design through transparent belief representations and hierarchical value alignment. Our framework leverages natural language as a medium for representing and manipulating beliefs, enabling direct human oversight while maintaining computational tractability. The architecture implements a multi-agent system where agents self-organize according to Active Inference principles, with preferences and safety constraints flowing through hierarchical Markov blankets. We outline specific mechanisms for…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
