A Framework for Inherently Safer AGI through Language-Mediated Active Inference

Bo Wen

arXiv:2508.05766 · cs.AI · August 11, 2025

Open Access

TL;DR

This paper introduces a framework for inherently safer AGI that integrates Active Inference with Large Language Models, emphasizing transparency, hierarchical safety, and belief management in natural language.

Contribution

It presents a novel architecture combining Active Inference principles with LLMs, embedding safety into the core design through hierarchical, language-mediated belief and preference representations.

Findings

01. Proposes a multi-agent system with safety constraints flowing through hierarchical Markov blankets.

02. Introduces mechanisms for explicit belief and preference separation in natural language.

03. Outlines a research agenda using the ARC benchmark to validate safety properties.
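
To make the findings above more concrete, here is a minimal Python sketch of what keeping beliefs and preferences as separate natural-language statements, with safety constraints inherited down an agent hierarchy, might look like. It is an illustration under stated assumptions, not the paper's implementation: the `Agent` class, its field names, the `effective_constraints` method, and all example strings are hypothetical, and neither Active Inference nor Markov blankets is modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record; every field name here is an assumption."""

    name: str
    # Beliefs (what the agent takes to be true) and preferences (what it
    # wants) are kept in separate lists of plain-language statements so a
    # human reviewer can read and edit each independently.
    beliefs: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    # Safety constraints declared at this level of the hierarchy.
    constraints: list[str] = field(default_factory=list)
    parent: Agent | None = None

    def effective_constraints(self) -> list[str]:
        """Return ancestors' constraints first, then this agent's own."""
        inherited = self.parent.effective_constraints() if self.parent else []
        own = [c for c in self.constraints if c not in inherited]
        return inherited + own


# Illustrative two-level hierarchy; the statements are made up.
overseer = Agent(
    name="overseer",
    constraints=["Do not act outside the task sandbox."],
)
solver = Agent(
    name="solver",
    beliefs=["The output grid is a mirrored copy of the input grid."],
    preferences=["Prefer the simplest transformation that fits all examples."],
    constraints=["State each transformation before applying it."],
    parent=overseer,
)

for rule in solver.effective_constraints():
    print(rule)
```

In this sketch a child agent cannot drop a constraint set by its parent, because the inherited list is always placed first and the child's own list is only appended to it. That is one simple way to read "constraints flowing through a hierarchy"; the paper may define the mechanism differently.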

Abstract

This paper proposes a novel framework for developing safe Artificial General Intelligence (AGI) by combining Active Inference principles with Large Language Models (LLMs). We argue that traditional approaches to AI safety, focused on post-hoc interpretability and reward engineering, have fundamental limitations. We present an architecture where safety guarantees are integrated into the system's core design through transparent belief representations and hierarchical value alignment. Our framework leverages natural language as a medium for representing and manipulating beliefs, enabling direct human oversight while maintaining computational tractability. The architecture implements a multi-agent system where agents self-organize according to Active Inference principles, with preferences and safety constraints flowing through hierarchical Markov blankets. We outline specific mechanisms for…

Taxonomy

Topics: Anomaly Detection Techniques and Applications