Information-theoretic analysis of world models in optimal reward maximizers

Alfred Harwood; Jose Faustino; Alex Altair

arXiv:2602.12963·cs.AI·February 16, 2026

Open Access

TL;DR

This paper shows that, under a uniform prior over transition dynamics, a deterministic optimal policy for a controlled Markov process with n states and m actions conveys exactly n log m bits of information about the environment, which gives an information-theoretic lower bound on what an optimal policy must encode.

Contribution

It provides a rigorous proof that optimal policies inherently contain n log m bits of information about the environment across various reward objectives.

Findings

01

Optimal policies encode exactly n log m bits of environment information.

02

The bound applies broadly to different reward maximization objectives.

03

Provides a fundamental limit on the implicit world model in reinforcement learning.

Abstract

An important question in the field of AI is the extent to which successful behaviour requires an internal representation of the world. In this work, we quantify the amount of information an optimal policy provides about the underlying environment. We consider a Controlled Markov Process (CMP) with $n$ states and $m$ actions, assuming a uniform prior over the space of possible transition dynamics. We prove that observing a deterministic policy that is optimal for any non-constant reward function then conveys exactly $n \log m$ bits of information about the environment. Specifically, we show that the mutual information between the environment and the optimal policy is $n \log m$ bits. This bound holds across a broad class of objectives, including finite-horizon, infinite-horizon discounted, and time-averaged reward maximization. These findings provide a precise information-theoretic lower…
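
To make the $n \log m$ figure concrete, the short sketch below evaluates it for a small example and compares it with the number of deterministic policies. It is illustrative arithmetic only, not the paper's proof. The function names are hypothetical, and the base-2 logarithm is an assumption based on the abstract stating the result in bits. A deterministic policy picks one of $m$ actions in each of $n$ states, so there are $m^n$ such policies, and $\log_2(m^n) = n \log_2 m$ is the number of bits needed to single one out.

```python
import math


def policy_information_bits(n_states: int, m_actions: int) -> float:
    """Return n * log2(m), the quantity quoted in the abstract, in bits.

    Assumption: the logarithm is base 2 because the result is stated in bits.
    """
    if n_states < 1 or m_actions < 1:
        raise ValueError("n_states and m_actions must be positive integers")
    return n_states * math.log2(m_actions)


def count_deterministic_policies(n_states: int, m_actions: int) -> int:
    """Count the maps from states to actions: one of m actions per state."""
    return m_actions ** n_states


# Hypothetical example sizes, chosen so the arithmetic is exact.
n, m = 4, 8
bits = policy_information_bits(n, m)
num_policies = count_deterministic_policies(n, m)

print(bits)                     # 12.0
print(num_policies)             # 4096
print(math.log2(num_policies))  # 12.0
```

With 4 states and 8 actions, the figure is 12 bits, which matches the base-2 logarithm of the 4096 possible deterministic policies.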

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Age of Information Optimization