Information-theoretic analysis of world models in optimal reward maximizers
Alfred Harwood, Jose Faustino, Alex Altair

TL;DR
This paper establishes an information-theoretic lower bound on the amount of environment information an optimal policy must encode, showing it is exactly n log m bits for a controlled Markov process with n states and m actions.
Contribution
It proves that a deterministic optimal policy carries exactly n log m bits of information about the environment, and that this holds for finite-horizon, infinite-horizon discounted, and time-averaged reward maximization.
Findings
Optimal policies encode exactly n log m bits of environment information.
The result holds for finite-horizon, infinite-horizon discounted, and time-averaged reward maximization.
It gives an information-theoretic lower bound on the implicit world model of an optimal reward maximizer.
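To give a feel for the size of n log m, here is a minimal Python sketch. It is an illustration only, not the paper's proof. It relies on one standard counting fact: a deterministic policy picks one of m actions in each of n states, so there are m^n such policies, and a uniform distribution over them has entropy n log m. The function names are hypothetical, and the base-2 logarithm is an assumption based on the summary's use of "bits".

```python
import math
from itertools import product


def policy_information_bits(n_states: int, n_actions: int) -> float:
    """Return n * log2(m), the quantity quoted in the summary.

    Assumes log base 2, since the summary reports the result in bits.
    """
    return n_states * math.log2(n_actions)


def count_deterministic_policies(n_states: int, n_actions: int) -> int:
    """Count deterministic policies by brute-force enumeration.

    A deterministic policy assigns one action to each state, so the
    count should equal n_actions ** n_states.
    """
    return sum(1 for _ in product(range(n_actions), repeat=n_states))


if __name__ == "__main__":
    n, m = 3, 4
    num_policies = count_deterministic_policies(n, m)
    print(num_policies)                   # number of deterministic policies
    print(math.log2(num_policies))        # entropy of a uniform choice among them
    print(policy_information_bits(n, m))  # n * log2(m)
```

For small n and m the enumeration confirms that log2 of the policy count matches n log2 m. Brute-force enumeration grows as m^n, so it is only practical for tiny examples.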
Abstract
An important question in the field of AI is the extent to which successful behaviour requires an internal representation of the world. In this work, we quantify the amount of information an optimal policy provides about the underlying environment. We consider a Controlled Markov Process (CMP) with n states and m actions, assuming a uniform prior over the space of possible transition dynamics. We prove that observing a deterministic policy that is optimal for any non-constant reward function conveys exactly n log m bits of information about the environment. Specifically, we show that the mutual information between the environment and the optimal policy is n log m bits. This bound holds across a broad class of objectives, including finite-horizon, infinite-horizon discounted, and time-averaged reward maximization. These findings provide a precise information-theoretic lower…
Taxonomy
Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Age of Information Optimization
