Recurrent networks, hidden states and beliefs in partially observable environments
Gaspard Lambrechts, Adrien Bolland, Damien Ernst

TL;DR
This paper shows empirically that recurrent neural networks trained in partially observable environments develop internal representations that approximate the belief state: their hidden states become increasingly correlated with the beliefs of state variables relevant to optimal control, and this goes hand in hand with better policy performance.
Contribution
It shows empirically that RNN hidden states come to encode information about the belief, which is a sufficient statistic of the history for optimal control, clarifying the role of hidden states in reinforcement learning under partial observability.
Findings
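Hidden states become increasingly correlated with the beliefs of state variables relevant to optimal control as training progresses.
Higher mutual information between hidden states and beliefs is associated with higher returns.
Mutual information between hidden states and the beliefs of variables irrelevant to optimal control decreases during learning.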
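The findings above are stated in terms of mutual information. As a rough illustration of that quantity only, here is a minimal plug-in estimate for paired discrete samples. The function name `mutual_information` and the discretized inputs are assumptions made for this sketch. The summary does not say which estimator the paper uses, and continuous hidden states would need a different one.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Illustrative only: assumes x and y are already discretized labels.
    """
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    # Map arbitrary labels to contiguous integer indices.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Empirical joint distribution from co-occurrence counts.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    # Product of the marginals, same shape as the joint.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    indep = px @ py
    mask = joint > 0  # skip empty cells, where p * log(p) is taken as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# Perfectly dependent samples carry log(2) nats; independent ones carry none.
mi_dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A plug-in estimate like this is biased upward on small samples, so it is best read as a qualitative indicator rather than an exact value.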
Abstract
Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown. Many methods rely on the approximation of a value function to derive near-optimal policies. In partially observable environments, these functions depend on the complete sequence of observations and past actions, called the history. In this work, we show empirically that recurrent neural networks trained to approximate such value functions internally filter the posterior probability distribution of the current state given the history, called the belief. More precisely, we show that, as a recurrent neural network learns the Q-function, its hidden states become more and more correlated with the beliefs of state variables that are relevant to optimal control. This correlation is measured through their mutual information. In addition, we show that the expected return of an…
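The belief mentioned in the abstract, the posterior distribution of the current state given the history, can be computed recursively when the model is known and the state space is finite. The sketch below shows one step of that standard Bayes filter. The names `belief_update`, `T`, and `O`, the array layouts, and the two-state example are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One step of the discrete Bayes filter.

    belief: shape (S,), current distribution over states.
    T: shape (A, S, S), with T[a, s, s2] = P(s2 | s, a).
    O: shape (A, S, Z), with O[a, s2, z] = P(z | s2, a).
    """
    # Prediction: push the belief through the transition model.
    predicted = belief @ T[action]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = predicted * O[action][:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical two-state, one-action model.
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])   # states tend to persist
O = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])   # observation matches the state with prob. 0.8
b0 = np.array([0.5, 0.5])      # uniform prior
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

Starting from the uniform prior, observing `0` moves the belief to `[0.8, 0.2]`. Applying the update along a whole history gives the belief against which hidden states could be compared.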
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Neural Networks and Applications
