Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

David Leeftink; Max Hinne; Marcel van Gerven

arXiv:2605.05373·cs.LG·May 12, 2026

TL;DR

This paper links recurrent policy hidden states to Pontryagin co-states from optimal control, introducing a co-state loss to improve interpretability and robustness in partially observable reinforcement learning tasks.

Contribution

It establishes a formal connection between recurrent hidden states and Pontryagin co-states, enabling structured and interpretable internal dynamics in reinforcement learning policies.
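For orientation, the Pontryagin minimum principle referred to here can be written in its standard textbook continuous-time form. The notation below (dynamics f, running cost \ell, terminal cost \Phi, co-state \lambda, horizon T) is an assumption chosen for illustration and may differ from the paper's own symbols and setting.

```latex
\begin{aligned}
H(x, u, \lambda) &= \ell(x, u) + \lambda^{\top} f(x, u)
  && \text{(Hamiltonian)} \\
\dot{x}(t) &= \frac{\partial H}{\partial \lambda} = f\bigl(x(t), u(t)\bigr)
  && \text{(state dynamics)} \\
\dot{\lambda}(t) &= -\frac{\partial H}{\partial x}\bigl(x(t), u(t), \lambda(t)\bigr)
  && \text{(co-state dynamics)} \\
\lambda(T) &= \frac{\partial \Phi}{\partial x}\bigl(x(T)\bigr)
  && \text{(terminal condition)} \\
u^{*}(t) &= \arg\min_{u} H\bigl(x^{*}(t), u, \lambda(t)\bigr)
  && \text{(Hamiltonian minimization)}
\end{aligned}
```

Read this way, a recurrent hidden state that tracks \lambda(t) would make the policy readout play the role of the last line, which is the interpretation the summary describes.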

Findings

01 Matches or improves performance on DMControl tasks

02 Robust to zero-shot sensor masking

03 Provides a principled approach to policy design

Abstract

A key capability of intelligent agents is operating under partial observability: reasoning and acting effectively despite missing or incomplete state observations. While recurrent (memory-based) policies learned via reinforcement learning address this by encoding history into latent state representations, their internal dynamics remain uninterpretable black boxes. This paper establishes a formal link between these hidden states and the Pontryagin minimum principle (PMP) from optimal control. We demonstrate that for standard recurrent architectures, latent representations map directly to PMP co-states, which allows the readout layer to be interpreted as performing Hamiltonian minimization. Because standard reward maximization does not naturally discover this alignment, we introduce a PMP-derived co-state loss to explicitly structure the internal dynamics. Empirically, this approach…
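To make the co-state and Hamiltonian-minimization idea concrete, here is a minimal sketch on a textbook discrete-time linear-quadratic problem. It is not the paper's method, architecture, or co-state loss: the toy double-integrator system, the cost matrices, and all function names are assumptions made for illustration. The sketch computes the optimal controller by Riccati recursion, recovers the co-states by the backward recursion, and checks that minimizing the Hamiltonian over the control reproduces the optimal actions as a linear readout of the co-state.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Backward Riccati recursion: gains K[0..T-1] and cost-to-go matrices P[0..T]."""
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Qf
    for t in range(T - 1, -1, -1):
        K[t] = np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
        P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])
    return K, P

def rollout(A, B, K, x0):
    """Simulate x_{t+1} = A x_t + B u_t under the feedback law u_t = -K_t x_t."""
    xs, us = [x0], []
    for Kt in K:
        us.append(-Kt @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return xs, us

def costates(A, Q, Qf, xs):
    """Backward co-state recursion: lam_T = Qf x_T, lam_t = Q x_t + A^T lam_{t+1}."""
    T = len(xs) - 1
    lam = [None] * (T + 1)
    lam[T] = Qf @ xs[T]
    for t in range(T - 1, -1, -1):
        lam[t] = Q @ xs[t] + A.T @ lam[t + 1]
    return lam

# Hypothetical toy problem: a double integrator with quadratic costs.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
Qf = 10.0 * np.eye(2)
T = 50

K, P = lqr_gains(A, B, Q, R, Qf, T)
xs, us = rollout(A, B, K, np.array([1.0, 0.0]))
lam = costates(A, Q, Qf, xs)

# Hamiltonian H_t = 0.5 (x'Qx + u'Ru) + lam_{t+1}' (A x + B u).
# Setting dH/du = 0 gives u_t = -R^{-1} B' lam_{t+1}: a linear readout of the co-state.
u_from_costate = [-np.linalg.solve(R, B.T @ lam[t + 1]) for t in range(T)]
max_gap = max(np.abs(u - v).max() for u, v in zip(us, u_from_costate))
print("max |u_lqr - u_costate| =", max_gap)
```

In this linear-quadratic case the co-state along the optimal trajectory also equals P_t x_t, the gradient of the cost-to-go, which is why a memory variable carrying it is sufficient to produce the optimal action.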
