On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation

Alexander Galozy

arXiv:2602.21424 · cs.LG · March 23, 2026

Open Access

TL;DR

This paper studies how reinforcement learning policies that condition actions on internal information (memory or inferred latent context) behave under policy transformations. It shows that behavioural dependency is not preserved by convex aggregation and that behavioural distance contracts under convex combination.

Contribution

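The paper formalises behavioural dependency and a within-policy behavioural distance. It proves that non-trivial behavioural dependency is not closed under convex aggregation and that behavioural distance contracts under convex combination, and it gives a sufficient local condition under which gradient ascent decreases behavioural distance.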

Findings

01

The set of policies with non-trivial behavioural dependency is not closed under convex aggregation.

02

Behavioural distance contracts under convex combination.

03

Under a sufficient local condition, gradient ascent on a skewed mixture objective decreases behavioural distance.

Abstract

Reinforcement learning (RL) agents under partial observability often condition actions on internally accumulated information such as memory or inferred latent context. We formalise such information-conditioned interaction patterns as behavioural dependency: variation in action selection with respect to internal information under fixed observations. This induces a probe-relative notion of $\epsilon$-behavioural equivalence and a within-policy behavioural distance that quantifies probe sensitivity. We establish three structural results. First, the set of policies exhibiting non-trivial behavioural dependency is not closed under convex aggregation. Second, behavioural distance contracts under convex combination. Third, we prove a sufficient local condition under which gradient ascent on a skewed mixture objective decreases behavioural distance when a dominant-mode gradient aligns with the…
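The first two structural results can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: policies are stored as tables `policy[o, z, a]` over observations, internal states and actions; `behavioural_distance` is taken to be the largest total-variation gap between action distributions at a fixed observation as the internal state varies; and "convex aggregation" is read as a pointwise mixture of action distributions. The paper's own definitions may differ.

```python
import numpy as np


def behavioural_distance(policy: np.ndarray) -> float:
    """Assumed stand-in for a within-policy behavioural distance.

    policy has shape (n_obs, n_internal, n_actions); each policy[o, z] is a
    probability vector over actions. The value returned is the maximum, over
    observations and pairs of internal states, of the total-variation distance
    between the corresponding action distributions.
    """
    # Pairwise differences between internal states at each fixed observation.
    diffs = policy[:, :, None, :] - policy[:, None, :, :]
    tv = 0.5 * np.abs(diffs).sum(axis=-1)
    return float(tv.max())


def mix(lam: float, p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Pointwise convex combination of two policy tables (hypothetical helper)."""
    return lam * p + (1.0 - lam) * q


# One observation, two internal states, two actions.
# pi_a prefers action 0 in internal state 0; pi_b has the opposite preference.
pi_a = np.array([[[0.9, 0.1], [0.1, 0.9]]])
pi_b = np.array([[[0.1, 0.9], [0.9, 0.1]]])

d_a = behavioural_distance(pi_a)
d_b = behavioural_distance(pi_b)

# Equal-weight mixture: both internal states now map to the same distribution,
# so the mixture no longer depends on internal state at all.
d_half = behavioural_distance(mix(0.5, pi_a, pi_b))

# Skewed mixture: dependency survives but is smaller than the weighted
# average of the two component distances.
lam = 0.8
d_skew = behavioural_distance(mix(lam, pi_a, pi_b))
bound = lam * d_a + (1.0 - lam) * d_b

print(d_a, d_b, d_half, d_skew, bound)
```

In this toy setting both component policies depend on internal state, yet their equal-weight mixture does not, which is one way the dependent set can fail to be closed under mixing. For this total-variation surrogate, the inequality `d_skew <= bound` follows from the triangle inequality; whether the paper's contraction result takes the same form is not stated in the abstract.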

Taxonomy

Topics: Reinforcement Learning in Robotics · Game Theory and Applications · Opinion Dynamics and Social Influence