Dynamical Priors as a Training Objective in Reinforcement Learning
Sukesh Subaharan

TL;DR
This paper introduces Dynamical Prior Reinforcement Learning (DP-RL), a training framework that uses dynamical priors to shape how policies evolve over time in reinforcement learning, promoting more temporally coherent decision trajectories.
Contribution
The paper proposes a novel auxiliary loss, derived from external state dynamics, that shapes how the policy evolves without changing the reward or the policy architecture.
Findings
Dynamical priors systematically alter decision trajectories in RL agents.
DP-RL promotes temporally structured behavior that is not explained by simple smoothing.
The approach works across three minimal environments.
Abstract
Standard reinforcement learning (RL) optimizes policies for reward but imposes few constraints on how decisions evolve over time. As a result, policies may achieve high performance while exhibiting temporally incoherent behavior such as abrupt confidence shifts, oscillations, or degenerate inactivity. We introduce Dynamical Prior Reinforcement Learning (DP-RL), a training framework that augments policy gradient learning with an auxiliary loss derived from external state dynamics that implement evidence accumulation and hysteresis. Without modifying the reward, environment, or policy architecture, this prior shapes the temporal evolution of action probabilities during learning. Across three minimal environments, we show that dynamical priors systematically alter decision trajectories in task-dependent ways, promoting temporally structured behavior that cannot be explained by generic…
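The abstract describes the objective only at a high level: a policy-gradient loss plus an auxiliary term derived from external dynamics that implement evidence accumulation and hysteresis. The sketch below is one possible instance of that idea under stated assumptions, not the paper's actual formulation. It assumes a two-action task, models the external dynamics as a leaky evidence accumulator with a hysteresis latch, and uses a Bernoulli KL divergence from the prior's target probabilities to the policy's probabilities as the auxiliary loss. All names (`prior_trajectory`, `dp_rl_objective`, `leak`, `gain`, `threshold`, `bias`, `lam`) are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prior_trajectory(
    evidence: Sequence[float],
    leak: float = 0.1,
    gain: float = 1.0,
    threshold: float = 1.0,
    bias: float = 0.5,
) -> List[float]:
    """Hypothetical external dynamics: leaky evidence accumulation plus a
    hysteresis latch. Returns a target probability for action 1 per step."""
    x = 0.0      # accumulated evidence
    latch = 0.0  # committed side: -1, 0 (undecided) or +1
    targets = []
    for e in evidence:
        # Leaky integration of the incoming evidence.
        x = (1.0 - leak) * x + gain * e
        # Hysteresis: the latch changes only when x crosses +/- threshold,
        # so small reversals of evidence do not undo a commitment.
        if x >= threshold:
            latch = 1.0
        elif x <= -threshold:
            latch = -1.0
        # The latch biases the target toward the committed action.
        targets.append(sigmoid(x + bias * latch))
    return targets


def bernoulli_kl(p: float, q: float, eps: float = 1e-8) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), with probabilities clipped for safety."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def dp_rl_objective(
    log_probs_taken: Sequence[float],
    returns: Sequence[float],
    policy_probs: Sequence[float],
    prior_probs: Sequence[float],
    lam: float = 0.1,
) -> float:
    """Hypothetical DP-RL-style loss: a REINFORCE term plus lam times the mean
    KL from the prior's targets to the policy's action-1 probabilities.
    The returns (reward signal) are used unchanged; the prior only adds a penalty."""
    pg_loss = -sum(lp * g for lp, g in zip(log_probs_taken, returns))
    prior_loss = sum(
        bernoulli_kl(p, q) for p, q in zip(prior_probs, policy_probs)
    ) / len(policy_probs)
    return pg_loss + lam * prior_loss


# Usage: evidence crosses the threshold at step 2; the latch keeps the
# commitment at step 3 even though the evidence briefly reverses.
evidence = [0.6, 0.6, -0.3]
prior = prior_trajectory(evidence)
loss = dp_rl_objective(
    log_probs_taken=[-0.5, -0.2, -0.4],
    returns=[1.0, 2.0, 1.5],
    policy_probs=[0.55, 0.6, 0.5],
    prior_probs=prior,
)
```

This computes only the scalar objective. In a real training loop the same expression would be written in an automatic-differentiation framework so that gradients of both terms reach the policy parameters. The weight `lam` trades off reward maximization against agreement with the prior's temporal structure.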
