Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts
Bert Verbruggen, Arne Vanhoyweghen, Vincent Ginis

TL;DR
This paper identifies limitations of traditional deep reinforcement learning in non-ergodic environments and proposes a method that incorporates explicit time dependence to achieve more accurate, trajectory-consistent policies.
Contribution
It introduces a novel approach that embeds temporal information into deep RL models, enabling better performance in non-ergodic settings without changing environmental feedback.
Findings
Deep RL algorithms produce suboptimal policies in non-ergodic environments.
Adding explicit time dependence to the model brings learned policies closer to the trajectory-level optimum.
The method works without modifying the reward structure.
Abstract
Reinforcement Learning (RL) remains a central optimisation framework in machine learning. Although RL agents can converge to optimal solutions, the definition of "optimality" depends on the environment's statistical properties. The Bellman equation, central to most RL algorithms, is formulated in terms of expected values of future rewards. However, when ergodicity is broken, long-term outcomes depend on the specific trajectory rather than on the ensemble average. In such settings, the ensemble average diverges from the time-average growth experienced by individual agents, with expected-value formulations yielding systematically suboptimal policies. Prior studies demonstrated that traditional RL architectures fail to recover the true optimum in non-ergodic environments. We extend this analysis to deep RL implementations and show that these, too, produce suboptimal policies under…
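The gap between the ensemble average and the time-average growth described in the abstract can be illustrated with a small numerical sketch. The multiplicative coin-toss gamble below, with multipliers 1.5 and 0.6, is a textbook example chosen here as an assumption; it is not taken from the paper, and the function names are hypothetical.

```python
import math
import random

# Hypothetical gamble (an assumption, not from the paper): each step,
# wealth is multiplied by UP or DOWN with equal probability.
UP, DOWN = 1.5, 0.6


def ensemble_growth_factor():
    """Expected per-step multiplier, averaged over many parallel agents."""
    return 0.5 * UP + 0.5 * DOWN


def time_average_growth_factor():
    """Per-step multiplier one agent experiences over a long trajectory.

    This is the geometric mean, i.e. exp of the expected log multiplier.
    """
    return math.exp(0.5 * math.log(UP) + 0.5 * math.log(DOWN))


def simulate_log_growth(steps, seed=0):
    """Average per-step log growth along one simulated trajectory."""
    rng = random.Random(seed)
    log_wealth = 0.0
    for _ in range(steps):
        log_wealth += math.log(UP if rng.random() < 0.5 else DOWN)
    return log_wealth / steps


if __name__ == "__main__":
    print("ensemble growth factor:    ", ensemble_growth_factor())
    print("time-average growth factor:", time_average_growth_factor())
    print("simulated log growth/step: ", simulate_log_growth(100_000))
```

In this sketch the ensemble factor is above 1 while the time-average factor is below 1, so an agent that maximises expected value would accept a gamble that shrinks its own wealth along almost every long trajectory.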
Taxonomy
Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
