Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts

Bert Verbruggen; Arne Vanhoyweghen; Vincent Ginis

arXiv:2601.08726·cs.LG·January 14, 2026

Open Access

TL;DR

This paper identifies limitations of traditional deep reinforcement learning in non-ergodic environments and proposes a method that incorporates explicit time dependence to achieve more accurate, trajectory-consistent policies.

Contribution

It introduces a novel approach that embeds temporal information into deep RL models, enabling better performance in non-ergodic settings without changing environmental feedback.
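This summary does not say how the temporal information is embedded. As a minimal sketch of one possible reading, the observation passed to the policy could be extended with a normalised time index while rewards and dynamics stay untouched. The function name `augment_with_time` and the `horizon` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def augment_with_time(observation: Sequence[float], step: int, horizon: int) -> List[float]:
    """Append a normalised time feature to an observation vector.

    Assumption for illustration only: the episode has a known finite
    `horizon`, and time is exposed to the agent as step / horizon in [0, 1].
    The environment's rewards are not modified.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if not 0 <= step <= horizon:
        raise ValueError("step must lie in [0, horizon]")
    return list(observation) + [step / horizon]


if __name__ == "__main__":
    # A two-dimensional observation at step 5 of a 10-step episode.
    print(augment_with_time([1.0, 2.0], step=5, horizon=10))
```

A wrapper of this kind leaves the learning algorithm unchanged, which is consistent with the model-agnostic framing in the title, but the paper's actual mechanism may differ.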

Findings

01 Deep RL algorithms produce suboptimal policies in non-ergodic environments.

02 Adding explicit time dependence improves policy optimality.

03 The method works without modifying the reward structure or other environmental feedback.

Abstract

Reinforcement Learning (RL) remains a central optimisation framework in machine learning. Although RL agents can converge to optimal solutions, the definition of "optimality" depends on the environment's statistical properties. The Bellman equation, central to most RL algorithms, is formulated in terms of expected values of future rewards. However, when ergodicity is broken, long-term outcomes depend on the specific trajectory rather than on the ensemble average. In such settings, the ensemble average diverges from the time-average growth experienced by individual agents, with expected-value formulations yielding systematically suboptimal policies. Prior studies demonstrated that traditional RL architectures fail to recover the true optimum in non-ergodic environments. We extend this analysis to deep RL implementations and show that these, too, produce suboptimal policies under…
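The gap between the ensemble average and the time-average growth can be illustrated with a standard multiplicative gamble. The sketch below is not the paper's environment: the multipliers 1.5 and 0.6 and the win probability 0.5 are assumed values chosen for illustration. With these values the expected per-step multiplier is above 1, while the per-step growth along a single long trajectory is below 1.

```python
import math
import random

# Illustrative multiplicative gamble (assumed values, not from the paper):
# wealth is multiplied by UP with probability P_UP, otherwise by DOWN.
UP, DOWN, P_UP = 1.5, 0.6, 0.5


def ensemble_growth_factor(up=UP, down=DOWN, p_up=P_UP):
    """Expected per-step wealth multiplier (average over many agents)."""
    return p_up * up + (1 - p_up) * down


def time_average_growth_factor(up=UP, down=DOWN, p_up=P_UP):
    """Per-step multiplier experienced along one long trajectory.

    This is the geometric mean of the multipliers, i.e. the exponential
    of the expected log-multiplier.
    """
    return math.exp(p_up * math.log(up) + (1 - p_up) * math.log(down))


def simulate_log_growth(steps, seed=0, up=UP, down=DOWN, p_up=P_UP):
    """Average log-growth per step for a single simulated trajectory."""
    rng = random.Random(seed)
    log_wealth = 0.0
    for _ in range(steps):
        log_wealth += math.log(up if rng.random() < p_up else down)
    return log_wealth / steps


if __name__ == "__main__":
    print("ensemble factor:", ensemble_growth_factor())
    print("time-average factor:", time_average_growth_factor())
    print("simulated log-growth per step:", simulate_log_growth(100_000))
```

An agent that maximises the expected value would accept this gamble at every step, even though a typical individual trajectory shrinks over time. This is the kind of mismatch the abstract attributes to expected-value formulations.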

Taxonomy

Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research