Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under $\tau$-Mixing

Leon Halgryn (1); Sophie Langer (2); Janusz M. Meylahn (1); E. Moritz Hahn (1) ((1) University of Twente; (2) Ruhr-Universit\"at Bochum)

arXiv:2605.06373·stat.ML·May 8, 2026

TL;DR

This paper extends finite-sample analysis of deep Q-learning to account for temporal dependence in data, showing how $\tau$-mixing affects statistical guarantees and sample complexity.

Contribution

It models the replayed minibatches as $\tau$-mixing, derives finite-sample risk bounds and a sample complexity for DQN under this dependence, and checks the dependence assumption empirically.

Findings

01

Temporal dependence degrades statistical rates due to reduced effective sample size.

02

Replay sampling exhibits approximately exponential decay of correlations.

03

Theoretical bounds are supported by empirical evidence from Gymnasium environments.
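
The effect named in finding 01 can be illustrated with a small simulation. The sketch below is not the paper's analysis: it swaps the $\tau$-mixing model for a stationary AR(1) series, whose autocorrelation decays exactly geometrically, and uses the classical large-sample effective sample size for the sample mean, $n(1-\rho)/(1+\rho)$. The function names, the coefficient `rho`, and the sample sizes are all illustrative assumptions.

```python
import numpy as np


def simulate_ar1(n, rho, rng):
    """Simulate a stationary AR(1) series with unit marginal variance.

    Assumption: AR(1) is only a toy stand-in for temporally dependent data;
    it is not the tau-mixing model studied in the paper.
    """
    x = np.empty(n)
    x[0] = rng.standard_normal()
    noise_scale = np.sqrt(1.0 - rho**2)  # keeps Var(x_t) = 1 for every t
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise_scale * rng.standard_normal()
    return x


def autocorrelation(x, max_lag):
    """Empirical autocorrelation of x at lags 0..max_lag."""
    centered = x - x.mean()
    n = len(centered)
    var = np.dot(centered, centered) / n
    return np.array(
        [np.dot(centered[: n - k], centered[k:]) / (n * var)
         for k in range(max_lag + 1)]
    )


def effective_sample_size_ar1(n, rho):
    """Large-n effective sample size for the mean of an AR(1) series."""
    return n * (1.0 - rho) / (1.0 + rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = 0.8  # hypothetical dependence strength
    x = simulate_ar1(20_000, rho, rng)
    print("empirical autocorrelation, lags 0-5:", autocorrelation(x, 5))
    print("geometric reference rho**k:         ", rho ** np.arange(6))
    print("effective sample size:", effective_sample_size_ar1(len(x), rho))
```

Under these assumptions, `effective_sample_size_ar1(10_000, 0.8)` is about 1,111: with strong positive correlation, 10,000 dependent observations carry roughly as much information about the mean as 1,111 independent ones.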

Abstract

Finite-sample analyses of deep Q-learning typically treat replayed data as independent, even though it is sampled from temporally dependent state-action trajectories. We study the Deep Q-networks (DQN) algorithm under explicit dependence by modelling the minibatches used for updating the network as $\tau$-mixing. We show that this assumption holds under certain dependence conditions on the underlying trajectories and the mechanism used to sample minibatches. Building on this observation, we extend statistical analyses of DQN with fully connected ReLU architectures to dependent data. We formulate each update as a nonparametric regression problem with $\tau$-mixing observations and derive finite-sample risk bounds under this dependence structure. Our results show that temporal dependence leads to a degradation in the statistical rate by inducing an additional dimensionality penalty in the…
