Approximation to Deep Q-Network by Stochastic Delay Differential Equations

Jianya Lu; Yingjun Mo

arXiv:2505.00382 · cs.LG · May 2, 2025

TL;DR

This paper approximates the Deep Q-Network (DQN) algorithm by a stochastic delay differential equation (SDDE) to analyze its stability and convergence, giving a continuous-time framework that explains two key DQN techniques: experience replay and the target network.

Contribution

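It introduces an SDDE-based framework for analyzing DQN: it bounds the Wasserstein-1 distance between the discrete algorithm and the continuous-time equation, shows that this distance vanishes as the step size goes to zero, and links the equation's delay term to stability.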
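The link between a discrete algorithm and a delay equation can be illustrated with a hypothetical one-parameter toy, which is not the paper's SDDE. A TD-style recursion uses the parameter from `tau` steps earlier as its "target", and an Euler-Maruyama scheme discretizes a delay equation with delay `D = tau * eta` and diffusion coefficient `sqrt(eta) * sigma`. The linear drift, the reward `r`, the discount `gamma`, the noise level `sigma`, and the `sqrt(eta)` noise scaling are all assumptions chosen for illustration.

```python
import numpy as np


def dqn_like_iteration(theta0, eta, n_steps, tau, r=1.0, gamma=0.9, sigma=0.5, rng=None):
    """Hypothetical scalar toy, not the paper's model:
    theta_{k+1} = theta_k - eta * (theta_k - r - gamma * theta_{k-tau}) + eta * sigma * xi_k,
    where theta_{k-tau} plays the role of a target network lagging tau steps."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.empty(n_steps + 1)
    theta[0] = theta0
    for k in range(n_steps):
        lagged = theta[max(k - tau, 0)]  # constant history theta0 before step 0
        drift = theta[k] - r - gamma * lagged
        theta[k + 1] = theta[k] - eta * drift + eta * sigma * rng.standard_normal()
    return theta


def sdde_euler_maruyama(theta0, eta, horizon, delay, h, r=1.0, gamma=0.9, sigma=0.5, rng=None):
    """Euler-Maruyama with step h for the assumed toy SDDE
    d theta = -(theta(t) - r - gamma * theta(t - delay)) dt + sqrt(eta) * sigma dW(t)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(horizon / h))
    lag = int(round(delay / h))  # delay measured in grid steps
    theta = np.empty(n_steps + 1)
    theta[0] = theta0
    for n in range(n_steps):
        lagged = theta[max(n - lag, 0)]
        drift = theta[n] - r - gamma * lagged
        # Brownian increment over [t_n, t_n + h] has standard deviation sqrt(h)
        noise = np.sqrt(eta) * sigma * np.sqrt(h) * rng.standard_normal()
        theta[n + 1] = theta[n] - h * drift + noise
    return theta
```

With `h = eta` and the same random draws, the Euler-Maruyama path reproduces the discrete recursion exactly, which is one way to see the recursion as a discretization of the delay equation; a smaller `h` simulates the continuous-time process more finely. With `sigma = 0`, `0 < eta < 1`, and `0 <= gamma < 1`, the toy recursion settles at `r / (1 - gamma)`.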

Findings

01

The Wasserstein-1 distance between DQN and the SDDE converges to zero as the step size approaches zero

02

The delay term in the SDDE, which corresponds to the target network, contributes to the stability of the system

03

Interprets experience replay and the target network from the perspective of continuous systems
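For intuition about the distance in the first finding, here is a minimal sketch of the Wasserstein-1 distance between two one-dimensional empirical distributions with equally many, equally weighted samples, where it reduces to the mean absolute difference of sorted samples. This covers only the scalar case under that assumption; the paper's bound concerns the full distributions of the algorithm and the equation, and `empirical_w1` is a hypothetical helper name.

```python
import numpy as np


def empirical_w1(x, y):
    """Wasserstein-1 distance between two 1-D empirical distributions with the
    same number of equally weighted samples: mean |x_(i) - y_(i)| over sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))
```

One could apply it, for example, to terminal values from many independent runs of a discrete algorithm versus many runs of a fine-step simulation of its continuous-time counterpart; `scipy.stats.wasserstein_distance` handles unequal sample sizes and weights.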

Abstract

Despite the significant breakthroughs that the Deep Q-Network (DQN) has brought to reinforcement learning, its theoretical analysis remains limited. In this paper, we construct a stochastic delay differential equation (SDDE) based on the DQN algorithm and estimate the Wasserstein-1 distance between the algorithm and the equation. We provide an upper bound for the distance and prove that the distance between the two converges to zero as the step size approaches zero. This result allows us to understand DQN's two key techniques, experience replay and the target network, from the perspective of continuous systems. Specifically, the delay term in the equation, which corresponds to the target network, contributes to the stability of the system. Our approach leverages a refined Lindeberg principle and an operator comparison to establish these results.
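To make the two techniques concrete, the following is a generic sketch of standard DQN bookkeeping, not code from the paper: a uniform experience-replay buffer and a helper that reports how stale the target-network weights are when they are refreshed by a hard copy every `sync_every` steps. `ReplayBuffer`, `target_lags`, and `sync_every` are hypothetical names.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal uniform experience-replay buffer (generic DQN component)."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive transitions
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def target_lags(n_steps, sync_every):
    """For each step k, how many steps old the target weights are when the
    target is overwritten with the online weights every `sync_every` steps."""
    lags = []
    last_sync = 0
    for k in range(n_steps):
        if k % sync_every == 0:
            last_sync = k  # hard copy: target <- online
        lags.append(k - last_sync)
    return lags
```

The lag follows a sawtooth between 0 and `sync_every - 1`, so the target always uses weights from the recent past; a delay term in a continuous-time model plays an analogous role, though how the paper treats this time-varying lag is not described here.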

Taxonomy

Methods: Convolution · Q-Learning · Dense Connections · Experience Replay · Deep Q-Network