Does DQN Learn?

Aditya Gopalan; Gugan Thoppe

arXiv:2205.13617·cs.LG·June 18, 2025

TL;DR

This paper demonstrates that Deep Q-Networks (DQN) can fail to improve upon initial policies even with unlimited data, and provides a theoretical explanation for this sub-optimality using linear DQN analysis.

Contribution

It shows empirically that DQN can perform worse than the initial policy and offers a theoretical framework explaining why linear DQN converges to sub-optimal fixed points.

Findings

01

DQN often returns a policy that performs worse than the initial one.

02

Linear DQN's limit points are fixed points of projected Bellman operators.

03

These fixed points need not correspond to near-optimal, or even good, policies.
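
One common way to write the fixed-point condition behind finding 02 is sketched here. The notation is assumed for illustration and is not taken from the paper: $\Phi$ is the feature matrix with one row $\phi(s,a)^\top$ per state-action pair, $\theta$ is the weight vector, $D_\theta$ is a diagonal matrix holding the state-action distribution induced by the $ϵ$-greedy behavior policy for $\theta$ (with $\Phi^\top D_\theta \Phi$ assumed invertible), and $T$ is the Bellman optimality operator.

$$
\Phi\theta^* = \Pi_{\theta^*}\, T(\Phi\theta^*), \qquad
\Pi_\theta = \Phi\,(\Phi^\top D_\theta \Phi)^{-1}\Phi^\top D_\theta, \qquad
(Tq)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} q(s',a').
$$

The composed operator $\Pi_{\theta^*} T$ is generally not a contraction, and its fixed points depend on both the features and the sampling distribution, so nothing in this condition forces the greedy policy of $\Phi\theta^*$ to be close to optimal.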

Abstract

A primary requirement for any reinforcement learning method is that it should produce policies that improve upon the initial guess. In this work, we show that the widely used Deep Q-Network (DQN) fails to satisfy this minimal criterion -- even when it gets to see all possible states and actions infinitely often (a condition under which tabular Q-learning is guaranteed to converge to the optimal Q-value function). Our specific contributions are twofold. First, we numerically show that DQN often returns a policy that performs worse than the initial one. Second, we offer a theoretical explanation for this phenomenon in linear DQN, a simplified version of DQN that uses linear function approximation in place of neural networks while retaining the other key components such as $ϵ$-greedy exploration, experience replay, and target network. Using tools from differential inclusion theory,…
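
To make the components named in the abstract concrete, here is a minimal Python sketch of a linear DQN loop with $ϵ$-greedy exploration, an experience replay buffer, and a periodically synced target network. It is not the authors' code. The function name `linear_dqn`, the array layout of `P`, `R`, and `Phi`, and every hyperparameter default are assumptions chosen for illustration.

```python
import numpy as np

def linear_dqn(P, R, Phi, gamma=0.9, eps=0.1, steps=5000,
               buffer_size=1000, batch=32, target_every=100,
               lr=0.05, seed=0):
    """Illustrative linear DQN on a finite MDP (all names are hypothetical).

    P:   (S, A, S) transition probabilities
    R:   (S, A)    expected rewards
    Phi: (S, A, d) feature vector for each state-action pair
    """
    rng = np.random.default_rng(seed)
    S, A, d = Phi.shape
    theta = np.zeros(d)            # online weights
    theta_target = theta.copy()    # target-network weights
    buffer = []                    # experience replay buffer
    s = int(rng.integers(S))

    for t in range(steps):
        # Epsilon-greedy action selection using the online weights.
        if rng.random() < eps:
            a = int(rng.integers(A))
        else:
            a = int(np.argmax(Phi[s] @ theta))

        # Sample the next state and store the transition.
        s_next = int(rng.choice(S, p=P[s, a]))
        buffer.append((s, a, R[s, a], s_next))
        if len(buffer) > buffer_size:
            buffer.pop(0)

        # Semi-gradient step on a minibatch drawn uniformly from the buffer.
        grad = np.zeros(d)
        for i in rng.integers(len(buffer), size=batch):
            si, ai, ri, sn = buffer[i]
            # The bootstrap target uses the frozen target weights.
            target = ri + gamma * np.max(Phi[sn] @ theta_target)
            td_error = target - Phi[si, ai] @ theta
            grad += td_error * Phi[si, ai]
        theta += lr * grad / batch

        # Periodically copy the online weights into the target network.
        if (t + 1) % target_every == 0:
            theta_target = theta.copy()

        s = s_next

    return theta
```

With one-hot features, `Phi = np.eye(S * A).reshape(S, A, S * A)`, the sketch reduces to a replay-based variant of tabular Q-learning. Supplying a lower-dimensional `Phi` gives the function-approximation setting that the abstract describes.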

Taxonomy

Topics: Reinforcement Learning in Robotics · Advancements in Semiconductor Devices and Circuit Design · Evolutionary Algorithms and Applications

Methods: Sarsa