Deep Double Q-learning

Prabhat Nagarajan; Martha White; Marlos C. Machado

arXiv:2507.00275 · cs.LG · May 18, 2026

TL;DR

Deep Double Q-learning (DDQL) is a novel deep reinforcement learning algorithm that explicitly trains two Q-functions to reduce overestimation bias, leading to improved performance across Atari games.

Contribution

The paper introduces DDQL, a deep RL algorithm that explicitly trains two Q-functions. Relative to Double DQN, which trains only one, DDQL reduces overestimation and improves aggregate performance.

Findings

01 DDQL outperforms Double DQN on 47 of 57 Atari games.

02 DDQL reduces overestimation bias compared to Double DQN.

03 DDQL stabilizes training through a combination of techniques, including lower replay ratios, longer target network update intervals, and shared layers.
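
The overestimation bias named in the findings comes from taking a maximum over noisy value estimates. The sketch below is a toy illustration under assumed conditions, not an experiment from the paper: every action has a true value of 0, and each estimate is corrupted by independent unit Gaussian noise. The function name `estimate_bias` and all parameters are hypothetical.

```python
import random


def estimate_bias(num_actions=10, trials=20000, seed=0):
    """Compare a single (max) estimator with a double estimator.

    Assumption: every action's true value is 0, so the true maximum is 0
    and any nonzero average below is pure estimation bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent sets of noisy estimates of the same action values.
        q_a = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
        # Single estimator: the same estimates both select and evaluate.
        single_total += max(q_a)
        # Double estimator: q_a selects the action, q_b evaluates it.
        best = max(range(num_actions), key=lambda a: q_a[a])
        double_total += q_b[best]
    return single_total / trials, double_total / trials


single_bias, double_bias = estimate_bias()
print(f"single estimator bias: {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

In this toy setting the single estimator is biased upward, while the double estimator averages close to zero because the evaluating estimates are independent of the selecting ones. When the two estimators are correlated, as the abstract says is the case in Double DQN, that independence argument no longer holds fully.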

Abstract

Double Q-learning is a classical control algorithm that mitigates the maximization bias of Q-learning. To do so, it explicitly trains two independent action-value functions and uses them to decouple action-selection and action-evaluation when computing bootstrap targets. Double DQN adapts target bootstrap decoupling to deep reinforcement learning (RL), but explicitly trains only a single action-value function and does not fully decouple its estimators. Consequently, the two estimators remain correlated, and overestimation persists. In this paper, we introduce Deep Double Q-learning (DDQL), a deep RL algorithm that explicitly trains two Q-functions through Double Q-learning. DDQL stabilizes training through a combination of techniques, including lower replay ratios, longer target network update intervals, and shared layers. Across 57 Atari 2600 games, DDQL improves aggregate performance…
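
The decoupling of action-selection and action-evaluation described in the abstract can be sketched with the classical tabular Double Q-learning update. This is a minimal illustration of that classical algorithm only, not of DDQL, which uses neural networks and the stabilization techniques listed above. The function names, table layout, and hyperparameter defaults are assumptions made for the example.

```python
import random


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def double_q_update(q_a, q_b, s, a, r, s_next, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular Double Q-learning step for (s, a, r, s_next).

    q_a and q_b are lists of per-state action-value lists. A coin flip
    picks which table learns; the other table evaluates the bootstrap target.
    """
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a

    if done:
        target = r
    else:
        # Action-selection uses the table being updated...
        a_star = argmax(learner[s_next])
        # ...while action-evaluation uses the other table.
        target = r + gamma * evaluator[s_next][a_star]

    learner[s][a] += alpha * (target - learner[s][a])
    return target


# Hypothetical two-state, two-action tables for illustration.
q_a = [[0.0, 0.0], [1.0, 3.0]]
q_b = [[0.0, 0.0], [2.0, 0.5]]
target = double_q_update(q_a, q_b, s=0, a=1, r=1.0, s_next=1, done=False)
print(target)
```

Standard Q-learning would instead bootstrap from `max(learner[s_next])`, using one set of estimates for both selection and evaluation. Double DQN keeps the select-then-evaluate form but, as the abstract notes, trains only a single action-value function, so its two estimators are not fully decoupled.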

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance