Modified Double DQN: addressing stability
Shervin Halat, Mohammad Mehdi Ebadzadeh, Kiana Amani

TL;DR
This paper proposes three modifications to the Double DQN algorithm to improve its stability and reduce overestimation, supported by empirical and theoretical evaluations.
Contribution
The paper introduces three novel modifications to DDQN that enhance stability and maintain or improve performance over the original algorithm.
Findings
Modified algorithms show improved stability over DDQN.
None of the modifications underperforms DDQN in correcting overestimation.
Empirical and theoretical results validate the effectiveness of the proposed modifications.
Abstract
Inspired by the Double Q-learning algorithm, the Double DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. DDQN has shown, both theoretically and empirically, the importance of decoupling action evaluation from action selection when computing target values; moreover, these benefits were obtained with only a simple adaptation of the DQN algorithm, the minimal possible change, as its authors noted. Nevertheless, DDQN appears to take a step back: the parameters of the policy network re-emerge in the target value function, although DQN had originally removed them in the hope of tackling the serious issue of moving targets and the instability they cause during learning. Therefore, in this paper three modifications to the DDQN…
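The decoupling described in the abstract can be illustrated with a minimal sketch of the two standard target computations. This shows the usual DQN and DDQN targets only, not the paper's three modifications, which are not described in the text available here. The function names, the sample Q-values, and the discount factor are assumptions made for illustration.

```python
def dqn_target(reward, gamma, q_target_next, done):
    """Standard DQN target: the target network both selects and
    evaluates the next action (a max over its own estimates)."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Standard DDQN target: the online (policy) network selects the
    action and the target network evaluates it."""
    if done:
        return reward
    # Action selection uses the online network's estimates.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation uses the target network's estimates.
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state (three actions).
q_online_next = [1.0, 2.5, 2.0]   # online network prefers action 1
q_target_next = [1.2, 1.8, 3.0]   # target network prefers action 2

y_dqn = dqn_target(1.0, 0.9, q_target_next, done=False)
y_ddqn = ddqn_target(1.0, 0.9, q_online_next, q_target_next, done=False)
print(y_dqn, y_ddqn)
```

With these sample values the DDQN target is lower than the DQN target, because the action chosen by the online network is not the one the target network rates highest. The sketch also makes the abstract's concern visible: `q_online_next` depends on the policy network's parameters, so the DDQN target shifts whenever those parameters are updated.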
Taxonomy
Topics: Network Security and Intrusion Detection
Methods: Experience Replay · Q-Learning · Dense Connections · Double Q-learning · Double DQN · Convolution · Deep Q-Network
