Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network
Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan

TL;DR
This paper introduces R-DQN, a framework combining DQN and return-based reinforcement learning, with a novel strategy to measure policy discrepancy, leading to improved performance on benchmark tasks.
Contribution
The paper proposes a general R-DQN framework and a new strategy with two measurements to qualitatively assess policy discrepancy, enhancing return approximation.
Findings
R-DQN outperforms traditional DQN on benchmark tasks.
The proposed measurements accurately express trace coefficients.
Algorithms with the new strategy outperform state-of-the-art methods.
Abstract
The deep Q-network (DQN) and return-based reinforcement learning are two promising approaches proposed in recent years. DQN has brought advances to complex sequential decision problems, while return-based algorithms are advantageous in exploiting sample trajectories. In this paper, we propose R-DQN, a general framework that combines DQN with most return-based reinforcement learning algorithms. We show that the performance of traditional DQN can be improved effectively by introducing return-based reinforcement learning. To further improve R-DQN, we design a strategy with two measurements that qualitatively measure the policy discrepancy. Moreover, we derive bounds on the two measurements within the proposed R-DQN framework. We show that algorithms using our strategy can accurately express the trace coefficient and achieve a better approximation to the return. The experiments,…
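The abstract does not spell out R-DQN's update rule or the paper's two measurements. To illustrate what a "return-based" target with a "trace coefficient" looks like, here is a minimal sketch using the Retrace(λ) coefficient c_s = λ·min(1, π(a_s|x_s)/μ(a_s|x_s)) from Munos et al. (2016), one well-known return-based algorithm. This is not the paper's strategy. The function names `retrace_trace` and `return_based_targets` are hypothetical, and the sketch assumes the caller supplies target-network values and action probabilities for a segment taken from a single episode.

```python
from typing import List, Sequence


def retrace_trace(pi_prob: float, mu_prob: float, lam: float) -> float:
    """Retrace(lambda) trace coefficient c = lam * min(1, pi / mu).

    pi_prob: probability of the taken action under the target policy.
    mu_prob: probability of the taken action under the behaviour policy
             (must be positive, since the action was actually sampled).
    """
    if mu_prob <= 0.0:
        raise ValueError("behaviour probability must be positive")
    return lam * min(1.0, pi_prob / mu_prob)


def return_based_targets(
    rewards: Sequence[float],
    q_taken: Sequence[float],
    v_next: Sequence[float],
    traces: Sequence[float],
    gamma: float,
) -> List[float]:
    """Targets T_t = Q(x_t,a_t) + sum_{s>=t} gamma^(s-t) (c_{t+1}...c_s) delta_s.

    Assumed inputs (hypothetical layout, one segment from a single episode):
      rewards[t]: reward r_t after taking a_t in x_t.
      q_taken[t]: Q(x_t, a_t) from the target network.
      v_next[t]:  E_{a~pi} Q(x_{t+1}, a), or 0.0 if x_{t+1} is terminal.
      traces[t]:  trace coefficient c_t for (x_t, a_t); traces[0] is unused.
    Computed backwards with the equivalent recursion
      T_t = r_t + gamma * (v_next[t] + c_{t+1} * (T_{t+1} - Q(x_{t+1}, a_{t+1}))).
    """
    n = len(rewards)
    if not (len(q_taken) == len(v_next) == len(traces) == n):
        raise ValueError("all inputs must have the same length")
    if n == 0:
        return []
    targets = [0.0] * n
    # Last step of the segment: one-step expected backup (trace cut to zero).
    targets[-1] = rewards[-1] + gamma * v_next[-1]
    for t in range(n - 2, -1, -1):
        # Off-policy correction carried back from step t + 1, scaled by c_{t+1}.
        correction = traces[t + 1] * (targets[t + 1] - q_taken[t + 1])
        targets[t] = rewards[t] + gamma * (v_next[t] + correction)
    return targets
```

Setting every trace to 0 reduces the target to the one-step expected backup r_t + γ·E_π Q(x_{t+1}, ·). As Munos et al. (2016) note, other return-based methods differ mainly in the trace rule: c = λ gives Q(λ), c = λπ gives TB(λ), and c = π/μ gives importance sampling. In a DQN-style learner, such targets would presumably replace the usual one-step target in the regression loss.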
Taxonomy
Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network
