Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

Wenjia Meng; Qian Zheng; Long Yang; Pengfei Li; Gang Pan

arXiv:1806.06953·cs.LG·December 2, 2019

TL;DR

This paper introduces R-DQN, a framework combining DQN and return-based reinforcement learning, with a novel strategy to measure policy discrepancy, leading to improved performance on benchmark tasks.

Contribution

The paper proposes a general R-DQN framework and a new strategy with two measurements to qualitatively assess policy discrepancy, enhancing return approximation.

Findings

1. R-DQN outperforms traditional DQN on benchmark tasks.

2. The proposed measurements accurately express trace coefficients.

3. Algorithms with the new strategy outperform state-of-the-art methods.

Abstract

The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this paper, we propose a general framework to combine DQN and most of the return-based reinforcement learning algorithms, named R-DQN. We show the performance of traditional DQN can be improved effectively by introducing return-based reinforcement learning. In order to further improve the R-DQN, we design a strategy with two measurements which can qualitatively measure the policy discrepancy. Moreover, we give the two measurements' bounds in the proposed R-DQN framework. We show that algorithms with our strategy can accurately express the trace coefficient and achieve a better approximation to return. The experiments,…
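To make the terms "trace coefficient" and "return" concrete, the following is a minimal sketch of a generic return-based target of the kind that could replace the one-step DQN target. It is not the paper's method: the two proposed measurements are not specified in the abstract, so the well-known Retrace-style coefficient (a truncated importance ratio) is swapped in purely as an example. The function names, argument names, and the assumption that the Q-values are supplied as plain lists are all hypothetical.

```python
def return_based_targets(rewards, q_taken, v_next, traces, gamma=0.99):
    """Backward recursion for a generic return-based target (illustrative).

    rewards[t] : reward r_t along one sampled trajectory
    q_taken[t] : Q(x_t, a_t), e.g. from a target network (assumed given)
    v_next[t]  : expected Q at x_{t+1} under the target policy,
                 0.0 if step t ends the episode
    traces[t]  : trace coefficient c_t in [0, 1]; traces[0] is unused
    """
    n = len(rewards)
    targets = [0.0] * n
    correction = 0.0  # holds G_{t+1} - Q(x_{t+1}, a_{t+1})
    for t in reversed(range(n)):
        td_error = rewards[t] + gamma * v_next[t] - q_taken[t]
        # The trace of the *next* step decides how much of the later
        # corrections is propagated back to step t.
        next_trace = traces[t + 1] if t + 1 < n else 0.0
        correction = td_error + gamma * next_trace * correction
        targets[t] = q_taken[t] + correction
    return targets


def retrace_coefficient(pi_prob, mu_prob, lam=1.0):
    """Retrace-style trace: lam * min(1, pi(a|x) / mu(a|x)).

    Used here only as a stand-in example of a trace coefficient.
    """
    return lam * min(1.0, pi_prob / mu_prob)
```

With every trace set to 0 the recursion collapses to the one-step target r_t + gamma * v_next[t], and with every trace set to 1 it accumulates the full multi-step correction. A strategy that measures policy discrepancy would, under this sketch, amount to a different rule for filling in `traces`.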

Taxonomy

Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network