M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network
Zhe Zhang, Yukun Zou, Junjie Lai, Qing Xu

TL;DR
M$^2$DQN introduces a Max-Mean loss framework that enhances data efficiency and accelerates learning in Deep Q-Networks by minimizing the maximum TD-error across multiple experience batches, improving speed and performance.
Contribution
The paper proposes a novel Max-Mean loss framework for DQN that improves data efficiency and learning speed by focusing on the maximum TD-error among multiple experience batches.
Findings
Significant improvement in learning speed and performance in gym games.
Effective integration with existing DQN techniques like Double DQN.
Enhanced data efficiency in reinforcement learning applications.
Abstract
Deep Q-learning Network (DQN) is a successful method that combines reinforcement learning with deep neural networks and has led to widespread application of reinforcement learning. One challenging problem when applying DQN or other reinforcement learning algorithms to real-world problems is data collection. Improving data efficiency is therefore one of the most important problems in reinforcement learning research. In this paper, we propose a framework that uses the Max-Mean loss in Deep Q-Network (M$^2$DQN). Instead of sampling one batch of experiences in each training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error of these batches is minimized. The proposed method can be combined with most existing DQN techniques by replacing the loss function. We verify the effectiveness of this framework…
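The batch-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a tabular Q-function stored as a NumPy array in place of a deep network, takes the "TD-error of a batch" to be the mean squared one-step TD error, and uses hypothetical names (`td_errors`, `max_mean_loss`) and a hypothetical batch layout `(states, actions, rewards, next_states, dones)`.

```python
import numpy as np


def td_errors(q, q_target, batch, gamma=0.99):
    """One-step TD errors for a batch of transitions under a tabular Q.

    `batch` is a tuple of arrays: (states, actions, rewards, next_states, dones).
    This layout is an assumption made for the sketch.
    """
    s, a, r, s2, done = batch
    # Bootstrapped target r + gamma * max_a' Q_target(s', a'); no bootstrap at terminals.
    target = r + gamma * (1.0 - done) * q_target[s2].max(axis=1)
    return target - q[s, a]


def max_mean_loss(q, q_target, batches, gamma=0.99):
    """Return (loss, index) for the batch with the largest mean squared TD error."""
    per_batch = [np.mean(td_errors(q, q_target, b, gamma) ** 2) for b in batches]
    worst = int(np.argmax(per_batch))
    return float(per_batch[worst]), worst


# Tiny example: 3 states, 2 actions, Q initialised to zero, all transitions terminal.
q = np.zeros((3, 2))
batch_a = (np.array([0, 1]), np.array([0, 1]), np.array([1.0, 1.0]),
           np.array([1, 2]), np.array([1.0, 1.0]))
batch_b = (np.array([0, 2]), np.array([1, 0]), np.array([2.0, 0.0]),
           np.array([1, 0]), np.array([1.0, 1.0]))

loss, worst = max_mean_loss(q, q, [batch_a, batch_b])
print(loss, worst)  # batch_b has the larger mean squared TD error
```

In a deep-network setting, the same selection would presumably be followed by a gradient step on the loss of the selected batch only, which is how the Max-Mean loss could replace the standard DQN loss without changing the rest of the training loop.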
Taxonomy
Topics: Sports Analytics and Performance · Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Dense Connections · Double Q-learning · Double DQN · Q-Learning · Convolution · Experience Replay · Deep Q-Network
