Soft Q Network
Jingbin Liu, Shuai Liu, Xinyang Gu

TL;DR
This paper introduces the Soft Q Network (SQN), integrating entropy regularization into Deep Q Networks to improve exploration, stability, and efficiency, demonstrated through experiments on the Google Research Football environment.
Contribution
The paper proposes SQN with entropy regularization, revealing the connection between soft Q learning and policy improvement, and introduces an on-policy deep Q learning algorithm, QOP.
Findings
QOP shows high stability in training GRF agents
SQN benefits from entropy regularization for better exploration
QOP outperforms traditional methods in the GRF environment
Abstract
Deep Q Network (DQN) is a very successful algorithm, yet the inherent problem of reinforcement learning, i.e. the exploit-explore balance, remains. In this work, we introduce entropy regularization into DQN and propose SQN. We find that the backup equation of soft Q learning can enjoy corrective feedback if we view the soft backup as policy improvement in the form of Q, instead of policy evaluation. We show that Soft Q Learning with Corrective Feedback (SQL-CF) underlies the on-policy nature of SQL and the equivalence of SQL and Soft Policy Gradient (SPG). With these insights, we propose an on-policy version of the deep Q learning algorithm, i.e. Q On-Policy (QOP). We experiment with QOP on a self-play environment called Google Research Football (GRF). The QOP algorithm exhibits great stability and efficiency in training GRF agents.
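To make the entropy-regularized backup mentioned in the abstract concrete, here is a minimal NumPy sketch of the standard maximum-entropy (soft) Q learning quantities for discrete actions. It is an illustration under stated assumptions, not the paper's SQN or QOP implementation: the temperature `alpha`, the function names, and the tabular array shapes are all hypothetical choices made for this example.

```python
import numpy as np

def soft_value(q, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha).

    Uses the log-sum-exp trick for numerical stability.
    `alpha` is an assumed entropy temperature (not taken from the paper).
    """
    z = q / alpha
    m = z.max(axis=-1, keepdims=True)
    return alpha * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def soft_policy(q, alpha):
    """Boltzmann policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha)."""
    v = soft_value(q, alpha)
    return np.exp((q - v[..., None]) / alpha)

def soft_q_target(reward, next_q, done, gamma, alpha):
    """Soft backup target: r + gamma * V(s') for non-terminal transitions."""
    return reward + gamma * (1.0 - done) * soft_value(next_q, alpha)

# Example: one state with three actions.
q = np.array([[1.0, 2.0, 3.0]])
pi = soft_policy(q, alpha=1.0)            # action probabilities, sum to 1
target = soft_q_target(
    reward=np.array([0.5]), next_q=q, done=np.array([0.0]),
    gamma=0.99, alpha=1.0,
)
```

As `alpha` shrinks toward zero, `soft_value` approaches the hard maximum over actions and the target reduces to the usual DQN target; larger `alpha` spreads probability across actions, which is the exploration effect attributed to entropy regularization.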
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
Methods: Q-Learning · Entropy Regularization · Dense Connections · Convolution · Deep Q-Network
