Amortized Variational Deep Q Network

Haotian Zhang; Yuhao Wang; Jianyong Sun; Zongben Xu

arXiv:2011.01706·cs.LG·November 4, 2020

TL;DR

This paper introduces an amortized variational approach for Deep Q Networks that improves exploration efficiency, reduces parameters, and outperforms existing methods in classical control tasks with less training time.

Contribution

It proposes a novel amortized variational inference framework for Deep Q Networks, balancing exploration and exploitation with fewer parameters and enhanced performance.

Findings

01 Outperforms state-of-the-art methods in control tasks

02 Requires significantly less training time

03 Uses fewer learning parameters

Abstract

Efficient exploration is one of the most important issues in deep reinforcement learning. To address it, recent methods treat the value function parameters as random variables and resort to variational inference to approximate their posterior. In this paper, we propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network. We establish the equivalence between the loss of the new model and the amortized variational inference loss. We balance exploration and exploitation by assuming the posterior to be Cauchy and Gaussian, respectively, in a two-stage training process. We show that the amortized framework can result in significantly fewer learning parameters than the existing state-of-the-art method. Experimental results on classical control tasks in OpenAI Gym and chain Markov…
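The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `stage` flag, and the assumption that a network has already mapped the state to a per-action location and scale are all hypothetical. It only shows how a heavy-tailed Cauchy sample tends to explore more than a Gaussian sample drawn around the same location and scale.

```python
import math
import random


def sample_q_values(loc, scale, stage, rng):
    """Draw one sample of Q(s, a) for every action (illustrative only).

    loc, scale: per-action location and scale, assumed here to be the
    outputs of some network evaluated at the current state.
    stage: "cauchy" (heavy tails) or "gaussian" (light tails).
    rng: a random.Random instance, passed in for reproducibility.
    """
    samples = []
    for m, s in zip(loc, scale):
        if stage == "cauchy":
            # Inverse-CDF sampling of a standard Cauchy variable.
            u = rng.random()
            noise = math.tan(math.pi * (u - 0.5))
        elif stage == "gaussian":
            noise = rng.gauss(0.0, 1.0)
        else:
            raise ValueError(f"unknown stage: {stage!r}")
        # Location-scale transform: the sample is a deterministic
        # function of (loc, scale) and parameter-free noise.
        samples.append(m + s * noise)
    return samples


def select_action(loc, scale, stage, rng):
    """Act greedily with respect to one sampled set of Q-values."""
    q = sample_q_values(loc, scale, stage, rng)
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    loc, scale = [0.1, 0.9, 0.3], [0.5, 0.5, 0.5]
    for stage in ("cauchy", "gaussian"):
        picks = [select_action(loc, scale, stage, rng) for _ in range(1000)]
        print(stage, [picks.count(a) for a in range(len(loc))])
```

Under these assumptions, the Cauchy stage should pick the lower-valued actions more often than the Gaussian stage, because its tails put more mass on extreme samples. Which stage comes first and how the switch is scheduled are not stated in the abstract and are left out of the sketch.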

Code & Models

Repositories

wyhwhy/Amortized-Variational-DQN (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification