Amortized Variational Deep Q Network
Haotian Zhang, Yuhao Wang, Jianyong Sun, Zongben Xu

TL;DR
This paper introduces an amortized variational approach for Deep Q Networks that improves exploration efficiency, reduces parameters, and outperforms existing methods in classical control tasks with less training time.
Contribution
It proposes a novel amortized variational inference framework for Deep Q Networks, balancing exploration and exploitation with fewer parameters and enhanced performance.
Findings
Outperforms state-of-the-art methods in control tasks
Requires significantly less training time
Uses fewer learning parameters
Abstract
Efficient exploration is one of the most important issues in deep reinforcement learning. To address this issue, recent methods treat the value function parameters as random variables and resort to variational inference to approximate the posterior of the parameters. In this paper, we propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network. We establish the equivalence between the loss of the new model and the amortized variational inference loss. We balance exploration and exploitation by assuming the posterior to be Cauchy and Gaussian, respectively, in a two-stage training process. We show that the amortized framework can result in significantly fewer learnable parameters than the existing state-of-the-art method. Experimental results on classical control tasks in OpenAI Gym and chain Markov…
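The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that some amortized network has already produced a per-action location `loc` and scale `scale` for the action-value posterior, and that actions are chosen by sampling Q-values and taking the argmax. The function names, the stage labels `"explore"` and `"exploit"`, and the sampling-based action rule are all hypothetical choices made for illustration. The only point shown is that a heavy-tailed Cauchy posterior produces far more varied samples than a Gaussian with the same location and scale.

```python
import numpy as np


def sample_q_values(loc, scale, stage, rng):
    """Draw one Q-value sample per action from an assumed posterior.

    loc, scale: per-action location and scale (hypothetical network outputs).
    stage: "explore" uses a Cauchy posterior, "exploit" uses a Gaussian one.
    """
    loc = np.asarray(loc, dtype=float)
    scale = np.asarray(scale, dtype=float)
    if stage == "explore":
        # Heavy tails: occasional very large deviations encourage exploration.
        noise = rng.standard_cauchy(size=loc.shape)
    elif stage == "exploit":
        # Light tails: samples stay close to the location estimate.
        noise = rng.standard_normal(size=loc.shape)
    else:
        raise ValueError(f"unknown stage: {stage!r}")
    return loc + scale * noise


def select_action(loc, scale, stage, rng):
    """Pick the action whose sampled Q-value is largest."""
    return int(np.argmax(sample_q_values(loc, scale, stage, rng)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    loc = [0.1, 0.9, 0.3]      # illustrative values, not from the paper
    scale = [0.2, 0.2, 0.2]
    for stage in ("explore", "exploit"):
        picks = [select_action(loc, scale, stage, rng) for _ in range(1000)]
        print(stage, np.bincount(picks, minlength=3) / 1000)
```

Running the script prints the empirical action frequencies for each stage, which gives a rough sense of how much more often the Cauchy stage deviates from the action with the highest location estimate.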
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
