State-Aware Variational Thompson Sampling for Deep Q-Networks

Siddharth Aravindan; Wee Sun Lee

arXiv:2102.03719 · cs.LG · February 9, 2021 · 1 citation

TL;DR

This paper introduces a state-aware variational Thompson sampling approach for deep Q-networks, enhancing exploration by conditioning parameter perturbations on the agent's current state, especially useful in high-risk situations.

Contribution

It derives a variational Thompson sampling method for DQNs, interprets NoisyNets as an approximation, and proposes SANE for state-dependent exploration.

Findings

01. State-aware perturbations improve exploration in high-risk states.

02. The method outperforms traditional NoisyNets in off-policy settings.

03. End-to-end learning of state-dependent noise enhances DQN performance.

Abstract

Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of action-value functions to be maintained; this is generally intractable for tasks that have a high-dimensional state-action space. We derive a variational Thompson sampling approximation for DQNs which uses a deep network whose parameters are perturbed by a learned variational noise distribution. We interpret the successful NoisyNets method (Fortunato et al., 2018) as an approximation to the variational Thompson sampling method that we derive. Further, we propose State Aware Noisy Exploration (SANE), which seeks to improve on NoisyNets by allowing a non-uniform perturbation, where the amount of parameter perturbation is conditioned on the state of the agent. This is done with the help of an auxiliary perturbation module, whose output…
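The idea of conditioning the amount of parameter perturbation on the state can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the single linear output layer, the softplus used to keep the scale non-negative, and the choice of one scalar scale per state are all assumptions made for illustration. The sketch only shows the general pattern of a NoisyNets-style layer whose Gaussian noise is multiplied by a state-dependent scale produced by a separate, hypothetical perturbation module.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always non-negative."""
    return np.logaddexp(0.0, x)


def state_noise_scale(h, w_p, b_p):
    """Hypothetical perturbation module (an assumption, not the paper's design).

    Maps the state feature vector h to a single non-negative noise scale.
    """
    return float(softplus(h @ w_p + b_p))


def noisy_q_values(h, mu_w, mu_b, sigma, rng):
    """Q-values from a linear layer whose parameters are perturbed by Gaussian noise.

    The sampled parameters are mu + sigma * eps with eps ~ N(0, 1), so
    sigma = 0 recovers the deterministic (mean) Q-values.
    """
    eps_w = rng.standard_normal(mu_w.shape)
    eps_b = rng.standard_normal(mu_b.shape)
    w = mu_w + sigma * eps_w
    b = mu_b + sigma * eps_b
    return w @ h + b


def select_action(h, params, rng):
    """Sample perturbed Q-values for state features h and act greedily on them."""
    sigma = state_noise_scale(h, params["w_p"], params["b_p"])
    q = noisy_q_values(h, params["mu_w"], params["mu_b"], sigma, rng)
    return int(np.argmax(q)), sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = {
        "mu_w": rng.standard_normal((4, 8)),  # 4 actions, 8 state features
        "mu_b": np.zeros(4),
        "w_p": rng.standard_normal(8) * 0.1,  # perturbation-module weights
        "b_p": -1.0,                          # perturbation-module bias
    }
    h = rng.standard_normal(8)                # stand-in for learned state features
    action, sigma = select_action(h, params, rng)
    print(f"action={action}, state-dependent noise scale={sigma:.3f}")
```

In this sketch, states for which the perturbation module outputs a small scale yield nearly greedy behaviour, while states with a large scale yield more randomized action choices. In a trained agent, both the mean parameters and the perturbation module would be learned by gradient descent, which the sketch does not cover.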

Code & Models

Repositories

nus-lid/sane (tf, Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms

Methods: Attentive Walk-Aggregating Graph Neural Network