Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu; Mattia Rigotti

arXiv:2105.04683 · cs.LG · October 27, 2021 · 1 citation

TL;DR

This paper introduces Sample Average Uncertainty (SAU), a simple and scalable frequentist exploration method for deep contextual bandits. Its uncertainty estimate asymptotically matches that of Thompson Sampling, and it outperforms state-of-the-art deep Bayesian bandit methods.

Contribution

SAU provides a direct, efficient uncertainty measure for deep bandits, offering a scalable alternative to Bayesian methods with theoretical guarantees.

Findings

01. SAU asymptotically matches Thompson Sampling uncertainty.

02. SAU outperforms state-of-the-art deep Bayesian bandit methods.

03. SAU is computationally efficient and easy to implement.

Abstract

Designing efficient exploration is central to Reinforcement Learning due to the fundamental problem posed by the exploration-exploitation dilemma. Bayesian exploration strategies like Thompson Sampling resolve this trade-off in a principled way by modeling and updating the distribution of the parameters of the action-value function, the outcome model of the environment. However, this technique becomes infeasible for complex environments due to the computational intractability of maintaining probability distributions over parameters of outcome models of corresponding complexity. Moreover, the approximation techniques introduced to mitigate this issue typically result in poor exploration-exploitation trade-offs, as observed in the case of deep neural network models with approximate posterior methods that have been shown to underperform in the deep bandit scenario. In this paper we…
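
For readers unfamiliar with the Bayesian baseline the abstract refers to, here is a minimal sketch of Thompson Sampling in its simplest setting: a Bernoulli bandit with a Beta posterior per arm. It is not the paper's SAU method and not taken from the paper's code. The function name `thompson_sampling`, its parameters, and the uniform Beta(1, 1) prior are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch, hypothetical names).

    true_probs: success probability of each arm (unknown to the agent).
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        # Assumption: a uniform Beta(1, 1) prior on every arm.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

With only two counters per arm the posterior update is trivial. The abstract's point is that maintaining such a distribution becomes intractable when the outcome model is a deep network, which is the gap SAU is designed to fill.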

Code & Models

Repositories

IBM/sau-explore (PyTorch, official)

Videos

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Explainable Artificial Intelligence (XAI)