Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks
Rong Zhu, Mattia Rigotti

TL;DR
This paper introduces Sample Average Uncertainty (SAU), a simple and scalable frequentist exploration method for deep contextual bandits whose uncertainty estimates asymptotically match those of Bayesian Thompson Sampling and which outperforms existing deep Bayesian bandit methods.
Contribution
SAU provides a direct, computationally efficient uncertainty measure for deep contextual bandits, backed by theoretical guarantees, as a scalable alternative to Bayesian posterior-approximation methods.
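This page does not spell out the update rule. As a hedged sketch, assuming that SAU keeps, for each action a, the sample average τ_a² of squared prediction residuals over the n_a rounds in which a was chosen, and then explores by sampling a reward for each action from a Gaussian centered on the model's prediction with variance τ_a²/n_a, the exploration step could look like the following. The class and method names are hypothetical, and the reward model that produces `mu_hat` (a deep network in the paper's setting) is deliberately left out.

```python
import math
import random


class SAUSampler:
    """Hypothetical sketch of SAU-style exploration over K actions.

    Assumption: reward predictions mu_hat come from any regression model
    (e.g. a deep network); this class only tracks per-action uncertainty.
    """

    def __init__(self, n_actions, seed=0):
        self.counts = [0] * n_actions  # n_a: number of times action a was taken
        # tau_a^2: running mean of squared residuals. The initial value is
        # overwritten by the first update, since that update divides by n_a = 1.
        self.tau2 = [1.0] * n_actions
        self.rng = random.Random(seed)

    def select(self, mu_hat):
        # Try every action once so that n_a > 0 before sampling.
        for a, n in enumerate(self.counts):
            if n == 0:
                return a
        # Assumed rule: draw one reward per action from N(mu_hat_a, tau_a^2 / n_a)
        # and act greedily on the draws (a Thompson-Sampling-like choice).
        samples = [
            self.rng.gauss(m, math.sqrt(t2 / n))
            for m, t2, n in zip(mu_hat, self.tau2, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward, predicted):
        # Incremental sample average of the squared prediction error for this action.
        self.counts[action] += 1
        n = self.counts[action]
        err2 = (reward - predicted) ** 2
        self.tau2[action] += (err2 - self.tau2[action]) / n
```

In use, `select` would receive the network's current reward predictions for all actions, and `update` the observed reward together with the prediction made for the chosen action. The only state kept per action is a count and a running average, which is where the claimed efficiency comes from in this sketch.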
Findings
SAU asymptotically matches Thompson Sampling uncertainty.
SAU outperforms state-of-the-art deep Bayesian bandit methods.
SAU is computationally efficient and easy to implement.
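The paper's exact asymptotic statement is not reproduced on this page. In the simplest case one can check by hand, a single arm whose rewards have Gaussian noise of variance σ², Thompson Sampling with a flat prior and known σ² has posterior variance σ²/n for the mean, while an SAU-style estimate τ²/n (τ² being the average squared residual) converges to the same value because τ² is a consistent estimate of σ². The toy check below, with a hypothetical `variance_ratio` helper, illustrates only this non-contextual case under those assumptions.

```python
import random


def variance_ratio(n, sigma=2.0, mean=1.0, seed=0):
    """Ratio of an SAU-style variance estimate to the Thompson Sampling
    posterior variance for one arm with Gaussian noise (toy setup)."""
    rng = random.Random(seed)
    rewards = [rng.gauss(mean, sigma) for _ in range(n)]
    mu_hat = sum(rewards) / n
    # SAU-style: sample average of squared residuals, scaled by 1/n.
    tau2 = sum((r - mu_hat) ** 2 for r in rewards) / n
    sau_var = tau2 / n
    # Thompson Sampling with a flat prior and known noise: posterior variance sigma^2 / n.
    ts_var = sigma ** 2 / n
    return sau_var / ts_var
```

For large n the ratio approaches 1; for small n it can be noticeably off, which is consistent with the match being an asymptotic one.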
Abstract
Designing efficient exploration is central to Reinforcement Learning due to the fundamental problem posed by the exploration-exploitation dilemma. Bayesian exploration strategies like Thompson Sampling resolve this trade-off in a principled way by modeling and updating the distribution of the parameters of the action-value function, the outcome model of the environment. However, this technique becomes infeasible for complex environments due to the computational intractability of maintaining probability distributions over parameters of outcome models of corresponding complexity. Moreover, the approximation techniques introduced to mitigate this issue typically result in poor exploration-exploitation trade-offs, as observed in the case of deep neural network models with approximate posterior methods that have been shown to underperform in the deep bandit scenario. In this paper we…
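For contrast with SAU, the textbook case where Thompson Sampling is exact and cheap is Bernoulli rewards with a Beta prior per action: maintaining the posterior amounts to incrementing two counts. This conjugate setting is an illustrative assumption, not taken from the paper; it is exactly the kind of closed-form posterior that is no longer available once the outcome model is a deep network.

```python
import random


def thompson_bernoulli(true_probs, steps, seed=0):
    """Beta-Bernoulli Thompson Sampling (textbook conjugate case).

    Returns how many times each action was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1), i.e. a uniform prior on each success rate
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(steps):
        # Draw one plausible success rate per action from its posterior.
        theta = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(k), key=theta.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update: successes to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Over time the pulls concentrate on the action with the highest success rate, because its posterior draws dominate while the other posteriors tighten around lower values.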
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Explainable Artificial Intelligence (XAI)
