Approximate Thompson Sampling via Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

TL;DR
This paper explores using epistemic neural networks to efficiently approximate Thompson sampling in complex environments, demonstrating that small networks can match large ensembles' performance at lower computational cost.
Contribution
It introduces the epinet, a small neural network for uncertainty estimation, enabling scalable approximate Thompson sampling in complex settings.
Findings
ENNs effectively approximate TS in bandit and RL environments.
The epinet matches large ensemble performance with significantly less computation.
Quality of joint predictive distributions is crucial for TS effectiveness.
Abstract
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we…
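To make the posterior-sampling step concrete, here is a minimal sketch of exact Thompson sampling in the simplest possible setting: a Bernoulli bandit with independent Beta(1, 1) priors, where the posterior can be sampled in closed form. This is not the paper's method. The function name `thompson_sampling`, the arm means, and the step count are illustrative assumptions. In the settings the abstract describes, the exact Beta draw would presumably be replaced by a sample from an approximate posterior such as an ENN.

```python
import random


def thompson_sampling(true_means, n_steps, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative; not the paper's ENN agent).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_steps):
        # Sample one plausible mean reward per arm from its Beta posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward

    return [successes[a] + failures[a] for a in range(n_arms)]


# Hypothetical three-armed bandit; the last arm has the highest mean.
counts = thompson_sampling([0.2, 0.5, 0.8], n_steps=2000)
print(counts)
```

In this toy problem the arms are independent, so per-arm marginal posteriors are all that is needed. The abstract's point about joint predictive distributions concerns settings where a shared model, such as a neural network, couples predictions across inputs.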
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Mobile Crowdsensing and Crowdsourcing
Methods: Spatio-temporal stability analysis
