Approximate Thompson Sampling via Epistemic Neural Networks

Ian Osband; Zheng Wen; Seyed Mohammad Asghari; Vikranth Dwaracherla; Morteza Ibrahimi; Xiuyuan Lu; Benjamin Van Roy

arXiv:2302.09205·cs.LG·February 21, 2023

TL;DR

This paper explores using epistemic neural networks to efficiently approximate Thompson sampling in complex environments, demonstrating that small networks can match large ensembles' performance at lower computational cost.

Contribution

It introduces the epinet, a small neural network for uncertainty estimation, enabling scalable approximate Thompson sampling in complex settings.

Findings

01 ENNs effectively approximate Thompson sampling (TS) in bandit and RL environments.

02 The epinet matches large ensemble performance with significantly less computation.

03 The quality of joint predictive distributions is crucial for TS effectiveness.
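The third finding can be illustrated with a toy example. This is a minimal sketch, not taken from the paper: it assumes two actions whose values are each marginally standard normal (up to a 0.1 offset) and compares a perfectly correlated joint distribution with an independent one that has the same marginals. The function name `selection_frequency` and all constants are hypothetical.

```python
import numpy as np

def selection_frequency(samples):
    """Fraction of samples in which each action has the highest value.

    `samples` has shape (num_samples, num_actions); each row is one
    draw of all action values from an (approximate) posterior.
    """
    best = np.argmax(samples, axis=1)
    return np.bincount(best, minlength=samples.shape[1]) / len(samples)

rng = np.random.default_rng(0)
n = 100_000

# Joint case: both action values share one source of uncertainty,
# so action 0 is always better by exactly 0.1.
shared = rng.normal(size=n)
joint_samples = np.stack([shared + 0.1, shared], axis=1)

# Marginal-only case: same per-action marginals, but drawn independently,
# which discards the correlation between the two action values.
marginal_samples = np.stack(
    [rng.normal(size=n) + 0.1, rng.normal(size=n)], axis=1
)

joint_freq = selection_frequency(joint_samples)
marginal_freq = selection_frequency(marginal_samples)
```

Under these assumptions the correlated samples always select action 0, while the independent samples select it only slightly more than half the time, even though the per-action marginals agree. A sampler that only matches marginals would therefore keep exploring an action that the joint distribution already rules out.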

Abstract

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we…
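The action-selection loop described in the abstract can be sketched as follows. This is an illustrative stand-in rather than the authors' method: it assumes a Gaussian bandit and uses a small ensemble, where sampling a member index plays the role of sampling an approximate posterior draw. The function `run_ensemble_ts`, its parameters, and the reward-perturbation scheme are assumptions made for this example, and none of it uses the API of the enn_acme repository.

```python
import numpy as np

def run_ensemble_ts(true_means, num_members=10, num_steps=2000,
                    noise_std=1.0, seed=0):
    """Approximate Thompson sampling on a Gaussian bandit with an ensemble.

    Each member keeps a running mean per action, started from its own
    random prior draw and updated with independently perturbed rewards.
    Returns the number of times each action was selected.
    """
    rng = np.random.default_rng(seed)
    num_actions = len(true_means)
    # The prior draw counts as one pseudo-observation per member and action.
    sums = rng.normal(size=(num_members, num_actions))
    counts = np.ones((num_members, num_actions))
    pulls = np.zeros(num_actions, dtype=int)

    for _ in range(num_steps):
        z = rng.integers(num_members)            # sample an index (one "posterior draw")
        action = int(np.argmax(sums[z] / counts[z]))  # act greedily for that draw
        reward = true_means[action] + noise_std * rng.normal()
        # Every member sees the reward with its own perturbation, which
        # keeps the members diverse where data is scarce.
        sums[:, action] += reward + noise_std * rng.normal(size=num_members)
        counts[:, action] += 1
        pulls[action] += 1
    return pulls

pulls = run_ensemble_ts(np.array([0.0, 1.0, 3.0]), num_members=30)
```

The key step is that one index is sampled per decision and all action values are read from the same member, so the choice reflects a joint draw over actions rather than independent per-action draws.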

Code & Models

Repositories

deepmind/enn_acme (JAX, official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Mobile Crowdsensing and Crowdsourcing

Methods: Spatio-temporal stability analysis