Simple, Distributed, and Accelerated Probabilistic Programming

Dustin Tran; Matthew Hoffman; Dave Moore; Christopher Suter; Srinivas Vasudevan; Alexey Radul; Matthew Johnson; Rif A. Saurous

arXiv:1811.02091 · stat.ML · November 30, 2018 · 22 cites

TL;DR

This paper introduces a simple, low-level approach for embedding probabilistic programming in a deep learning framework, so that probabilistic models scale on accelerators such as TPUs and GPUs. It reports linear speedup from 1 to 256 TPUv2 chips and a 100x NUTS speedup on GPUs over Stan.

Contribution

It distills probabilistic programming to a single abstraction, the random variable, and implements it as a lightweight layer in TensorFlow, so that probabilistic models can run at scale on TPUs and GPUs.

Findings

01

Achieves linear speedup from 1 to 256 TPUv2 chips for a VAE on 64x64 ImageNet and an Image Transformer on 256x256 CelebA-HQ.

02

Attains 100x speedup over Stan and 37x over PyMC3 using NUTS on GPUs.

03

Demonstrates three applications of the approach: a model-parallel VAE on TPUv2s, a data-parallel Image Transformer on TPUv2s, and a multi-GPU No-U-Turn Sampler (NUTS).

Abstract

We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction---the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.
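The idea of treating the random variable as the only abstraction can be illustrated with a small sketch. The code below is not the paper's TensorFlow implementation and not the Edward2 API; the names `Normal`, `RandomVariable`, `model`, and `joint_log_prob` are hypothetical, and it uses only the Python standard library. It assumes that a random variable is simply a distribution paired with one concrete value (sampled or observed), so that a model is an ordinary function that creates such objects.

```python
import math
import random


class Normal:
    """Hypothetical univariate normal distribution (illustration only)."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def sample(self, rng):
        # Draw one value using the supplied random.Random instance.
        return rng.gauss(self.loc, self.scale)

    def log_prob(self, x):
        # Log density of N(loc, scale^2) evaluated at x.
        z = (x - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)


class RandomVariable:
    """Pairs a distribution with a single value: sampled, or fixed if observed."""

    def __init__(self, distribution, rng, value=None):
        self.distribution = distribution
        self.value = distribution.sample(rng) if value is None else value

    def log_prob(self):
        # Log density of this variable's own value under its distribution.
        return self.distribution.log_prob(self.value)


def model(rng, observed_x=None):
    """A two-variable model: latent z, then x conditioned on z."""
    z = RandomVariable(Normal(0.0, 1.0), rng)
    x = RandomVariable(Normal(z.value, 0.5), rng, value=observed_x)
    return z, x


def joint_log_prob(z, x):
    # The joint log density is the sum over the random variables in the model.
    return z.log_prob() + x.log_prob()


if __name__ == "__main__":
    rng = random.Random(0)
    z, x = model(rng, observed_x=1.0)
    print("z =", z.value, "x =", x.value, "log p(z, x) =", joint_log_prob(z, x))
```

Under these assumptions, sampling from the prior and scoring observed data use the same `model` function; only the `observed_x` argument changes. A real implementation would operate on tensors and could be placed on accelerators, which this sketch does not attempt.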

Code & Models

Repositories

google/edward2 (TensorFlow)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification · Advanced Neural Network Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax