Simple, Distributed, and Accelerated Probabilistic Programming
Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous

TL;DR
This paper introduces a simple, efficient approach for integrating probabilistic programming into deep learning frameworks, enabling scalable applications on modern hardware like TPUs and GPUs with significant speedups.
Contribution
It distills probabilistic programming to a single abstraction and implements it in TensorFlow, facilitating scalable probabilistic models on advanced hardware.
Findings
Achieves linear speedup on TPUv2 clusters for VAE and Image Transformer models.
Attains 100x speedup over Stan and 37x over PyMC3 using NUTS on GPUs.
Demonstrates practical applications in deep learning with probabilistic programming.
Abstract
We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction: the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and a multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.
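To make the "single abstraction" idea concrete, here is a minimal sketch in plain Python of a random variable as a distribution paired with one realized value, plus a tracer that turns a generative program into a log joint density. It is not the paper's TensorFlow implementation: `Normal`, `RandomVariable`, `run`, `log_joint`, and the toy `model` are hypothetical names chosen only for illustration.

```python
import math
import random


class Normal:
    """Hypothetical stand-in for a distribution object (not a TensorFlow class)."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def sample(self, rng):
        return rng.gauss(self.loc, self.scale)

    def log_prob(self, x):
        z = (x - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)


class RandomVariable:
    """The one abstraction in this sketch: a distribution plus a realized value."""

    def __init__(self, distribution, value):
        self.distribution = distribution
        self.value = value

    def log_prob(self):
        return self.distribution.log_prob(self.value)


def run(model, rng, observed=None):
    """Execute `model`, recording every random variable it creates.

    Names found in `observed` are fixed to the given value; all others are sampled.
    """
    observed = observed or {}
    trace = {}

    def rv(name, distribution):
        value = observed[name] if name in observed else distribution.sample(rng)
        trace[name] = RandomVariable(distribution, value)
        return value

    model(rv)
    return trace


def log_joint(model, **values):
    """Sum the log densities of all traced variables at the supplied values."""
    trace = run(model, random.Random(0), observed=values)
    return sum(variable.log_prob() for variable in trace.values())


def model(rv):
    """Toy generative program: x depends on the latent mean mu."""
    mu = rv("mu", Normal(0.0, 1.0))
    rv("x", Normal(mu, 0.5))


prior_draw = run(model, random.Random(1))     # forward sampling from the prior
density = log_joint(model, mu=0.0, x=0.0)     # log p(mu=0, x=0)
```

In this sketch the same program serves two purposes: calling `run` with no observations samples from the prior, while `log_joint` re-executes the program with values pinned to obtain the density that a sampler such as NUTS or a variational objective would consume.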
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification · Advanced Neural Network Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
