Probabilistic Adaptive Computation Time

Michael Figurnov; Artem Sobolev; Dmitry Vetrov

arXiv:1712.00386·cs.LG·December 4, 2017

TL;DR

This paper introduces a probabilistic model with discrete latent variables to control computation time in deep neural networks, enabling a principled trade-off between speed and accuracy with a novel inference method.

Contribution

It proposes a probabilistic framework for adaptive computation time using discrete latent variables and a new stochastic variational optimization technique.

Findings

01 Matches the speed-accuracy trade-off of Adaptive Computation Time

02 Allows evaluation with a simple deterministic procedure that has a lower memory footprint

03 Demonstrates these results on ResNet models

Abstract

We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
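The abstract mentions training with the Concrete relaxation of discrete variables. As a rough illustration of that general technique (not the authors' implementation), the sketch below draws a relaxed sample of a single binary variable using the standard binary Concrete construction: logistic noise is added to a logit and the result is passed through a temperature-scaled sigmoid. The function names, the `logit` and `temperature` parameters, and the idea of treating the variable as a per-block halting decision are assumptions made here for illustration; the abstract does not specify the parameterization.

```python
import math
import random


def stable_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_binary_concrete(logit, temperature, rng):
    """Draw one relaxed Bernoulli sample in [0, 1].

    logit:       log-odds of the discrete variable being 1 (hypothetical
                 name; e.g. the output of a small halting predictor).
    temperature: relaxation temperature; smaller values push samples
                 toward 0 or 1.
    rng:         a random.Random instance.
    """
    # Clamp the uniform draw away from 0 and 1 so both logs are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return stable_sigmoid((logit + logistic_noise) / temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    for temperature in (1.0, 0.1):
        draws = [sample_binary_concrete(2.0, temperature, rng) for _ in range(5)]
        print(temperature, [round(d, 3) for d in draws])
```

At high temperature the samples spread across the interval, which keeps gradients with respect to `logit` informative; at low temperature they concentrate near 0 and 1 and behave more like the discrete variable they relax. The fraction of samples above 0.5 equals the sigmoid of the logit regardless of temperature.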

Taxonomy

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection