Probabilistic Adaptive Computation Time
Michael Figurnov, Artem Sobolev, Dmitry Vetrov

TL;DR
This paper introduces a probabilistic model whose discrete latent variables control computation time in deep neural networks. A prior that favors faster computation gives a principled trade-off between speed and accuracy, and inference is performed with a novel stochastic variational optimization method.
Contribution
It proposes a probabilistic framework for adaptive computation time using discrete latent variables and a new stochastic variational optimization technique.
Findings
Matches the speed-accuracy trade-off of Adaptive Computation Time
Allows evaluation with a simple deterministic procedure that has a lower memory footprint
Demonstrates effectiveness on ResNet models
Abstract
We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.
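The Concrete relaxation mentioned in the abstract can be illustrated for a single binary variable. The sketch below is a minimal, standard-library illustration and not the authors' implementation: the function names, the halting logit value, and the temperature are hypothetical, and the 0.5 threshold used for the deterministic decision is an assumption about how a relaxed variable might be discretized at evaluation time.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relaxed_bernoulli(logit: float, temperature: float, rng: random.Random) -> float:
    """Draw one binary Concrete sample in [0, 1].

    `logit` is the log-odds of the underlying Bernoulli variable.
    Lower temperatures push samples toward 0 or 1.
    """
    # Logistic noise built from a uniform draw, clamped away from 0 and 1.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + noise) / temperature)


def hard_decision(logit: float) -> int:
    """Deterministic counterpart (assumed): 1 if the probability exceeds 0.5."""
    return 1 if logit > 0.0 else 0


rng = random.Random(0)
halt_logit = 1.5  # hypothetical halting logit, chosen only for illustration
samples = [relaxed_bernoulli(halt_logit, 0.1, rng) for _ in range(1000)]
fraction_on = sum(s > 0.5 for s in samples) / len(samples)
```

At a low temperature the samples are nearly binary, and the fraction above 0.5 approaches the Bernoulli probability `sigmoid(halt_logit)`. Because each sample is a differentiable function of the logit, gradients can pass through it during training, while `hard_decision` needs no sampling at all.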
Taxonomy
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
