Surprisal-Triggered Conditional Computation with Neural Networks
Loren Lugosch, Derek Nowrouzezahrai, Brett H. Meyer

TL;DR
This paper introduces a neural network model that allocates more computation to more difficult inputs: the surprisal of each input under an autoregressive model decides whether a small, fast network or a big, slow network processes it, reducing FLOPs on speech recognition tasks.
Contribution
It proposes using the surprisal of an autoregressive model to trigger conditional computation, reducing FLOPs while matching the performance of always using the big network.
Findings
Matches the performance of a baseline that always uses the big network, with 15% fewer FLOPs.
Routes each input to a small, fast network or a big, slow network according to its surprisal.
Evaluated on two speech recognition tasks.
Abstract
Autoregressive neural network models have been used successfully for sequence generation, feature extraction, and hypothesis scoring. This paper presents yet another use for these models: allocating more computation to more difficult inputs. In our model, an autoregressive model is used both to extract features and to predict observations in a stream of input observations. The surprisal of the input, measured as the negative log-likelihood of the current observation according to the autoregressive model, is used as a measure of input difficulty. This in turn determines whether a small, fast network, or a big, slow network, is used. Experiments on two speech recognition tasks show that our model can match the performance of a baseline in which the big network is always used with 15% fewer FLOPs.
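The routing rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the autoregressive model's prediction is a unit-variance diagonal Gaussian, that routing is a hard threshold on surprisal, and that per-step costs are fixed. The names `gaussian_surprisal`, `route`, `total_flops`, and the parameter `threshold` are hypothetical. Surprisal here is the negative log-likelihood of the current observation under the prediction made from past observations.

```python
import math


def gaussian_surprisal(observation, predicted_mean, predicted_std=1.0):
    """Negative log-likelihood of `observation` under a diagonal Gaussian.

    Assumption: the autoregressive model outputs a mean vector and a shared
    standard deviation; the source does not specify the likelihood used.
    """
    nll = 0.0
    for x, mu in zip(observation, predicted_mean):
        z = (x - mu) / predicted_std
        nll += 0.5 * z * z + math.log(predicted_std) + 0.5 * math.log(2.0 * math.pi)
    return nll


def route(observations, predictions, threshold):
    """Pick "big" for surprising steps and "small" otherwise.

    `predictions[t]` stands in for the autoregressive model's prediction of
    `observations[t]` from earlier observations (hypothetical interface).
    """
    choices = []
    for obs, pred in zip(observations, predictions):
        surprisal = gaussian_surprisal(obs, pred)
        choices.append("big" if surprisal > threshold else "small")
    return choices


def total_flops(choices, small_flops, big_flops):
    """Sum the per-step cost implied by the routing decisions."""
    return sum(big_flops if c == "big" else small_flops for c in choices)


if __name__ == "__main__":
    # Two 2-dimensional steps: the first matches its prediction, the second does not.
    observations = [[0.0, 0.0], [3.0, 3.0]]
    predictions = [[0.0, 0.0], [0.0, 0.0]]
    choices = route(observations, predictions, threshold=5.0)
    print(choices)
    print(total_flops(choices, small_flops=1, big_flops=10))
```

In this toy setup the savings relative to always using the big network depend on how often the threshold is exceeded and on the cost of the autoregressive model itself, which this sketch does not count.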
Taxonomy
Topics: Neural Networks and Applications · Topic Modeling · Music and Audio Processing
