Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line
Maximilian Toller, Hussain Hussain, Bernhard C Geiger

TL;DR
This paper demonstrates that neural networks with activation bottlenecks, especially sigmoidal ones like LSTM and GRU, cannot accurately forecast unbounded sequences such as straight lines or random walks, due to their bounded hidden layer representations.
Contribution
The paper characterizes activation bottlenecks (hidden layers with a bounded image) and explains why they prevent sigmoidal networks from forecasting unbounded sequences.
Findings
Sigmoidal networks cannot forecast unbounded sequences.
Activation bottlenecks cause prediction errors to grow arbitrarily large.
Modifications to architectures can mitigate bottleneck effects.
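The first two findings can be checked numerically with a small sketch. The toy network below is an assumption made for illustration, not the paper's experimental setup: a scalar input feeds 8 tanh units, followed by a linear output. Since every tanh unit lies in [-1, 1], the output magnitude is at most the sum of the absolute output weights plus the absolute output bias, whatever finite values the weights take. The error against the straight line y_t = t therefore grows without limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: scalar input -> 8 tanh units -> scalar linear output.
W = rng.normal(size=8)   # input-to-hidden weights
b = rng.normal(size=8)   # hidden biases
v = rng.normal(size=8)   # hidden-to-output weights
c = rng.normal()         # output bias

def forward(t):
    """Evaluate the toy network at the time steps in the 1-D array t."""
    h = np.tanh(np.outer(t, W) + b)  # bounded hidden layer: each unit in [-1, 1]
    return h @ v + c

# Because |tanh| <= 1, the output magnitude can never exceed this value.
output_bound = np.abs(v).sum() + abs(c)

t = np.array([1.0, 1e2, 1e4, 1e6])
y_hat = forward(t)
errors = np.abs(t - y_hat)  # target is the straight line y_t = t

for step, err in zip(t, errors):
    print(f"t = {step:>9.0f}   |t - y_hat| = {err:.2f}")
```

Training changes W, b, v, and c, and with them the value of output_bound, but the bound stays finite, so a sufficiently large t always pushes the error past any chosen threshold.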
Abstract
A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: the difference between prediction and ground truth becomes arbitrarily large, regardless of the training procedure. Widely used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures which mitigate the effects of activation bottlenecks.
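The abstract does not say which architectural modifications the authors discuss. As a purely illustrative assumption, the sketch below adds a hypothetical linear skip path (the term a * t) that bypasses the tanh layer, so that not every route from input to output passes through a bounded layer. The function names and the skip weight a are invented for this example and may differ from the paper's proposals.

```python
import numpy as np

def bottleneck_only(t, W, b, v, c):
    """Toy network in which every path to the output passes through tanh."""
    h = np.tanh(np.outer(t, W) + b)  # bounded hidden layer
    return h @ v + c

def with_linear_skip(t, W, b, v, c, a):
    """Same network plus a hypothetical linear skip path around the tanh layer."""
    return bottleneck_only(t, W, b, v, c) + a * t

rng = np.random.default_rng(1)
W, b, v = rng.normal(size=(3, 4))  # three weight vectors for 4 hidden units
c, a = 0.0, 1.0                    # a = 1 matches the slope of the target line

t = np.array([1.0, 1e3, 1e6])
err_plain = np.abs(t - bottleneck_only(t, W, b, v, c))
err_skip = np.abs(t - with_linear_skip(t, W, b, v, c, a))

print("bottleneck only:", err_plain)
print("with skip path: ", err_skip)
```

With the skip path, the error against the line y_t = t equals the magnitude of the bounded tanh branch, so it stays below the sum of the absolute values of v plus the absolute value of c for every t. Without it, the error grows roughly like t.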
Taxonomy
Topics: Neural Networks and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Gated Recurrent Unit
