Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio

TL;DR
This paper introduces a dual-stream neural network combining a recurrent slow stream for compressed representations with a Transformer fast stream for detailed processing, improving sequence learning efficiency and generalization.
Contribution
It proposes a novel architecture that integrates slow recurrent and fast Transformer streams to balance compression and expressiveness in sequence learning.
Findings
Enhanced sample efficiency in visual perception tasks
Improved generalization in sequential decision making
Effective combination of compression and detailed processing
Abstract
Recurrent neural networks have a strong inductive bias towards learning temporally compressed representations, as the entire history of a sequence is represented by a single vector. By contrast, Transformers have little inductive bias towards learning temporally compressed representations, as they allow for attention over all previously computed elements in a sequence. Having a more compressed representation of a sequence may be beneficial for generalization, as a high-level representation may be more easily re-used and re-purposed and will contain fewer irrelevant details. At the same time, excessive compression of representations comes at the cost of expressiveness. We propose a solution which divides computation into two streams. A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of time steps into a single…
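The two-stream idea in the abstract can be illustrated with a small NumPy sketch. The abstract is truncated here, so the details are assumptions rather than the authors' implementation: the sequence is split into fixed-size chunks, a fast stream applies attention within each chunk and reads from a small set of slow-state vectors, and the slow state is updated once per chunk by attending over that chunk's fast-stream outputs. The names `attend`, `two_stream_forward`, `slow_state`, and `chunk_size` are hypothetical, and learned projections, multiple heads, feed-forward layers, and normalization are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys_values):
    # Single-head scaled dot-product attention without learned projections
    # (an illustrative simplification, not the paper's layer).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values

def two_stream_forward(tokens, slow_state, chunk_size):
    # tokens: (T, d) sequence; slow_state: (n_slow, d) compressed memory.
    outputs = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # Fast stream: tokens attend within the chunk, then read the slow state.
        fast = chunk + attend(chunk, chunk)
        fast = fast + attend(fast, slow_state)
        outputs.append(fast)
        # Slow stream: updated once per chunk from the fast-stream outputs,
        # so the history reaches later chunks only through n_slow vectors.
        slow_state = slow_state + attend(slow_state, fast)
    return np.concatenate(outputs, axis=0), slow_state

rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 8))     # 10 time steps, 8 features
slow_init = rng.normal(size=(2, 8))   # 2 slow-state vectors
fast_out, slow_final = two_stream_forward(tokens, slow_init, chunk_size=4)
print(fast_out.shape, slow_final.shape)  # (10, 8) (2, 8)
```

In this sketch the slow state changes only at chunk boundaries, which is what makes it "slow": with 10 time steps and a chunk size of 4 it is updated three times, while every token is processed by the fast stream.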
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Position-Wise Feed-Forward Layer · Multi-Head Attention · Byte Pair Encoding
