Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

Aniket Didolkar; Kshitij Gupta; Anirudh Goyal; Nitesh B. Gundavarapu; Alex Lamb; Nan Rosemary Ke; Yoshua Bengio

arXiv:2205.14794 · cs.LG · October 26, 2022 · 5 cites

TL;DR

This paper introduces a dual-stream neural network combining a recurrent slow stream for compressed representations with a Transformer fast stream for detailed processing, improving sequence learning efficiency and generalization.

Contribution

It proposes a novel architecture that integrates slow recurrent and fast Transformer streams to balance compression and expressiveness in sequence learning.

Findings

01. Enhanced sample efficiency in visual perception tasks
02. Improved generalization in sequential decision making
03. Effective combination of compression and detailed processing

Abstract

Recurrent neural networks have a strong inductive bias towards learning temporally compressed representations, as the entire history of a sequence is represented by a single vector. By contrast, Transformers have little inductive bias towards learning temporally compressed representations, as they allow for attention over all previously computed elements in a sequence. Having a more compressed representation of a sequence may be beneficial for generalization, as a high-level representation may be more easily re-used and re-purposed and will contain fewer irrelevant details. At the same time, excessive compression of representations comes at the cost of expressiveness. We propose a solution which divides computation into two streams. A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single…
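The two-stream idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes NumPy, uses single-head attention with no learned projections, and assumes the slow stream is a small set of state vectors updated once per chunk. The names `attend`, `two_stream_pass`, and `num_slow` are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    # Single-head dot-product attention. Keys and values are the same
    # array and there are no learned projections (a simplification).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores) @ keys_values


def two_stream_pass(x, K, num_slow=4, seed=0):
    # x: array of shape (T, d). K: chunk length in time steps.
    # num_slow: number of slow-state vectors (an assumption of this sketch).
    T, d = x.shape
    rng = np.random.default_rng(seed)
    slow = rng.standard_normal((num_slow, d))  # hypothetical initial slow state
    outputs = []
    for start in range(0, T, K):
        chunk = x[start:start + K]
        # Fast stream: attention inside the chunk (unmasked here),
        # then a read from the current slow state.
        h = chunk + attend(chunk, chunk)
        h = h + attend(h, slow)
        # Slow stream: one recurrent update per chunk, which compresses
        # the K fast-stream outputs into the fixed-size slow state.
        slow = slow + attend(slow, h)
        outputs.append(h)
    return np.concatenate(outputs, axis=0), slow


x = np.random.default_rng(1).standard_normal((10, 8))
out, slow = two_stream_pass(x, K=4)
print(out.shape, slow.shape)
```

In this sketch the fast stream sees only the current chunk plus the slow state, so information from earlier chunks reaches later ones solely through the compressed slow state. That is the bottleneck the abstract describes, under the simplifying assumptions stated above.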

Code & Models

Videos

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Position-Wise Feed-Forward Layer · Multi-Head Attention · Byte Pair Encoding