$\infty$-former: Infinite Memory Transformer

Pedro Henrique Martins; Zita Marinho; André F. T. Martins

arXiv:2109.00301 · cs.CL · March 28, 2022 · 1 citation

TL;DR

The $\infty$-former adds an unbounded long-term memory to transformers using continuous-space attention, so arbitrarily long contexts can be modeled at a fixed computational cost. Experiments cover a synthetic sorting task and language modeling, among other tasks.

Contribution

It presents the $\infty$-former, a transformer variant with an unbounded long-term memory whose attention complexity is independent of the context length, using continuous attention to handle arbitrarily long sequences.

Findings

01. Successfully models long sequences in synthetic and real tasks.

02. Maintains fixed computational complexity regardless of context length.

03. Demonstrates improved long-term information retention.

Abstract

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the $\infty$-former's attention complexity becomes independent of the context length, trading off memory length with precision. In order to control where precision is more important, the $\infty$-former maintains "sticky memories", being able to model arbitrarily long contexts while keeping the computation budget fixed. Experiments on a synthetic sorting task, language modeling, and document…
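The abstract's central claim, that attention cost can be made independent of context length, can be illustrated with a small NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from this page: the Gaussian radial basis functions, the ridge-regression fit, the Gaussian attention density, and the names `rbf_features`, `compress_memory`, and `continuous_attention` are all hypothetical. The idea shown is that a memory of any length L is compressed into N basis coefficients, and attending to it then costs O(N d) regardless of L.

```python
import numpy as np

def rbf_features(t, centers, width):
    """Evaluate N Gaussian basis functions at positions t; returns shape (N, len(t))."""
    t = np.atleast_1d(t)
    diff = t[None, :] - centers[:, None]
    return np.exp(-0.5 * (diff / width) ** 2) / (width * np.sqrt(2 * np.pi))

def compress_memory(X, centers, width, ridge=1e-3):
    """Ridge-regress an (L, d) memory onto N basis functions; returns B of shape (N, d).

    Assumption: token i is placed at position (i + 0.5) / L in [0, 1], so the
    continuous signal is x(t) ~= B.T @ psi(t).
    """
    L = X.shape[0]
    t = (np.arange(L) + 0.5) / L
    F = rbf_features(t, centers, width)               # (N, L)
    G = F @ F.T + ridge * np.eye(len(centers))        # (N, N)
    return np.linalg.solve(G, F @ X)                  # (N, d)

def continuous_attention(B, centers, width, mu, sigma):
    """Context vector E_p[x(t)] for a Gaussian density p = N(mu, sigma^2).

    The expectation of each Gaussian basis function under p has a closed form
    (a Gaussian with summed variances), integrating over the whole real line.
    The cost depends on N and d only, not on the original length L.
    """
    var = sigma ** 2 + width ** 2
    expected_psi = np.exp(-0.5 * (mu - centers) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return B.T @ expected_psi                         # (d,)

# Usage: memories of very different lengths compress to the same fixed size.
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 32)
for L in (100, 10_000):
    X = rng.normal(size=(L, 8))
    B = compress_memory(X, centers, width=0.05)
    context = continuous_attention(B, centers, width=0.05, mu=0.3, sigma=0.1)
    print(L, B.shape, context.shape)
```

With a fixed N, a longer memory is represented more coarsely, which is one way to read the abstract's "trading off memory length with precision".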

Code & Models

Repositories

deep-spin/infinite-former
jax · Official

Videos

∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained) · YouTube

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax