Token Turing Machines

Michael S. Ryoo; Keerthana Gopalakrishnan; Kumara Kahatapitiya; Ted Xiao; Kanishka Rao; Austin Stone; Yao Lu; Julian Ibarz; Anurag Arnab

arXiv:2211.09119 · cs.LG · April 14, 2023

TL;DR

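Token Turing Machines (TTM) is a sequential, autoregressive Transformer model with an external token memory. It processes long visual sequences at a bounded computational cost per step and outperforms long-sequence Transformer models and recurrent neural networks on two real-world sequential visual understanding tasks.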

Contribution

The paper introduces Token Turing Machines, a Transformer model whose external memory is a set of tokens summarising the history so far. A Transformer acts as the processing unit/controller that addresses, reads, and writes this memory at each step.

Findings

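01. TTM outperforms other Transformer models designed for long sequences, as well as recurrent neural networks, on two real-world sequential visual understanding tasks.

02. Each new observation is processed only with the contents of the memory, not the entire history, so the computational cost at each step stays bounded even on long sequences.

03. The external memory, a set of tokens summarising previous frames, is addressed, read, and written by a Transformer at each step.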

Abstract

We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal…
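The read, process, and write loop described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the sizes `D`, `M`, `N`, `R`, the helper names `summarize`, `process`, and `ttm_step`, the softmax-weighted averaging used for reading and writing, and the `tanh` placeholder standing in for the Transformer processing unit are all hypothetical choices made for illustration. What it is meant to show is the property the abstract states: each step touches only the memory and the new frame, never the full history.

```python
import math
import random

D = 4   # token dimension (hypothetical)
M = 6   # number of memory tokens (hypothetical)
N = 5   # tokens per incoming frame (hypothetical)
R = 3   # tokens read out for processing (hypothetical)


def summarize(tokens, score_vectors):
    """Reduce `tokens` to len(score_vectors) tokens.

    Each output token is a softmax-weighted average of the inputs, with
    weights given by dot products against one scoring vector. The scoring
    vectors stand in for learned parameters (assumption).
    """
    dim = len(tokens[0])
    summary = []
    for vec in score_vectors:
        scores = [sum(v * x for v, x in zip(vec, tok)) for tok in tokens]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        summary.append(
            [sum(e / total * tok[j] for e, tok in zip(exps, tokens))
             for j in range(dim)]
        )
    return summary


def process(tokens):
    """Placeholder for the Transformer processing unit (assumption)."""
    return [[math.tanh(x) for x in tok] for tok in tokens]


def ttm_step(memory, frame_tokens, read_vecs, write_vecs):
    """One step: read from memory + new frame, process, write memory back."""
    read_input = memory + frame_tokens          # never the full history
    read = summarize(read_input, read_vecs)     # R tokens
    output = process(read)
    # The new memory again has exactly M tokens.
    new_memory = summarize(memory + output + frame_tokens, write_vecs)
    return new_memory, output, len(read_input)


def random_tokens(rng, count):
    """Return `count` random tokens of dimension D."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(count)]


rng = random.Random(0)
read_vecs = random_tokens(rng, R)
write_vecs = random_tokens(rng, M)
memory = [[0.0] * D for _ in range(M)]
tokens_read_per_step = []

for _ in range(50):  # 50 frames; earlier frames are never stored
    frame = random_tokens(rng, N)
    memory, output, tokens_read = ttm_step(memory, frame, read_vecs, write_vecs)
    tokens_read_per_step.append(tokens_read)

print(len(memory), len(output), set(tokens_read_per_step))  # prints: 6 3 {11}
```

In this sketch the read step always sees `M + N` tokens, no matter how many frames have already been consumed, which is the sense in which the per-step cost is bounded. A real model would replace `process` with Transformer layers and learn the scoring vectors end to end.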

Code & Models

Repositories

google-research/scenic · JAX · Official

Models

fcxfcx/owlv2 · Hugging Face model

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Residual Connection · Softmax · Adam