FIT: Far-reaching Interleaved Transformers

Ting Chen; Lala Li

arXiv:2305.12689 · cs.LG · May 26, 2023 · 5 cites

TL;DR

FIT introduces a novel transformer architecture that interleaves local and global layers with adaptive computation, enabling efficient processing of high-resolution images and large-scale data within limited memory.

Contribution

The paper proposes a new interleaved transformer architecture with local and global layers, improving efficiency and scalability for high-resolution image understanding and generation.
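The interleaving can be illustrated with a minimal NumPy sketch of one block, following the description in the abstract: data tokens are split into groups, local self-attention runs inside each group, and a small set of latent tokens per group carries information globally. This is not the authors' implementation. The names `attend` and `fit_block` are hypothetical, and the sketch assumes a single attention head with no learned projections, no layer normalization, and no feed-forward sublayers, which a real model would include.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    """Single-head dot-product attention without learned weights (a simplification).

    q: (..., nq, d), kv: (..., nk, d) -> (..., nq, d)
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(kv, -1, -2) / np.sqrt(d)
    return softmax(scores) @ kv

def fit_block(data, latents):
    """One simplified block. data: (groups, n, d); latents: (groups, m, d)."""
    g, m, d = latents.shape
    # Local layer: self-attention among data tokens of the same group only.
    data = data + attend(data, data)
    # Cross-attention: each group's latents read from that group's data tokens.
    latents = latents + attend(latents, data)
    # Global layer: self-attention over all latents, across every group.
    flat = latents.reshape(1, g * m, d)
    flat = flat + attend(flat, flat)
    latents = flat.reshape(g, m, d)
    # Cross-attention: each group's data tokens read back from its own latents.
    data = data + attend(data, latents)
    return data, latents

rng = np.random.default_rng(0)
L, n, m, d = 64, 8, 2, 16            # illustrative sizes, not from the paper
tokens = rng.normal(size=(L, d))
data = tokens.reshape(L // n, n, d)   # split the sequence into L/n groups of n tokens
latents = rng.normal(size=(L // n, m, d))
out_data, out_latents = fit_block(data, latents)
print(out_data.shape, out_latents.shape)  # (8, 8, 16) (8, 2, 16)
```

In this sketch, data tokens in different groups never attend to each other directly. Any cross-group information has to pass through the latent tokens in the global layer.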

Findings

01 Effective in high-resolution image tasks

02 Supports training on gigabit-scale data within 16GB memory

03 Demonstrates versatility as encoder, diffusion decoder, or autoregressive decoder

Abstract

We present FIT: a transformer-based architecture with efficient self-attention and adaptive computation. Unlike original transformers, which operate on a single sequence of data tokens, we divide the data tokens into groups, with each group being a shorter sequence of tokens. We employ two types of transformer layers: local layers operate on data tokens within each group, while global layers operate on a smaller set of introduced latent tokens. These layers, comprising the same set of self-attention and feed-forward layers as standard transformers, are interleaved, and cross-attention is used to facilitate information exchange between data and latent tokens within the same group. The attention complexity is $O(n^2)$ locally within each group of size $n$, but can reach $O(L^{4/3})$ globally for sequence length of $L$. The efficiency can be further enhanced by relying more on global…
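One way to see where an exponent of $4/3$ can come from is a simple count of pairwise attention interactions. The sketch below is an illustrative accounting, not the paper's derivation. It assumes a constant number of latents per group (`m`, a hypothetical parameter) and ignores the cross-attention cost, which is linear in $L$ under that assumption. Local attention then costs $(L/n) \cdot n^2 = Ln$ and global attention costs $(mL/n)^2$, so choosing $n$ on the order of $L^{1/3}$ makes both terms scale as $L^{4/3}$.

```python
def attention_cost(L, n, m=1):
    """Count pairwise interactions for one local plus one global layer.

    L: total number of data tokens
    n: group size (assumed to divide L)
    m: latent tokens per group (assumed constant; hypothetical parameter)
    """
    groups = L // n
    local = groups * n * n            # n^2 interactions inside each group
    global_cost = (groups * m) ** 2   # full attention over all latent tokens
    return local + global_cost

L = 4096  # equals 16**3, so L**(1/3) = 16 and L**(4/3) = 65536
for n in (4, 16, 64, L):
    print(n, attention_cost(L, n))
```

With these numbers, `n = 16` gives 131072 interactions, which is twice $L^{4/3}$, while `n = L` (a single group, close to full self-attention) gives 16777217. Groups that are too small (`n = 4`) or too large (`n = 64`) cost more than the balanced choice, with 1064960 and 266240 interactions respectively.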

Code & Models

Repositories

google-research/pix2seq
tf · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Diffusion