Linearizing Transformer with Key-Value Memory

Yizhe Zhang; Deng Cai

arXiv:2203.12644·cs.CL·October 14, 2022

TL;DR

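MemSizer is a transformer variant that combines Linformer-style low-rank projection with the recurrent-style incremental computation of kernel-based transformers. It achieves linear computation time and constant memory complexity at inference time while narrowing the performance gap with the vanilla transformer on sequence generation tasks.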

Contribution

It introduces MemSizer, a new approach that enhances efficiency and accuracy of transformers by integrating low-dimensional projections with recurrent-style incremental computation.

Findings

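01 MemSizer achieves linear computation time and constant memory complexity at inference time.

02 It achieves a better balance between efficiency and accuracy than the vanilla transformer and other efficient transformer variants on machine translation, abstractive summarization, and language modeling.

03 MemSizer improves efficiency even when the generated sequence is short, a setting where other efficient variants often fail to obtain computational gains.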

Abstract

Efficient transformer variants with linear time complexity have been developed to mitigate the quadratic computational overhead of the vanilla transformer. Among them are low-rank projection methods such as Linformer and kernel-based transformers. Despite their unique merits, they usually suffer a performance drop compared with the vanilla transformer on many sequence generation tasks, and often fail to obtain computational gains when the generation is short. We propose MemSizer, an approach towards closing the performance gap while improving efficiency even with short generation. It projects the source sequences into lower-dimensional representations like Linformer, while enjoying efficient recurrent-style incremental computation similar to kernel-based transformers. This yields linear computation time and constant memory complexity at inference time. MemSizer also employs a…
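
To illustrate what "recurrent-style incremental computation" means for kernel-based transformers in general, the following is a minimal NumPy sketch of causal kernelized linear attention. It is not MemSizer itself, whose formulation is not given in the abstract above. The feature map `elu(x) + 1` and the function names `feature_map`, `linear_attention_recurrent`, and `linear_attention_quadratic` are assumptions chosen for illustration. The point is that the running state (`S`, `z`) has a fixed size regardless of how many tokens have been generated, which is where constant memory and linear time at inference come from.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map elu(x) + 1 (illustrative choice only).
    # For x > 0 this is x + 1; for x <= 0 it is exp(x).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention_recurrent(Q, K, V):
    """Causal kernelized attention computed one step at a time.

    Q, K: arrays of shape (n, d); V: array of shape (n, d_v).
    The state (S, z) has size d * d_v + d, independent of n.
    """
    n, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))  # running sum of outer(phi(k_i), v_i)
    z = np.zeros(d)         # running sum of phi(k_i), used for normalization
    out = np.empty((n, d_v))
    for t in range(n):
        q = feature_map(Q[t])
        k = feature_map(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)
    return out


def linear_attention_quadratic(Q, K, V):
    """Reference version that materializes the full n x n score matrix."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = np.tril(phi_q @ phi_k.T)  # causal mask: keep positions j <= i
    return (scores @ V) / scores.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((6, 4))
    K = rng.standard_normal((6, 4))
    V = rng.standard_normal((6, 3))
    same = np.allclose(
        linear_attention_recurrent(Q, K, V),
        linear_attention_quadratic(Q, K, V),
    )
    print("recurrent and quadratic forms agree:", same)
```

Both functions compute the same outputs. The recurrent form processes each new token with a fixed amount of work and memory, whereas the quadratic form grows with the square of the sequence length.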

Taxonomy

Topics: Neural Networks and Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Multi-Head Linear Attention · Dense Connections · Residual Connection · Softmax · Layer Normalization · Linformer