LSG Attention: Extrapolation of pretrained Transformers to long sequences

Charles Condevaux; Sébastien Harispe

arXiv:2210.15497·cs.CL·October 28, 2022

TL;DR

The paper introduces LSG attention, an efficient attention mechanism that lets Transformers process longer sequences and lets existing pretrained models extrapolate to longer inputs with no additional training, with competitive results on long-document classification and summarization.

Contribution

The paper presents the LSG attention architecture combining Local, Sparse, and Global attention, allowing efficient extrapolation of pretrained Transformers to longer sequences without additional training.
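To make the Local, Sparse, and Global combination concrete, here is a minimal sketch of an attention mask that unions three patterns. It is an illustration only, not the authors' implementation: the function names, the symmetric sliding window, the strided sparse pattern, and the choice of the first positions as global tokens are all assumptions made for this example.

```python
def lsg_style_mask(n, window=2, stride=4, num_global=1):
    """Return an n x n boolean mask; mask[i][j] is True when query i may attend to key j.

    Assumptions for illustration (not taken from the paper):
      - local:  keys within `window` positions of the query
      - sparse: keys whose distance to the query is a multiple of `stride`
      - global: the first `num_global` positions see, and are seen by, every token
    """
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = abs(i - j)
            is_local = dist <= window
            is_sparse = dist % stride == 0
            is_global = i < num_global or j < num_global
            mask[i][j] = is_local or is_sparse or is_global
    return mask


def render_mask(mask):
    """Render the mask as text: 'x' for an allowed pair, '.' for a masked one."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


if __name__ == "__main__":
    print(render_mask(lsg_style_mask(16)))
```

A dense mask like this still has n x n entries, so it only shows which query-key pairs are kept. An efficient implementation would avoid materializing the masked pairs at all.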

Findings

1. LSG attention is fast and efficient for long document tasks.
2. It enables adaptation of pretrained models to longer sequences without retraining.
3. LSG achieves competitive results in classification and summarization tasks.

Abstract

Transformer models achieve state-of-the-art performance on a wide range of NLP tasks. They however suffer from a prohibitive limitation due to the self-attention mechanism, inducing $O(n^{2})$ complexity with regard to sequence length. To answer this limitation we introduce the LSG architecture which relies on Local, Sparse and Global attention. We show that LSG attention is fast, efficient and competitive in classification and summarization tasks on long documents. Interestingly, it can also be used to adapt existing pretrained models to efficiently extrapolate to longer sequences with no additional training. Along with the introduction of the LSG attention mechanism, we propose tools to train new models and adapt existing ones based on this mechanism.
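The quadratic cost mentioned in the abstract can be illustrated by counting query-key pairs. The sketch below compares full self-attention with a scheme in which each query attends to a fixed budget of keys. The budget values (256 local, 128 sparse, 1 global) and both function names are hypothetical, chosen only for the arithmetic, and are not figures from the paper.

```python
def full_attention_pairs(n):
    """Query-key pairs scored by full self-attention on a sequence of length n."""
    return n * n


def lsg_style_pairs(n, local_keys=256, sparse_keys=128, global_keys=1):
    """Pairs scored when each query sees a fixed key budget (hypothetical values).

    The budget is capped at n because a query cannot attend to more keys
    than the sequence contains.
    """
    per_query = min(n, local_keys + sparse_keys + global_keys)
    return n * per_query


if __name__ == "__main__":
    for n in (512, 4096, 16384):
        full = full_attention_pairs(n)
        budgeted = lsg_style_pairs(n)
        print(f"n={n}: full={full}, budgeted={budgeted}, ratio={full / budgeted:.1f}")
```

Under these assumptions, doubling the sequence length quadruples the full-attention count but only doubles the budgeted count, which is why a fixed per-query budget scales to long documents.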

Code & Models

Repositories

ccdv-ai/convert_checkpoint_to_lsg (PyTorch, official)
