TL;DR
The paper introduces LSG attention, an efficient attention mechanism for Transformers that handles longer sequences and lets pretrained models be adapted to them without retraining, with competitive results on NLP tasks involving long documents.
Contribution
The paper presents the LSG attention architecture combining Local, Sparse, and Global attention, allowing efficient extrapolation of pretrained Transformers to longer sequences without additional training.
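To make the three components concrete, here is a minimal sketch of a boolean attention mask in the spirit of Local, Sparse and Global attention. It is not the authors' implementation. The layout (global tokens prepended to the sequence), the "own block plus neighbouring blocks" local window, and the use of a plain stride to pick sparse tokens within a bounded range are all assumptions for illustration, as are the parameter names.

```python
import numpy as np

def lsg_style_mask(seq_len: int, block_size: int, num_global: int,
                   sparse_reach: int, sparse_stride: int) -> np.ndarray:
    """Hypothetical Local/Sparse/Global mask: mask[i, j] is True if query i may attend to key j.

    Rows/columns 0..num_global-1 are global tokens (assumed to be prepended).
    """
    n = num_global + seq_len
    mask = np.zeros((n, n), dtype=bool)

    # Global: global tokens attend to every position, and every position attends to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    pos = np.arange(seq_len)
    block = pos // block_size
    # Block distance between each query position and each key position.
    dist = np.abs(block[:, None] - block[None, :])

    # Local: the query's own block plus the immediately neighbouring blocks.
    local = dist <= 1
    # Sparse (assumed strided selection): every sparse_stride-th key in the
    # next sparse_reach blocks on each side beyond the local window.
    sparse = (dist > 1) & (dist <= 1 + sparse_reach) & (pos[None, :] % sparse_stride == 0)

    mask[num_global:, num_global:] = local | sparse
    return mask

m = lsg_style_mask(seq_len=32, block_size=4, num_global=1, sparse_reach=2, sparse_stride=2)
print(m.shape)                    # (33, 33)
print(m.sum(axis=1)[1:].max())    # 21 keys at most for a regular token
```

The point of the design is that a regular token's key count is bounded by the block size, sparse reach, stride and number of global tokens, not by the sequence length, so total attention cost grows linearly rather than quadratically.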
Findings
LSG attention is fast and efficient on long-document tasks.
It enables adaptation of pretrained models to longer sequences without retraining.
LSG achieves competitive results in classification and summarization tasks.
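Adapting a pretrained model to longer inputs also requires position information for the new positions. One common initialisation, used for example when converting RoBERTa checkpoints to Longformer, is to repeat the learned position-embedding table until it covers the new length. Whether LSG's conversion tools do exactly this is an assumption here; the sketch below only illustrates the tiling idea, and the function name is hypothetical.

```python
import numpy as np

def tile_position_embeddings(pos_emb: np.ndarray, new_max_len: int) -> np.ndarray:
    """Hypothetical helper: extend a (max_len, hidden) position table to new_max_len rows.

    Assumes a plain table with no reserved offset rows (some models, e.g. RoBERTa,
    reserve the first rows for padding, which would need separate handling).
    """
    old_len, _hidden = pos_emb.shape
    reps = -(-new_max_len // old_len)  # ceiling division
    # Repeat the whole table vertically, then cut to the requested length.
    return np.tile(pos_emb, (reps, 1))[:new_max_len]

pe = np.random.default_rng(0).normal(size=(512, 8))
ext = tile_position_embeddings(pe, 4096)
print(ext.shape)                          # (4096, 8)
print(np.array_equal(ext[512], pe[0]))    # True
```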
Abstract
Transformer models achieve state-of-the-art performance on a wide range of NLP tasks. They however suffer from a prohibitive limitation due to the self-attention mechanism, inducing O(n²) complexity with regard to sequence length. To answer this limitation we introduce the LSG architecture which relies on Local, Sparse and Global attention. We show that LSG attention is fast, efficient and competitive in classification and summarization tasks on long documents. Interestingly, it can also be used to adapt existing pretrained models to efficiently extrapolate to longer sequences with no additional training. Along with the introduction of the LSG attention mechanism, we propose tools to train new models and adapt existing ones based on this mechanism.
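A quick count of query-key pairs shows why replacing full self-attention with a Local/Sparse/Global pattern helps. The sketch below compares full attention with an upper bound for a hypothetical LSG-style layout (local window of three blocks, strided sparse keys within a fixed reach, a few global tokens). The parameter values are illustrative assumptions, not the paper's settings.

```python
def full_attention_pairs(n: int) -> int:
    # Every token attends to every token: quadratic in n.
    return n * n

def lsg_style_pairs_upper_bound(n: int, block_size: int, num_global: int,
                                sparse_reach: int, sparse_stride: int) -> int:
    """Upper bound on attended pairs for a hypothetical Local/Sparse/Global layout."""
    local_keys = 3 * block_size                                   # own block + 2 neighbours
    sparse_keys = 2 * sparse_reach * -(-block_size // sparse_stride)  # strided keys, both sides
    per_query = local_keys + sparse_keys + num_global
    # Regular tokens have a fixed budget; global tokens see the whole sequence.
    return n * per_query + num_global * (n + num_global)

for n in (4096, 8192):
    print(n, full_attention_pairs(n), lsg_style_pairs_upper_bound(n, 128, 1, 1, 2))
# 4096 16777216 2105345
# 8192 67108864 4210689
```

Doubling the length quadruples the full-attention count but only roughly doubles the LSG-style bound.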
Code & Models
- 🤗 ccdv/lsg-legal-base-uncased-4096 (model · 12 downloads · ♡ 2)
- 🤗 ccdv/lsg-legal-small-uncased-4096 (model · 16 downloads)
- 🤗 ccdv/lsg-bart-base-4096 (model · 12 downloads · ♡ 3)
- 🤗 ccdv/lsg-bart-large-4096 (model · 8 downloads)
- 🤗 ccdv/lsg-barthez-4096 (model · 9 downloads · ♡ 1)
- 🤗 ccdv/lsg-camembert-base-4096 (model · 21 downloads · ♡ 3)
- 🤗 ccdv/lsg-base-4096 (model · 9 downloads · ♡ 2)
- 🤗 ccdv/lsg-pegasus-large-4096 (model · 13 downloads)
- 🤗 ccdv/lsg-distilbert-base-uncased-4096 (model · 11 downloads)
- 🤗 ccdv/lsg-distilcamembert-base-4096 (model · 11 downloads · ♡ 1)
