Attention over pre-trained Sentence Embeddings for Long Document Classification
Amine Abdaoui, Sourav Dutta

TL;DR
This paper proposes an attention architecture that scales linearly with document length and builds on pre-trained sentence embeddings for long document classification. It achieves competitive results, and better performance when the underlying transformers are kept frozen.
Contribution
It introduces a simple, efficient architecture combining pre-trained sentence transformers with a small attention layer for long document classification.
Findings
Competitive results on three datasets.
Better performance with frozen transformers.
Effective alternative to complex long-document models.
Abstract
Despite being the current de facto models in most NLP tasks, transformers are often limited to short sequences because the cost of their attention is quadratic in the number of tokens. Several attempts to address this issue have been studied, either by reducing the cost of the self-attention computation or by modeling smaller sequences and combining them through a recurrence mechanism or a new transformer model. In this paper, we suggest taking advantage of pre-trained sentence transformers to start from semantically meaningful embeddings of the individual sentences, and then combining them through a small attention layer that scales linearly with the document length. We report the results obtained by this simple architecture on three standard document classification datasets. When compared with the current state-of-the-art models using standard fine-tuning, the studied method obtains…
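The abstract does not spell out the exact form of the attention layer, so the following is only a minimal sketch of the general idea under stated assumptions: each sentence is already represented by a fixed vector (here random numbers stand in for the output of a frozen sentence encoder), a single learned query vector scores every sentence, and the softmax-weighted average is passed to a linear classifier. The names `attention_pool`, `classify`, `query`, `class_weights`, and `class_bias` are hypothetical and not taken from the paper. The point it illustrates is that one score is computed per sentence, so the cost grows linearly with the number of sentences.

```python
import math
import random


def attention_pool(sentence_embeddings, query):
    """Combine per-sentence vectors into one document vector.

    Assumption: a single query vector scores each sentence once,
    so the work is proportional to the number of sentences.
    """
    dim = len(query)
    # One scaled dot-product score per sentence.
    scores = [
        sum(e * q for e, q in zip(emb, query)) / math.sqrt(dim)
        for emb in sentence_embeddings
    ]
    # Softmax over sentences (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted average of the sentence embeddings.
    doc_vector = [
        sum(w * emb[j] for w, emb in zip(weights, sentence_embeddings))
        for j in range(dim)
    ]
    return doc_vector, weights


def classify(sentence_embeddings, query, class_weights, class_bias):
    """Hypothetical linear classification head on top of the pooled vector."""
    doc_vector, weights = attention_pool(sentence_embeddings, query)
    logits = [
        sum(d * w for d, w in zip(doc_vector, row)) + b
        for row, b in zip(class_weights, class_bias)
    ]
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, weights


# Stand-in data: 12 "sentences" with 16-dimensional embeddings, 4 classes.
# In practice the embeddings would come from a pre-trained sentence encoder.
rng = random.Random(0)
n_sentences, dim, n_classes = 12, 16, 4
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_sentences)]
query = [rng.gauss(0, 1) for _ in range(dim)]
class_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
class_bias = [0.0] * n_classes

label, weights = classify(embeddings, query, class_weights, class_bias)
```

In this sketch only `query`, `class_weights`, and `class_bias` would be trained if the sentence encoder is frozen, which is why the trainable part stays small regardless of document length.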
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
