Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection

Guangxiang Zhao; Junyang Lin; Zhiyuan Zhang; Xuancheng Ren; Qi Su; Xu Sun

arXiv:1912.11637 · cs.CL · December 30, 2019 · 77 cites

TL;DR

The paper introduces Explicit Sparse Transformer, a Transformer variant that concentrates attention on the most relevant segments of the context through explicit selection, improving performance on NLP and vision tasks.

Contribution

It proposes an explicit selection mechanism for sparse attention that concentrates attention on the most relevant segments and runs faster at inference than sparsemax.

Findings

01. Improved model performance on neural machine translation, image captioning, and language modeling.

02. Inference is twice as fast as with sparsemax.

03. Results are comparable to or better than those of existing sparse attention methods.

Abstract

Self-attention based Transformer has demonstrated the state-of-the-art performances in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of irrelevant information in the context. To tackle the problem, we propose a novel model called Explicit Sparse Transformer. Explicit Sparse Transformer is able to improve the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous…
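The abstract above is truncated and does not spell out the selection rule, so the following is only a minimal sketch of one way "explicit selection of the most relevant segments" can be realized. It assumes that, for each query, the k largest attention scores are kept and all others are masked out before the softmax. The function names `topk_sparse_softmax` and `sparse_attention`, and the parameter `k`, are hypothetical and chosen for illustration; this is not the authors' code.

```python
import math


def topk_sparse_softmax(scores, k):
    """Softmax restricted to the k largest scores (assumed selection rule).

    Positions below the k-th largest score receive weight 0.
    Scores tied with the k-th largest are all kept.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # The k-th largest score acts as the selection threshold.
    threshold = sorted(scores, reverse=True)[k - 1]
    # Mask everything below the threshold with -inf so exp() maps it to 0.
    kept = [s if s >= threshold else float("-inf") for s in scores]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in kept]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(query, keys, values, k):
    """Single-query scaled dot-product attention with top-k selection."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    weights = topk_sparse_softmax(scores, k)
    dim = len(values[0])
    # Weighted sum of the value vectors; masked positions contribute nothing.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

With `k` equal to the number of keys, the sketch reduces to ordinary softmax attention; smaller values of `k` concentrate all of the attention mass on fewer positions.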

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Sparsemax · Residual Connection · Attention Dropout · Linear Warmup With Cosine Annealing