How Does Selective Mechanism Improve Self-Attention Networks?

Xinwei Geng; Longyue Wang; Xing Wang; Bing Qin; Ting Liu; Zhaopeng Tu

arXiv:2005.00979 · cs.CL · May 5, 2020 · 1 citation

TL;DR

This paper investigates how the selective mechanism enhances self-attention networks in NLP by focusing on content words, empirically validating improvements across multiple tasks and explaining the underlying reasons.

Contribution

The paper implements selective SANs (SSANs) with a flexible Gumbel-Softmax-based selective mechanism and demonstrates their effectiveness and interpretability on NLP tasks.
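To make the idea of a Gumbel-Softmax-based selection concrete, here is a minimal sketch of the general Gumbel-Softmax relaxation applied as a per-key gate on attention weights. It is not the authors' implementation: the function names, the `[skip, select]` logit layout, the temperature `tau`, and the way the gate multiplies the attention scores are all assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def selective_weights(attn_logits, select_logits, tau, rng):
    """Gate each key's attention by a relaxed binary "select" decision.

    attn_logits:   one attention score per key (hypothetical input)
    select_logits: one [skip, select] logit pair per key (hypothetical input)
    Returns the per-key gates and the renormalized attention weights.
    """
    # Index 1 of each relaxed sample is the soft probability of "select".
    gates = [gumbel_softmax(pair, tau, rng)[1] for pair in select_logits]
    peak = max(attn_logits)
    scores = [g * math.exp(a - peak) for g, a in zip(gates, attn_logits)]
    total = sum(scores)
    return gates, [s / total for s in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    gates, weights = selective_weights(
        attn_logits=[1.0, 0.5, 2.0],
        select_logits=[[0.0, 2.0], [2.0, 0.0], [0.0, 2.0]],
        tau=0.5,
        rng=rng,
    )
    print("gates:  ", gates)
    print("weights:", weights)
```

Lower values of `tau` push each gate closer to a hard 0/1 choice, while higher values keep the selection soft; the weights are renormalized so they still sum to one after gating.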

Findings

01

SSANs consistently outperform standard SANs on natural language inference, semantic role labelling, and machine translation

02

The selective mechanism mitigates two commonly cited weaknesses of SANs: word order encoding and structure modeling

03

Focusing attention on content words improves semantic understanding

Abstract

Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more…

Code & Models

Repositories

xwgeng/SSAN (tf, Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research