Multiple Structural Priors Guided Self Attention Network for Language Understanding

Le Qi; Yu Zhang; Qingyu Yin; Ting Liu

arXiv:2012.14642 · cs.CL · January 1, 2021

Open Access

TL;DR

This paper introduces MS-SAN, a self-attention network that incorporates multiple structural priors, such as word order and syntax trees, through a multi-mask multi-head attention mechanism, and reports improved performance on language understanding tasks.

Contribution

The paper proposes a multi-mask multi-head attention mechanism that maps different structural priors onto different attention heads of a self-attention network (SAN), so that the model can capture richer text structure than a single mask strategy allows.

Findings

01. MS-SAN outperforms strong baselines on two NLP tasks.

02. Incorporating multiple structural priors improves model accuracy.

03. The approach effectively captures hierarchical and syntactic information.

Abstract

Self-attention networks (SANs) have been widely used in recent NLP studies. Unlike CNNs or RNNs, standard SANs are usually position-independent and are therefore incapable of capturing the structural priors between sequences of words. Existing studies commonly apply a single mask strategy to SANs to incorporate structural priors, and so fail to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into SAN models, proposing the Multiple Structural Priors Guided Self Attention Network (MS-SAN), which transforms different structural priors into different attention heads by using a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative position of words. For the purpose of capturing the latent…
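The idea of giving each attention head its own structural mask can be illustrated with a minimal sketch. This is not the authors' implementation: the three masks (`forward_mask`, `backward_mask`, `window_mask`), the function names, and the choice to skip learned query/key/value projections are all assumptions made for illustration, using only the Python standard library. The sketch covers masks built from word order and relative position only, not masks derived from syntax trees.

```python
import math

def forward_mask(n):
    """Sequential-order prior (assumed form): token i sees itself and earlier tokens."""
    return [[j <= i for j in range(n)] for i in range(n)]

def backward_mask(n):
    """Sequential-order prior (assumed form): token i sees itself and later tokens."""
    return [[j >= i for j in range(n)] for i in range(n)]

def window_mask(n, width):
    """Relative-position prior (assumed form): token i sees tokens within `width` steps."""
    return [[abs(i - j) <= width for j in range(n)] for i in range(n)]

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; disallowed pairs get a score of -inf."""
    d = len(q[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d)
            if mask[i][j] else -math.inf
            for j, k_j in enumerate(k)
        ]
        # Every mask above keeps the diagonal, so max(scores) is finite.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([
            sum(w * v_j[c] for w, v_j in zip(row, v))
            for c in range(len(v[0]))
        ])
    return outputs, weights

def multi_mask_attention(x, masks):
    """One head per mask; each head works on its own slice of the features.

    Simplification: a real model would apply learned projections per head.
    """
    n_heads, d_model = len(masks), len(x[0])
    assert d_model % n_heads == 0, "feature size must split evenly across heads"
    d_head = d_model // n_heads
    combined = [[] for _ in x]
    all_weights = []
    for h, mask in enumerate(masks):
        head_in = [row[h * d_head:(h + 1) * d_head] for row in x]
        head_out, head_weights = masked_attention(head_in, head_in, head_in, mask)
        all_weights.append(head_weights)
        for i, row in enumerate(head_out):
            combined[i].extend(row)  # concatenate the heads' outputs
    return combined, all_weights

# Toy input: 4 tokens with 6 features each, giving 3 heads of size 2.
x = [
    [1.0, 0.0, 0.5, 0.5, 0.2, 0.1],
    [0.0, 1.0, 0.3, 0.7, 0.4, 0.9],
    [1.0, 1.0, 0.8, 0.2, 0.6, 0.5],
    [0.5, 0.5, 0.1, 0.9, 0.3, 0.3],
]
n = len(x)
masks = [forward_mask(n), backward_mask(n), window_mask(n, 1)]
out, weights = multi_mask_attention(x, masks)
print(weights[0][0])  # → [1.0, 0.0, 0.0, 0.0]
```

Under the forward mask the first token can attend only to itself, which is why its attention row is one-hot. Swapping or adding masks changes which structural prior each head encodes without touching the attention computation itself.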

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention