Multiple Structural Priors Guided Self Attention Network for Language Understanding
Le Qi, Yu Zhang, Qingyu Yin, Ting Liu

TL;DR
This paper introduces MS-SAN, a self-attention network that incorporates multiple structural priors like word order and syntax trees through a multi-mask attention mechanism, improving language understanding performance.
Contribution
It proposes a novel multi-mask multi-head attention mechanism to embed diverse structural priors into SANs, enhancing their ability to model complex text structures.
Findings
MS-SAN outperforms strong baselines on two NLP tasks.
Incorporating multiple structural priors improves model accuracy.
The approach effectively captures hierarchical and syntactic information.
Abstract
Self-attention networks (SANs) have been widely utilized in recent NLP studies. Unlike CNNs or RNNs, standard SANs are usually position-independent, and thus are incapable of capturing the structural priors between sequences of words. Existing studies commonly apply a single mask strategy to SANs to incorporate structural priors, which fails to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into SAN models, proposing the Multiple Structural Priors Guided Self Attention Network (MS-SAN), which transforms different structural priors into different attention heads by using a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors, including the sequential order and the relative position of words. For the purpose of capturing the latent…
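The multi-mask idea described in the abstract can be illustrated with a minimal NumPy sketch in which each attention head receives its own mask. This is not the authors' implementation: the three masks (forward order, backward order, and a local relative-distance window), the `window` parameter, the random projection weights, and the function names `build_masks` and `multi_mask_attention` are all assumptions chosen for illustration, and masks derived from syntax trees are not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_masks(n, window=2):
    """Return three illustrative (n, n) boolean masks; True means 'may attend'.

    These mask choices are assumptions, not the paper's exact definitions.
    """
    idx = np.arange(n)
    forward = idx[None, :] <= idx[:, None]   # token i attends to positions j <= i
    backward = idx[None, :] >= idx[:, None]  # token i attends to positions j >= i
    local = np.abs(idx[None, :] - idx[:, None]) <= window  # nearby tokens only
    return np.stack([forward, backward, local])  # shape (3, n, n)

def multi_mask_attention(x, w_q, w_k, w_v, masks):
    """Scaled dot-product attention with one mask per head.

    x: (n, d) token representations
    w_q, w_k, w_v: (h, d, d_k) per-head projection weights
    masks: (h, n, n) boolean, one structural prior per head
    """
    q = np.einsum("nd,hdk->hnk", x, w_q)
    k = np.einsum("nd,hdk->hnk", x, w_k)
    v = np.einsum("nd,hdk->hnk", x, w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (h, n, n)
    scores = np.where(masks, scores, -1e9)  # block disallowed positions
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (h, n, d_k)
    # Concatenate the heads along the feature dimension.
    out = heads.transpose(1, 0, 2).reshape(x.shape[0], -1)
    return out, weights

# Toy usage with random inputs (hypothetical sizes).
rng = np.random.default_rng(0)
n, d, d_k = 6, 8, 4
masks = build_masks(n, window=1)
h = masks.shape[0]
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(h, d, d_k)) for _ in range(3))
out, weights = multi_mask_attention(x, w_q, w_k, w_v, masks)
print(out.shape)  # (n, h * d_k)
```

Every mask keeps the diagonal, so each token can always attend to itself and no softmax row is fully masked. Swapping in a different prior only requires adding another boolean matrix to the stack returned by `build_masks`.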
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
