Syntax-guided Localized Self-attention by Constituency Syntactic Distance

Shengyuan Hou; Jushi Kai; Haotian Xue; Bingyu Zhu; Bo Yuan; Longtao Huang; Xinbing Wang; Zhouhan Lin

arXiv:2210.11759·cs.CL·October 24, 2022

TL;DR

This paper introduces a syntax-guided localized self-attention mechanism for Transformers that leverages external constituency parsing to improve translation performance across various datasets and languages.

Contribution

It proposes a novel attention mechanism that incorporates external syntactic structures, enhancing Transformer performance without relying solely on data-driven syntactic learning.

Findings

01. Consistent improvement in translation quality across multiple datasets.
02. Effective incorporation of external syntactic information.
03. Enhanced performance with different source languages.

Abstract

Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this learning is highly dependent on the quality and scale of the training data. However, learning syntactic information from data is not necessary if we can leverage an external syntactic parser, which provides better parsing quality with well-defined syntactic structures. This could potentially improve the Transformer's performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for the Transformer that allows grammar structures from an external constituency parser to be incorporated directly. It prevents the attention mechanism from overweighting grammatically distant tokens relative to close ones. Experimental results show that our model could consistently improve translation performance on a variety of machine translation datasets, ranging from…
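
The abstract describes the mechanism only at a high level. Below is a minimal pure-Python sketch of one way the idea could be realized, under stated assumptions: (1) the syntactic distance between adjacent tokens is taken to be the height of their lowest common ancestor in the constituency tree, so the distance between any two tokens is the largest adjacent distance between them; (2) "localized" is modeled as a hard mask that lets token i attend to token j only when that distance is at most a threshold tau. The function names, the threshold tau, and the hard-mask choice are hypothetical simplifications for illustration; this is not the code in lumia-group/distance_transformer.

```python
import math
from typing import List, Tuple


def syntactic_distances(tree) -> Tuple[List[str], List[int], int]:
    """Return (leaves, distances, height) for a constituency tree given as
    nested tuples of strings. distances[i] is the height of the lowest common
    ancestor of leaves i and i+1 (leaves have height 0). Assumed definition."""
    if isinstance(tree, str):
        return [tree], [], 0
    results = [syntactic_distances(child) for child in tree]
    height = 1 + max(h for _, _, h in results)
    leaves: List[str] = []
    dists: List[int] = []
    for k, (child_leaves, child_dists, _) in enumerate(results):
        if k > 0:
            # Adjacent leaves that sit in different children meet at this node.
            dists.append(height)
        leaves.extend(child_leaves)
        dists.extend(child_dists)
    return leaves, dists, height


def pairwise_syntactic_distance(dists: List[int]) -> List[List[int]]:
    """D[i][j] = largest adjacent distance between positions i and j,
    which equals the height of the lowest common ancestor of tokens i and j."""
    n = len(dists) + 1
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        running = 0
        for j in range(i + 1, n):
            running = max(running, dists[j - 1])
            D[i][j] = D[j][i] = running
    return D


def localized_attention_weights(scores, D, tau):
    """Row-wise softmax over scores, keeping only keys j with D[i][j] <= tau.
    Hypothetical hard-mask variant; assumes tau >= 0 so each token keeps itself."""
    weights = []
    for i, row in enumerate(scores):
        allowed = [j for j in range(len(row)) if D[i][j] <= tau]
        m = max(row[j] for j in allowed)  # subtract the max for numerical stability
        exps = {j: math.exp(row[j] - m) for j in allowed}
        z = sum(exps.values())
        weights.append([exps.get(j, 0.0) / z for j in range(len(row))])
    return weights


# Usage: "the cat sat on the mat" with a hand-written (hypothetical) parse.
tree = (("the", "cat"), ("sat", ("on", ("the", "mat"))))
leaves, dists, _ = syntactic_distances(tree)
D = pairwise_syntactic_distance(dists)
uniform = [[0.0] * len(leaves) for _ in leaves]
print(dists)  # [1, 4, 3, 2, 1]
print(localized_attention_weights(uniform, D, tau=2)[3])
```

With tau=2, "on" spreads its attention only over its own constituent "on the mat", while "sat" is restricted to itself because every neighbor lies across a higher constituent boundary. Raising tau widens each token's window; a soft penalty on D instead of a hard mask would be another reasonable reading of the abstract.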

Code & Models

Repositories

lumia-group/distance_transformer (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection