Syntax-Enhanced Pre-trained Model
Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

TL;DR
This paper introduces a syntax-aware Transformer model that incorporates syntactic dependency information during both pre-training and fine-tuning, leading to improved performance on various NLP tasks without relying on human-annotated syntax.
Contribution
The paper proposes a novel syntax-aware attention mechanism and a pre-training task for syntactic distance prediction, enabling the use of automatic syntax in pre-trained models.
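One way to picture a syntax-aware attention layer is as ordinary scaled dot-product attention restricted to token pairs that are close in the dependency tree. The sketch below is only an illustration under that assumption: the function name `syntax_aware_attention`, the `max_dist` threshold, and the hard masking scheme are hypothetical and are not taken from the paper, which may combine syntax with attention differently.

```python
import numpy as np

def syntax_aware_attention(q, k, v, dist, max_dist=1):
    """Scaled dot-product attention limited to syntactically close tokens.

    q, k, v : arrays of shape (n_tokens, d)
    dist    : (n_tokens, n_tokens) integer matrix of dependency-tree distances
    max_dist: hypothetical threshold; pairs farther apart are masked out
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Keep only pairs within max_dist edges of each other (assumed masking rule).
    scores = np.where(dist <= max_dist, scores, -1e9)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example for "She reads long books" with "reads" as the root:
# She -> reads, books -> reads, long -> books.
dist = np.array([
    [0, 1, 3, 2],
    [1, 0, 2, 1],
    [3, 2, 0, 1],
    [2, 1, 1, 0],
])
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, weights = syntax_aware_attention(q, k, v, dist, max_dist=1)
```

With `max_dist=1` each token attends only to itself, its head, and its direct dependents; raising the threshold lets attention reach more distant parts of the tree.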
Findings
Automatic syntactic information improves model performance.
Global syntactic distances outperform local head relations.
Achieves state-of-the-art results on multiple NLP benchmarks.
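The difference between local head relations and global syntactic distances can be made concrete with a small sketch. Assuming a dependency parse is given as a list of head indices (a common convention, not something this page specifies), the hypothetical helper `tree_distances` below computes the number of edges between every pair of tokens; the local head relation corresponds to the pairs at distance 1.

```python
from collections import deque

def tree_distances(heads):
    """Pairwise path lengths in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    Returns an n x n list of lists with the edge count between tokens.
    """
    n = len(heads)
    # Build an undirected adjacency list from the head pointers.
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [[-1] * n for _ in range(n)]
    # Breadth-first search from every token.
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[src][w] < 0:
                    dist[src][w] = dist[src][u] + 1
                    queue.append(w)
    return dist

# "She reads long books": reads is the root, She -> reads,
# long -> books, books -> reads.
heads = [1, -1, 3, 1]
dist = tree_distances(heads)

# Local view: only head/dependent pairs (distance exactly 1).
local_pairs = [(i, j) for i in range(4) for j in range(4) if dist[i][j] == 1]
```

The full matrix carries information the local pairs do not, for example that "She" and "long" are three edges apart.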
Abstract
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the application of existing methods to broader scenarios. To address this, we present a model that utilizes the syntax of text in both pre-training and fine-tuning stages. Our model is based on Transformer with a syntax-aware attention layer that considers the dependency tree of the text. We further introduce a new pre-training task of predicting the syntactic distance among tokens in the dependency tree. We evaluate the model on three downstream tasks, including relation classification, entity typing,…
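The pre-training task of predicting syntactic distance could be framed, for illustration, as classifying each token pair into a distance bucket. The sketch below is a guess at one such formulation, not the paper's actual objective: the pair representation (element-wise product), the weight matrix `w`, the clipping at `max_dist`, and the function name `distance_prediction_loss` are all assumptions.

```python
import numpy as np

def distance_prediction_loss(hidden, dist, w, max_dist):
    """Mean cross-entropy for predicting clipped tree distances.

    hidden  : (n, d) token representations
    dist    : (n, n) integer dependency-tree distances
    w       : (d, max_dist + 1) hypothetical classifier weights
    max_dist: distances above this share the last class (assumed choice)
    """
    n = hidden.shape[0]
    # Pair features via element-wise product (an assumed, simple choice).
    pairs = hidden[:, None, :] * hidden[None, :, :]      # (n, n, d)
    logits = pairs @ w                                    # (n, n, max_dist + 1)
    # Numerically stable log-softmax over the distance classes.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    targets = np.minimum(dist, max_dist)
    rows, cols = np.indices((n, n))
    return -log_probs[rows, cols, targets].mean()

# Toy tree distances for a four-token sentence.
dist = np.array([
    [0, 1, 3, 2],
    [1, 0, 2, 1],
    [3, 2, 0, 1],
    [2, 1, 1, 0],
])
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 3))
loss = distance_prediction_loss(hidden, dist, w, max_dist=2)
```

Because the targets come from an automatic parser rather than human annotation, a loss of this kind can in principle be computed on any raw text that has been parsed.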
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Dropout · Adam · Multi-Head Attention · WordPiece · Residual Connection · Byte Pair Encoding
