Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

Dhanasekar Sundararaman; Vivek Subramanian; Guoyin Wang; Shijing Si; Dinghan Shen; Dong Wang; Lawrence Carin

arXiv:1911.06156 · cs.CL · November 15, 2019 · 37 cites

Open Access

TL;DR

This paper explores how explicitly incorporating syntactic information like POS tags into Transformer and BERT models enhances their performance in machine translation and natural language understanding tasks, especially with limited data.

Contribution

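It introduces syntax-infused Transformer and BERT models that take syntactic features such as POS tags as explicit input. The Transformer variant gains 0.7 BLEU on the full WMT 14 English-to-German dataset and up to 1.99 BLEU with limited training data, and the BERT variant outperforms its baseline on GLUE tasks.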

Findings

01 · Syntax-infused Transformer improves BLEU by 0.7 on the full dataset

02 · Maximum BLEU improvement of 1.99 with limited data

03 · Syntax-enhanced BERT outperforms baseline on GLUE tasks

Abstract

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, generates abstract representations of the tokens fed to an encoder based on their relationships to all tokens in a sequence. Recent studies have shown that although such models are capable of learning syntactic features purely from examples, explicitly feeding this information to deep learning models can significantly enhance their performance. Leveraging syntactic information such as part of speech (POS) may be particularly beneficial in limited-training-data settings for complex models such as the Transformer. We show that the syntax-infused Transformer with multiple features achieves an improvement of 0.7 BLEU when trained on the full WMT 14 English-to-German translation dataset and a maximum improvement of 1.99…
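The abstract as shown here does not say how the POS information is fed to the model. As a rough illustration only, the sketch below assumes one common approach: each token's embedding is concatenated with a learned embedding of its POS tag before reaching the encoder. The table sizes, ids, and the names `make_table` and `infuse_syntax` are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical sizes, chosen only for illustration (not from the paper).
VOCAB_SIZE, NUM_POS_TAGS = 10, 4
TOKEN_DIM, POS_DIM = 6, 2


def make_table(num_rows, dim, seed):
    """Return a small randomly initialised embedding table (a list of rows)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def infuse_syntax(token_ids, pos_ids, token_table, pos_table):
    """Concatenate each token embedding with the embedding of its POS tag.

    Assumption: concatenation is the combination rule; other choices
    (for example, element-wise addition) are equally possible.
    """
    if len(token_ids) != len(pos_ids):
        raise ValueError("each token needs exactly one POS tag")
    return [token_table[t] + pos_table[p] for t, p in zip(token_ids, pos_ids)]


token_table = make_table(VOCAB_SIZE, TOKEN_DIM, seed=0)
pos_table = make_table(NUM_POS_TAGS, POS_DIM, seed=1)

# Toy sentence of three tokens with made-up token ids and POS tag ids.
token_ids = [3, 7, 1]
pos_ids = [0, 2, 1]

encoder_input = infuse_syntax(token_ids, pos_ids, token_table, pos_table)
print(len(encoder_input), len(encoder_input[0]))
```

Each resulting vector has `TOKEN_DIM + POS_DIM` entries, so the encoder sees the syntactic tag alongside the token at every position.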

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections