Enhancing the Transformer Decoder with Transition-based Syntax
Leshem Choshen, Omri Abend

TL;DR
This paper introduces a transition-based tree decoding method that incorporates syntactic structure into Transformer decoders, substantially improving syntactic generalization in machine translation.
Contribution
It presents a novel, general transition-based approach to tree decoding that integrates syntactic information (here, Universal Dependencies) into Transformer decoders and improves their syntactic generalization.
Findings
Substantial improvements on syntactic generalization test sets.
Comparable or improved performance on standard MT benchmarks.
Qualitative analysis identifies cases where the vanilla Transformer decoder generalizes poorly and shows the advantages of integrating syntactic information.
Abstract
Notwithstanding recent advances, syntactic generalization remains a challenge for text decoders. While some studies showed gains from incorporating source-side symbolic syntactic and semantic structure into text generation Transformers, very little work addressed the decoding of such structure. We propose a general, transition-based approach to tree decoding. Examining the challenging test case of incorporating Universal Dependencies syntax into machine translation, we present substantial improvements on test sets that focus on syntactic generalization, while achieving comparable or improved performance on standard MT benchmarks. Further qualitative analysis addresses cases where syntactic generalization in the vanilla Transformer decoder is inadequate and demonstrates the advantages afforded by integrating syntactic information.
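To make "tree decoding with transitions" concrete, here is a minimal sketch of the textbook arc-standard transition system for dependency trees. This is an assumption for illustration: the paper's exact transition set, and the way it combines transitions with target words, may differ. `oracle` converts a projective tree (given as head indices, with 0 as ROOT) into a transition sequence, `apply_transitions` replays the transitions to recover the tree, and `interleave` is a hypothetical way to form a single decoder target in which each SHIFT is realized as the word it introduces.

```python
from typing import List

# Generic arc-standard transitions (not necessarily the paper's exact inventory).
SHIFT, LEFT, RIGHT = "SHIFT", "LEFT-ARC", "RIGHT-ARC"


def oracle(heads: List[int]) -> List[str]:
    """Static arc-standard oracle.

    heads[i] is the head of token i+1 (tokens are 1-indexed; 0 is ROOT).
    Returns the transitions that build this tree; the tree must be projective.
    """
    n = len(heads)
    # How many dependents each node (including ROOT) still has to collect.
    pending = [0] * (n + 1)
    for h in heads:
        pending[h] += 1
    stack, buffer = [0], list(range(1, n + 1))
    transitions: List[str] = []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            # LEFT-ARC: second-from-top depends on the top (ROOT is never a dependent).
            if s1 != 0 and heads[s1 - 1] == s0:
                transitions.append(LEFT)
                stack.pop(-2)
                pending[s0] -= 1
                continue
            # RIGHT-ARC: top depends on second-from-top and has all its own dependents.
            if heads[s0 - 1] == s1 and pending[s0] == 0:
                transitions.append(RIGHT)
                stack.pop()
                pending[s1] -= 1
                continue
        if not buffer:
            raise ValueError("tree is not projective")
        transitions.append(SHIFT)
        stack.append(buffer.pop(0))
    return transitions


def apply_transitions(transitions: List[str], n: int) -> List[int]:
    """Execute arc-standard transitions over n tokens; return each token's head."""
    heads = [-1] * n
    stack, buffer = [0], list(range(1, n + 1))
    for t in transitions:
        if t == SHIFT:
            stack.append(buffer.pop(0))
        elif t == LEFT:
            dep = stack.pop(-2)          # attach second-from-top to the top
            heads[dep - 1] = stack[-1]
        elif t == RIGHT:
            dep = stack.pop()            # attach top to the new top
            heads[dep - 1] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


def interleave(words: List[str], transitions: List[str]) -> List[str]:
    """Hypothetical decoder target: replace each SHIFT with the word it shifts."""
    out, i = [], 0
    for t in transitions:
        if t == SHIFT:
            out.append(words[i])
            i += 1
        else:
            out.append(t)
    return out


if __name__ == "__main__":
    words = ["She", "reads", "books"]
    heads = [2, 0, 2]  # She <- reads, reads <- ROOT, books <- reads
    ts = oracle(heads)
    print(interleave(words, ts))
    print(apply_transitions(ts, len(words)))
```

Because every SHIFT introduces exactly one word and every arc transition attaches exactly one dependent, a projective tree over n words always yields 2n transitions; the interleaved sequence is therefore the sentence plus n arc tokens, from which both the words and the tree can be read back.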
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Residual Connection · Adam · Dropout · Label Smoothing · Layer Normalization · Multi-Head Attention
