Graph-to-Sequence Neural Machine Translation
Sufeng Duan, Hai Zhao, Rui Wang

TL;DR
This paper introduces Graph-Transformer, a graph-to-sequence neural machine translation model that explicitly captures subgraph information at various dependency levels, improving translation quality over standard Transformer models.
Contribution
The paper presents a graph-based self-attention network (SAN) model for NMT that explicitly models subgraphs of different orders, so that dependency structures between words are captured directly rather than only latently.
Findings
Improves BLEU by 1.1 points over the standard Transformer on WMT14 English-German
Enhances translation quality on IWSLT14 German-English
Effectively captures multi-level dependency information
Abstract
Neural machine translation (NMT) usually works in a seq2seq manner, viewing the source or target sentence as a linear sequence of words. Such a sequence can be regarded as a special case of a graph, with the words as nodes and the relationships between words as edges. Since current NMT models already capture graph information within the sequence to some degree, but only in a latent way, we present a graph-to-sequence model that captures graph information explicitly. In detail, we propose a graph-based, SAN-based NMT model called Graph-Transformer, which captures information from subgraphs of different orders in every layer. Subgraphs are put into groups according to their order, and each group reflects a different level of dependency between words. To fuse the subgraph representations, we empirically explore three methods which weight different…
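The two ideas in the abstract can be illustrated with a minimal sketch. The first function treats a sentence as a chain graph, with words as nodes and adjacent-word links as edges. The second fuses one vector per subgraph group with a weighted sum. The abstract is truncated before it describes the three fusion methods, so the softmax-weighted sum here is an assumption chosen for illustration, not the authors' method; the names chain_graph, softmax, and fuse_groups are hypothetical.

```python
import math


def chain_graph(tokens):
    """View a sentence as a graph: one node per word, one edge per adjacent pair."""
    nodes = list(range(len(tokens)))
    edges = [(i, i + 1) for i in range(len(tokens) - 1)]
    return nodes, edges


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_groups(group_vectors, scores):
    """Weighted sum of one vector per subgraph group.

    ASSUMPTION: softmax weighting stands in for the paper's fusion methods,
    which the truncated abstract does not describe.
    """
    weights = softmax(scores)
    dim = len(group_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, group_vectors))
        for d in range(dim)
    ]


if __name__ == "__main__":
    nodes, edges = chain_graph(["we", "translate", "graphs"])
    print(nodes, edges)
    # Two groups with equal scores contribute equally to the fused vector.
    print(fuse_groups([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a trained model the scores would be learned rather than fixed, and each group vector would come from attention over the subgraphs of that order; both are left out of this sketch.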
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Bioinformatics
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Layer Normalization · Sequence to Sequence · Dropout · Dense Connections
