GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation

Jian Yang; Yuwei Yin; Liqun Yang; Shuming Ma; Haoyang Huang; Dongdong Zhang; Furu Wei; Zhoujun Li

arXiv:2207.14467·cs.CL·November 14, 2022·1 cites

Open Access 1 Repo

TL;DR

GTrans introduces a flexible grouping and fusion mechanism for Transformer layers in neural machine translation, leveraging multi-layer features to improve translation quality across various benchmarks.

Contribution

The paper proposes GTrans, a novel model that groups and fuses features from all Transformer layers, effectively utilizing bottom-layer information often ignored in standard models.

Findings

1. GTrans outperforms standard Transformer models on multiple translation benchmarks.

2. The model scales effectively to 60 encoder and 36 decoder layers.

3. Experimental results show consistent performance gains.

Abstract

The Transformer architecture, a stack of encoder and decoder layers, has driven significant progress in neural machine translation. However, the vanilla Transformer mainly exploits the top-layer representation, assuming that the lower layers provide trivial or redundant information, and thus ignores bottom-layer features that are potentially valuable. In this work, we propose the Group-Transformer model (GTrans), which flexibly divides the multi-layer representations of both the encoder and the decoder into groups and then fuses these group features to generate target words. To corroborate the effectiveness of the proposed method, extensive experiments and analyses are conducted on three bilingual translation benchmarks and two multilingual translation tasks, including the IWSLT-14, IWSLT-17, LDC, WMT-14, and OPUS-100 benchmarks. Experimental and analytical results…
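The grouping-and-fusing idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion functions, so everything here is an assumption made for illustration: the helper names (`split_into_groups`, `group_and_fuse`), the contiguous equal-size grouping, the mean within each group, and the softmax-weighted sum across groups are not taken from the paper or from the YuweiYin/GTrans repository.

```python
import math


def split_into_groups(layer_outputs, num_groups):
    """Split per-layer vectors into contiguous groups of near-equal size."""
    n = len(layer_outputs)
    if not 1 <= num_groups <= n:
        raise ValueError("num_groups must be between 1 and the number of layers")
    base, extra = divmod(n, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # earlier groups absorb the remainder
        groups.append(layer_outputs[start:start + size])
        start += size
    return groups


def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_and_fuse(layer_outputs, num_groups, group_scores):
    """Fuse layer outputs in two steps (assumed scheme, not the paper's):
    1) average the layers inside each group,
    2) combine the group features with softmax-normalized weights.
    """
    groups = split_into_groups(layer_outputs, num_groups)
    group_features = [mean_vectors(group) for group in groups]
    weights = softmax(group_scores)  # in a real model these scores would be learned
    dim = len(group_features[0])
    return [
        sum(w * feature[i] for w, feature in zip(weights, group_features))
        for i in range(dim)
    ]


# Toy input: 6 "layers", each a 2-dimensional feature for a single token.
layers = [[float(i), float(2 * i)] for i in range(1, 7)]
fused = group_and_fuse(layers, num_groups=3, group_scores=[0.0, 0.0, 0.0])
```

With equal group scores, every group contributes equally, so bottom-layer features carry the same weight as top-layer ones. Raising the score of the last group moves the result toward the top-layer-only behavior of a standard Transformer.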

Code & Models

Repositories

YuweiYin/GTrans
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding