The NiuTrans System for WNGT 2020 Efficiency Task

Chi Hu; Bei Li; Ye Lin; Yinqiao Li; Yanyang Li; Chenglong Wang; Tong Xiao; Jingbo Zhu

arXiv:2109.08008 · cs.CL · September 17, 2021

TL;DR

This paper presents the NiuTrans submissions to the WNGT 2020 Efficiency Shared Task: Transformer-based neural machine translation systems that combine model compression with optimized inference to reach high translation speed while preserving translation quality.

Contribution

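The paper describes an efficient implementation of deep Transformer models in NiuTensor, a flexible toolkit for NLP tasks. It pairs a deep encoder with a shallow decoder, obtained via model compression and knowledge distillation, and speeds up decoding with FP16 inference, attention caching, dynamic batching, and batch pruning.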

Findings

01

The fastest system translates more than 40,000 tokens per second on an RTX 2080 Ti.

02

Maintained 42.9 BLEU on newstest2018.

03

Reported promising results in both translation quality and efficiency for deep-encoder, shallow-decoder Transformer models.

Abstract

This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task. We focus on the efficient implementation of deep Transformer models (Wang et al., 2019; Li et al., 2019) using NiuTensor (https://github.com/NiuTrans/NiuTensor), a flexible toolkit for NLP tasks. We explore the combination of a deep encoder and a shallow decoder in Transformer models via model compression and knowledge distillation. Neural machine translation decoding also benefits from FP16 inference, attention caching, dynamic batching, and batch pruning. Our systems achieve promising results in both translation quality and efficiency; e.g., our fastest system can translate more than 40,000 tokens per second with an RTX 2080 Ti while maintaining 42.9 BLEU on newstest2018. The code, models, and Docker images are available at NiuTrans.NMT…
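Two of the decoding techniques named in the abstract, dynamic batching and batch pruning, can be illustrated with a short sketch. This is not the NiuTensor implementation: the function names, the `max_tokens` budget, and the `</s>` end-of-sentence marker are assumptions made for illustration. The first function groups length-sorted sentences so that the padded batch size stays within a token budget; the second drops hypotheses that have already finished, so later decoding steps run on a smaller batch.

```python
from typing import List


def dynamic_batches(sentences: List[List[str]], max_tokens: int) -> List[List[int]]:
    """Group sentence indices into batches whose padded size fits max_tokens."""
    # Sort by length so sentences sharing a batch need little padding.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]))
    batches: List[List[int]] = []
    current: List[int] = []
    longest = 0
    for idx in order:
        new_longest = max(longest, len(sentences[idx]))
        # Padded cost = (sentences in batch) * (longest sentence in batch).
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_longest = [], len(sentences[idx])
        current.append(idx)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


def prune_finished(active: List[int], last_tokens: List[str],
                   eos: str = "</s>") -> List[int]:
    """Batch pruning: keep only hypotheses that have not emitted EOS."""
    return [idx for idx, tok in zip(active, last_tokens) if tok != eos]


if __name__ == "__main__":
    # Hypothetical toy input: four sentences of lengths 5, 2, 3, and 9.
    toy = [["a"] * 5, ["b"] * 2, ["c"] * 3, ["d"] * 9]
    print(dynamic_batches(toy, max_tokens=10))
    print(prune_finished([0, 1, 2], ["x", "</s>", "y"]))
```

In this sketch a sentence longer than the budget still forms a batch of its own, which keeps the loop simple; a production decoder would more likely bound memory in other ways as well.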

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Softmax · Byte Pair Encoding · Layer Normalization