On the Sparsity of Neural Machine Translation Models

Yong Wang; Longyue Wang; Victor O.K. Li; Zhaopeng Tu

arXiv:2010.02646·cs.CL·October 7, 2020

TL;DR

This paper investigates the redundancy in neural machine translation models and demonstrates that pruned parameters can be reused to improve translation quality, especially in modeling lexical information.

Contribution

The paper introduces a method to rejuvenate pruned parameters in NMT models, improving translation performance and the utilization of computational resources.

Findings

01

Rejuvenating pruned parameters improves the baseline model by up to +0.8 BLEU points.

02

Rejuvenated parameters enhance low-level lexical modeling.

03

Pruned parameters are effectively reusable in NMT models.

Abstract

Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.
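
The prune-then-rejuvenate idea in the abstract can be illustrated with a small sketch. The abstract does not state the pruning criterion or the training schedule, so everything here is an assumption made for illustration: magnitude-based pruning, plain SGD, and the hypothetical helper names `magnitude_mask` and `masked_sgd_step`. It is not the authors' implementation.

```python
import numpy as np


def magnitude_mask(weights, prune_ratio):
    """Boolean mask that is False for roughly the smallest-magnitude weights.

    Assumption: magnitude-based pruning. Ties at the threshold are all
    pruned, so slightly more than prune_ratio may be removed.
    """
    flat = np.abs(weights).ravel()
    k = int(prune_ratio * flat.size)  # number of weights to prune
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # The k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold


def masked_sgd_step(weights, grad, mask, lr):
    """One SGD step; entries where mask is False are held at zero."""
    return (weights - lr * grad) * mask


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))

# 1) Prune: zero out half of the weights by magnitude.
mask = magnitude_mask(w, prune_ratio=0.5)
w_sparse = w * mask

# 2) Train the sparse model: pruned entries stay at zero.
grad = np.ones_like(w)  # stand-in gradient for the sketch
w_sparse = masked_sgd_step(w_sparse, grad, mask, lr=0.1)

# 3) Rejuvenate: lift the mask so pruned entries are trainable again.
full_mask = np.ones(w.shape, dtype=bool)
w_rejuvenated = masked_sgd_step(w_sparse, grad, full_mask, lr=0.1)
```

In this sketch, rejuvenation simply means that previously pruned positions restart from zero and receive gradient updates again; how the paper initializes and trains the restored parameters is not described in the abstract.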

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications