On the Sparsity of Neural Machine Translation Models
Yong Wang, Longyue Wang, Victor O.K. Li, Zhaopeng Tu

TL;DR
This paper investigates the redundancy in neural machine translation models and demonstrates that pruned parameters can be reused to improve translation quality, especially in modeling lexical information.
Contribution
It introduces a method to rejuvenate pruned parameters in NMT models, improving performance and resource utilization.
Findings
Rejuvenating pruned parameters improves the baseline model by up to +0.8 BLEU points.
Rejuvenated parameters enhance low-level lexical modeling.
Pruned parameters are effectively reusable in NMT models.
Abstract
Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability to model low-level lexical information.
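The prune-then-rejuvenate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the pruning criterion, how pruned parameters are re-initialized, or the training schedule, so everything below is an assumption made for illustration: magnitude-based pruning, zero re-initialization, and a plain SGD step. The function names (`magnitude_mask`, `prune`, `rejuvenate`, `sgd_step`) are hypothetical and do not come from the paper.

```python
def magnitude_mask(weights, prune_ratio):
    """Return a keep-mask: False for the smallest-magnitude weights.

    Assumption: magnitude pruning; the paper's actual criterion may differ.
    """
    k = int(prune_ratio * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [i not in pruned for i in range(len(weights))]


def prune(weights, mask):
    """Zero out every weight whose mask entry is False."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


def sgd_step(weights, grads, lr, mask=None):
    """One SGD update. With a mask, pruned weights stay frozen at zero."""
    if mask is None:
        mask = [True] * len(weights)
    return [w - lr * g if keep else 0.0
            for w, g, keep in zip(weights, grads, mask)]


def rejuvenate(weights, mask):
    """Re-enable pruned weights for training.

    Assumption: pruned entries restart from zero. The caller then trains
    without the mask so these entries receive gradient updates again.
    """
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
grads = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]  # made-up gradients for the demo

mask = magnitude_mask(weights, prune_ratio=0.5)
sparse = prune(weights, mask)

# While pruned, masked entries do not move.
sparse_next = sgd_step(sparse, grads, lr=1.0, mask=mask)

# After rejuvenation, training continues over all parameters.
dense = rejuvenate(sparse, mask)
dense_next = sgd_step(dense, grads, lr=1.0)
```

In this toy setting the three smallest-magnitude weights are pruned and stay at zero during masked training; once the mask is dropped, the same positions start receiving updates again, which is the sense in which otherwise idle capacity is put back to use.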
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
