Compression of Neural Machine Translation Models via Pruning

Abigail See; Minh-Thang Luong; Christopher D. Manning

arXiv:1606.09274 · cs.AI · July 1, 2016

TL;DR

This paper explores simple magnitude-based pruning schemes for compressing neural machine translation models. Pruning 40% of the weights causes very little performance loss, and with retraining an 80%-pruned model recovers and even surpasses the original performance.

Contribution

It introduces and evaluates three magnitude-based pruning schemes for NMT models (class-blind, class-uniform, and class-distribution) and shows that retraining recovers the performance lost to pruning.

Findings

01 Pruning 40% of the weights causes very little performance loss

02 Pruning 80% of the weights, followed by retraining, recovers and even surpasses the original performance

03 The pruning results show how redundancy is distributed across the NMT architecture
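
The 80% result depends on retraining after pruning. Below is a minimal sketch of one way such retraining could be set up, assuming the pruned positions are held at zero by a fixed binary mask while the surviving weights keep training. That assumption is not stated on this page, and `masked_sgd_step` and its parameters are hypothetical names chosen for illustration.

```python
def masked_sgd_step(weights, grads, mask, lr):
    """One plain SGD update that keeps pruned positions at exactly zero.

    Assumption: mask[i] is 0 for a pruned weight and 1 for a surviving one,
    and the mask stays fixed for the whole retraining run.
    """
    return [
        (w - lr * g) if m else 0.0
        for w, g, m in zip(weights, grads, mask)
    ]


# The first weight was pruned, so its gradient is ignored.
updated = masked_sgd_step(
    weights=[0.0, 0.5, -0.3],
    grads=[1.0, 0.2, -0.1],
    mask=[0, 1, 1],
    lr=0.1,
)
```

Holding the mask fixed keeps the sparsity level constant during retraining, so any recovered performance comes from the surviving weights alone.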

Abstract

Neural Machine Translation (NMT), like many other deep learning domains, typically suffers from over-parameterization, resulting in large storage sizes. This paper examines three simple magnitude-based pruning schemes to compress NMT models, namely class-blind, class-uniform, and class-distribution, which differ in terms of how pruning thresholds are computed for the different classes of weights in the NMT architecture. We demonstrate the efficacy of weight pruning as a compression technique for a state-of-the-art NMT system. We show that an NMT model with over 200 million parameters can be pruned by 40% with very little performance loss as measured on the WMT'14 English-German translation task. This sheds light on the distribution of redundancy in the NMT architecture. Our main result is that with retraining, we can recover and even surpass the original performance with an 80%-pruned…
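
The abstract says only that the three schemes differ in how pruning thresholds are computed per weight class. The sketch below illustrates one reading of that difference in plain Python. The specific rules are assumptions for illustration: class-blind ranks all weights together by magnitude, class-uniform prunes the same fraction inside every class, and class-distribution ranks weights by magnitude divided by their class's standard deviation (equivalent to a per-class threshold proportional to that standard deviation). The function `prune`, the helper `_lowest`, and the toy class names are hypothetical.

```python
import statistics


def _lowest(scored, k):
    """Return the keys of the k lowest-scoring (score, key) pairs."""
    ranked = sorted(scored, key=lambda item: item[0])
    return {key for _, key in ranked[:k]}


def prune(classes, frac, scheme):
    """Zero out roughly `frac` of the weights in `classes` (name -> list of floats)."""
    if scheme == "class-uniform":
        # Assumed rule: prune the same fraction within every class.
        doomed = set()
        for name, ws in classes.items():
            scored = [(abs(w), (name, i)) for i, w in enumerate(ws)]
            doomed |= _lowest(scored, int(frac * len(ws)))
    else:
        if scheme == "class-blind":
            # Assumed rule: one global magnitude ranking, classes ignored.
            scale = {name: 1.0 for name in classes}
        elif scheme == "class-distribution":
            # Assumed rule: threshold proportional to each class's standard
            # deviation. Requires every class to have nonzero spread.
            scale = {name: statistics.pstdev(ws) for name, ws in classes.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        scored = [
            (abs(w) / scale[name], (name, i))
            for name, ws in classes.items()
            for i, w in enumerate(ws)
        ]
        doomed = _lowest(scored, int(frac * len(scored)))
    return {
        name: [0.0 if (name, i) in doomed else w for i, w in enumerate(ws)]
        for name, ws in classes.items()
    }


# Toy model: two weight classes whose magnitudes differ by a factor of 100.
classes = {
    "small": [0.01, -0.02, 0.03, -0.04],
    "large": [1.0, -2.0, 3.0, -4.0],
}
blind = prune(classes, 0.5, "class-blind")      # removes all of "small"
uniform = prune(classes, 0.5, "class-uniform")  # removes half of each class
```

On this toy input the global ranking wipes out the small-magnitude class entirely, while the per-class ranking spreads the pruning evenly. That contrast is the kind of difference the three schemes are meant to expose.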

Code & Models

Repositories

zepx/pytorch-weight-prune
pytorch
