In-training Matrix Factorization for Parameter-frugal Neural Machine Translation
Zachary Kaden, Teven Le Scao, Raphael Olivier

TL;DR
This paper introduces in-training matrix factorization for neural machine translation: parameter matrices are decomposed into products of smaller matrices during training, which substantially reduces model size. The method is especially effective on embedding layers and has minimal, and at times positive, impact on translation quality.
Contribution
It presents a novel in-training matrix factorization method that compresses neural translation models by decomposing parameter matrices into products of smaller matrices during training, removing nearly 50% of learnable parameters without any associated loss in BLEU score.
Findings
Reduces the number of learnable parameters by nearly 50% with no associated loss in BLEU score.
Especially effective on embedding layers, where it sometimes improves performance.
Applicable to different layers of standard neural architectures.
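The parameter savings reported above follow from simple counting: a dense matrix with m rows and n columns has m·n entries, while a rank-r factorization into an m×r and an r×n matrix has r·(m + n). The sketch below works through that arithmetic. The function names and the example sizes (a 32,000-word vocabulary, 512-dimensional embeddings, rank 128) are illustrative assumptions, not configurations taken from the paper.

```python
def dense_params(rows: int, cols: int) -> int:
    """Number of entries in a single dense rows x cols matrix."""
    return rows * cols


def factorized_params(rows: int, cols: int, rank: int) -> int:
    """Entries in a (rows x rank) matrix plus a (rank x cols) matrix."""
    return rows * rank + rank * cols


def break_even_rank(rows: int, cols: int) -> float:
    """Factorization saves parameters only when rank is below this value."""
    return rows * cols / (rows + cols)


# Hypothetical embedding layer; sizes chosen for illustration only.
vocab_size, embed_dim, rank = 32_000, 512, 128

dense = dense_params(vocab_size, embed_dim)              # 16384000
factored = factorized_params(vocab_size, embed_dim, rank)  # 4161536

print(f"dense:      {dense:,}")
print(f"factorized: {factored:,}")
print(f"fraction kept: {factored / dense:.3f}")
print(f"break-even rank: {break_even_rank(vocab_size, embed_dim):.1f}")
```

Because the vocabulary dimension is usually far larger than the embedding dimension, the break-even rank sits close to the embedding dimension, so even a moderate rank removes most of the layer's parameters. This is consistent with the finding that embedding layers benefit most.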
Abstract
In this paper, we propose the use of in-training matrix factorization to reduce the model size for neural machine translation. Using in-training matrix factorization, parameter matrices may be decomposed into the products of smaller matrices, which can compress large machine translation architectures by vastly reducing the number of learnable parameters. We apply in-training matrix factorization to different layers of standard neural architectures and show that in-training factorization is capable of reducing nearly 50% of learnable parameters without any associated loss in BLEU score. Further, we find that in-training matrix factorization is especially powerful on embedding layers, providing a simple and effective method to curtail the number of parameters with minimal impact on model performance, and, at times, an increase in performance.
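As a minimal sketch of the idea described in the abstract, the example below stores an embedding table as the product of two smaller matrices and reconstructs a single row on lookup. It assumes that "in-training" means the two factors are themselves the learnable parameters from the start of training, rather than a factorization of an already trained matrix. The class name `FactorizedEmbedding`, its initialization scheme, and the sizes are hypothetical, and the training loop is omitted.

```python
import random


class FactorizedEmbedding:
    """Embedding table E = A @ B, with A (vocab x rank) and B (rank x dim).

    Only A and B are stored; the full vocab x dim table is never materialized.
    """

    def __init__(self, vocab_size: int, dim: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.rank = rank
        self.dim = dim
        # Small random initial values; the scale 0.1 is an arbitrary choice.
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(rank)]
                  for _ in range(vocab_size)]
        self.B = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                  for _ in range(rank)]

    def lookup(self, token_id: int) -> list[float]:
        """Return row token_id of A @ B as a vector of length dim."""
        row = self.A[token_id]
        return [sum(row[k] * self.B[k][j] for k in range(self.rank))
                for j in range(self.dim)]


# Hypothetical sizes for illustration.
emb = FactorizedEmbedding(vocab_size=1000, dim=64, rank=8)
vector = emb.lookup(42)
print(len(vector))
```

In a real model both factors would receive gradients through the product, so the low-rank structure is enforced throughout training rather than imposed afterwards.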
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
