Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

Amelia Drew; Alexander Heinecke

arXiv:1911.01933·cs.LG·November 6, 2019

TL;DR

This paper demonstrates the implementation of Tensor Train layers in neural machine translation models using TensorFlow, achieving competitive BLEU scores on English-Vietnamese and German-English datasets with reduced parameters.

Contribution

Introduces a Tensor Train layer in NMT models with empirical evaluation, showing its effectiveness and potential for parameter reduction and future optimization.

Findings

01

On IWSLT English-Vietnamese, Tensor Train runs reached BLEU test/dev scores of 24.0/21.9 and 24.2/21.9, against a benchmark target test score of 24.0.

02

Among the parameters surveyed, a higher learning rate and more 'rectangular' core dimensions generally produce higher BLEU scores.

03

Tensor Train decomposition can be effectively applied to NMT models.

Abstract

We implement a Tensor Train layer in the TensorFlow Neural Machine Translation (NMT) model using the t3f library. We perform training runs on the IWSLT English-Vietnamese '15 and WMT German-English '16 datasets with learning rates $\in \{0.0004, 0.0008, 0.0012\}$, maximum ranks $\in \{2, 4, 8, 16\}$ and a range of core dimensions. We compare against a target BLEU test score of 24.0, obtained by our benchmark run. For the IWSLT English-Vietnamese training, we obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using core dimensions $(2, 2, 256) \times (2, 2, 512)$ with learning rate 0.0012 and rank distributions $(1, 4, 4, 1)$ and $(1, 4, 16, 1)$ respectively. These runs use 113% and 397% of the FLOPs of the benchmark run respectively. We find that, of the parameters surveyed, a higher learning rate and more 'rectangular' core dimensions generally produce higher BLEU scores. For the WMT…
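To make the core dimensions and rank distributions quoted in the abstract more concrete, the following is a minimal NumPy sketch of a TT-matrix. It is not the paper's t3f code. It assumes that $(2, 2, 256)$ are the row modes and $(2, 2, 512)$ the column modes, and that core $k$ has shape $(r_{k-1}, m_k, n_k, r_k)$; the function names `tt_matrix_param_count`, `random_tt_cores` and `tt_to_dense` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def tt_matrix_param_count(row_modes, col_modes, ranks):
    """Number of entries stored by a TT-matrix.

    Assumption: core k has shape (ranks[k], row_modes[k], col_modes[k], ranks[k+1]).
    """
    assert len(row_modes) == len(col_modes) == len(ranks) - 1
    return sum(
        ranks[k] * row_modes[k] * col_modes[k] * ranks[k + 1]
        for k in range(len(row_modes))
    )


def random_tt_cores(row_modes, col_modes, ranks, seed=0):
    """Random cores with the shapes assumed above (illustration only)."""
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((ranks[k], row_modes[k], col_modes[k], ranks[k + 1]))
        for k in range(len(row_modes))
    ]


def tt_to_dense(cores):
    """Contract the cores back into the full (prod(rows), prod(cols)) matrix."""
    # Drop the leading rank-1 axis: (m_1, n_1, r_1).
    result = cores[0][0]
    for core in cores[1:]:
        rows, cols, _ = result.shape
        _, m, n, r_next = core.shape
        # Sum over the shared rank index, then merge row and column modes.
        result = np.einsum("abr,rmns->ambns", result, core)
        result = result.reshape(rows * m, cols * n, r_next)
    # Drop the trailing rank-1 axis.
    return result[:, :, 0]


row_modes, col_modes = (2, 2, 256), (2, 2, 512)
dense_params = int(np.prod(row_modes)) * int(np.prod(col_modes))
print("dense 1024 x 2048:", dense_params)  # 2097152
for ranks in [(1, 4, 4, 1), (1, 4, 16, 1)]:
    # (1, 4, 4, 1) -> 524368 and (1, 4, 16, 1) -> 2097424
    print(ranks, tt_matrix_param_count(row_modes, col_modes, ranks))
```

Under these assumptions the last core dominates the count, so the rank distribution $(1, 4, 16, 1)$ stores slightly more values than the dense matrix, while $(1, 4, 4, 1)$ stores about a quarter. These are parameter counts only; they are separate from the FLOP percentages reported in the abstract.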

Taxonomy

Topics: Tensor decomposition and applications · Topic Modeling · Parallel Computing and Optimization Techniques

Methods: Test