Fast Training of NMT Model with Data Sorting

Daniela N. Rim; Kimera Richard; Heeyoul Choi

arXiv:2308.08153 · cs.CL · August 17, 2023


Open Access

TL;DR

This paper introduces a data sorting algorithm for neural machine translation (NMT) training. The algorithm partially sorts sentence pairs by length before batching, which reduces computation wasted on padding tokens and leads to faster training without sacrificing translation quality.
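To make the source of the waste concrete, here is a minimal sketch that counts padding tokens when every batch is padded to its longest sentence, comparing randomly ordered batches with length-sorted batches. The sentence lengths are randomly generated for illustration only and are not data from the paper; `padding_tokens` is a hypothetical helper, not part of the authors' code.

```python
import random


def padding_tokens(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        # Every shorter sentence is padded up to the longest one in its batch.
        total += sum(longest - n for n in batch)
    return total


# Hypothetical corpus: sentence lengths in tokens (illustrative, not from the paper).
rng = random.Random(0)
lengths = [rng.randint(3, 60) for _ in range(10_000)]

shuffled = lengths[:]
rng.shuffle(shuffled)
print("random batching:", padding_tokens(shuffled, 32))
print("sorted batching:", padding_tokens(sorted(lengths), 32))
```

Sorting places sentences of similar length in the same batch, so far fewer pad positions are computed and then discarded.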

Contribution

The proposed partial sorting algorithm improves the training efficiency of NMT models by minimizing unnecessary computation on padding tokens, and it is applicable across different architectures.

Findings

01

Reduced training time in English-Korean and English-Luganda translation tasks

02

Maintained translation quality despite data sorting

03

Applicable to various Transformer-based models

Abstract

The Transformer model has revolutionized Natural Language Processing tasks such as Neural Machine Translation, and many efforts have been made to study the Transformer architecture, increasing its efficiency and accuracy. One potential area for improvement is the computation of empty (padding) tokens, which the Transformer computes only to discard them later, creating an unnecessary computational burden. To tackle this, we propose an algorithm that sorts translation sentence pairs by length before batching, minimizing wasted computing power. Since sorting the data too thoroughly could violate the independent and identically distributed (i.i.d.) data assumption, we sort the data only partially. In experiments, we apply the proposed method to English-Korean and English-Luganda language pairs for machine translation and show that there are gains in computational time while…
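The abstract does not specify how the partial sorting is done. One common way to trade off padding against the i.i.d. assumption is chunked sorting, sketched below as an assumption rather than the authors' algorithm. The data is shuffled globally, sorted by length only within fixed-size chunks, cut into batches, and the batch order is shuffled again. `partial_sort_batches`, `chunk_size`, and `batch_size` are hypothetical names introduced for illustration.

```python
import random
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def partial_sort_batches(pairs: List[Pair], chunk_size: int,
                         batch_size: int, seed: int = 0) -> List[List[Pair]]:
    """Hypothetical chunked partial sort; not necessarily the paper's method."""
    rng = random.Random(seed)
    data = pairs[:]
    rng.shuffle(data)  # global shuffle so each chunk is a random sample
    batches = []
    for start in range(0, len(data), chunk_size):
        # Sort only inside the chunk, by source length then target length.
        chunk = sorted(data[start:start + chunk_size],
                       key=lambda p: (len(p[0]), len(p[1])))
        for b in range(0, len(chunk), batch_size):
            batches.append(chunk[b:b + batch_size])
    rng.shuffle(batches)  # randomize batch order so lengths do not drift over an epoch
    return batches


# Usage on a toy corpus of 100 pairs with lengths 1..100.
toy = [(["w"] * n, ["v"] * (n + 1)) for n in range(1, 101)]
batches = partial_sort_batches(toy, chunk_size=20, batch_size=8, seed=1)
print(len(batches), [len(p[0]) for p in batches[0]])
```

In this sketch, `chunk_size` controls the trade-off. With `chunk_size == batch_size` it reduces to ordinary random batching, and with `chunk_size == len(pairs)` it becomes a full sort that groups lengths most tightly but orders the data least randomly.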

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Layer Normalization · Softmax · Absolute Position Encodings · Residual Connection · Dense Connections · Dropout