# An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

**Authors:** Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura

arXiv: 1706.05765 · 2017-06-20

## TL;DR

This paper empirically compares mini-batch creation strategies for training neural machine translation (NMT) models. It finds that the choice of strategy has a large effect on training, and that some length-based sorting strategies do not always work as well as simple shuffling.

## Contribution

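The paper provides an empirical comparison of mini-batch creation strategies for NMT, a step that widely used toolkits implement in disparate, previously unvalidated ways. It also tests the common assumption that sorting the corpus by sentence length before batching is always beneficial.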

## Key findings

- The mini-batch creation strategy has a large effect on NMT training.
- Some length-based sorting strategies do not always work well compared with simple shuffling.
- The strategies are compared empirically on two different datasets.

## Abstract

Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.
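The padding and sorting ideas in the abstract can be illustrated with a small sketch. It assumes sentences are lists of integer token ids (starting at 1), a hypothetical pad id of `0`, and a fixed number of sentences per mini-batch. The two strategies shown, full-corpus shuffling and "sort by length, batch, then shuffle batch order", are common illustrative variants chosen for this sketch; they are not necessarily the exact strategies evaluated in the paper. All function names are hypothetical.

```python
import random
from typing import List, Sequence

PAD = 0  # hypothetical padding id; real token ids are assumed to start at 1

Sentence = Sequence[int]


def pad_batch(batch: Sequence[Sentence]) -> List[List[int]]:
    """Right-pad every sentence to the length of the longest one in the mini-batch."""
    max_len = max(len(s) for s in batch)
    return [list(s) + [PAD] * (max_len - len(s)) for s in batch]


def count_padding(batches: Sequence[Sequence[Sentence]]) -> int:
    """Total pad tokens that pad_batch would add over all mini-batches."""
    return sum(max(len(s) for s in b) * len(b) - sum(len(s) for s in b) for b in batches)


def _chunk(items: list, batch_size: int) -> list:
    """Cut a list into consecutive mini-batches of at most batch_size items."""
    return [items[k:k + batch_size] for k in range(0, len(items), batch_size)]


def shuffled_batches(corpus, batch_size: int, rng: random.Random):
    """Strategy A (illustrative): shuffle the whole corpus, then cut it into mini-batches."""
    order = list(corpus)  # shallow copy so the caller's corpus is not reordered
    rng.shuffle(order)
    return _chunk(order, batch_size)


def length_sorted_batches(corpus, batch_size: int, rng: random.Random):
    """Strategy B (illustrative): sort by length, cut into mini-batches, shuffle batch order."""
    order = sorted(corpus, key=len)  # stable sort on sentence length
    batches = _chunk(order, batch_size)
    rng.shuffle(batches)  # avoid always training from short to long sentences
    return batches


if __name__ == "__main__":
    data_rng = random.Random(0)
    # Synthetic toy corpus (not data from the paper): 1,000 sentences of 1 to 50 tokens.
    corpus = [[1] * data_rng.randint(1, 50) for _ in range(1000)]
    for name, make in [("shuffled", shuffled_batches), ("length-sorted", length_sorted_batches)]:
        batches = make(corpus, 32, random.Random(1))
        print(name, "pad tokens:", count_padding(batches))
```

On a corpus like this, the length-sorted variant typically needs far fewer pad tokens, which is the processing-speed argument for sorting. Padding only measures wasted computation, however; as the abstract notes, less padding does not guarantee better training, and some sorting strategies do not always compare well with simple shuffling.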

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.05765/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1706.05765/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1706.05765/full.md

---
Source: https://tomesphere.com/paper/1706.05765