Data Ordering Patterns for Neural Machine Translation: An Empirical Study

Siddhant Garg

arXiv:1909.10642 · cs.CL · September 25, 2019

TL;DR

This paper empirically investigates various data ordering strategies for neural machine translation training, finding that pre-ordering data by perplexity scores from a pre-trained model yields the best performance.

Contribution

It introduces an empirical analysis of different data ordering methods, highlighting the effectiveness of perplexity-based pre-ordering over random shuffling.

Findings

1. Perplexity-based data ordering outperforms random shuffling.

2. Pre-fixing the data order (fixing it once before training) improves model performance and convergence.

3. The choice of ordering metric affects translation quality.

Abstract

Recent work shows that the ordering of training data affects model performance in Neural Machine Translation. Several approaches involving dynamic data ordering and data sharding based on curriculum learning have been analysed for their performance gains and faster convergence. In this work, we empirically study several approaches to ordering the training data based on different metrics and evaluate their impact on model performance. Our results show that pre-fixing the ordering of the training data based on perplexity scores from a pre-trained model performs best, outperforming the default approach of randomly shuffling the training data every epoch.
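The two schemes the abstract compares can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes each sentence pair has already been scored by a pre-trained model as a list of per-token natural-log probabilities, and it computes perplexity as the exponential of the negative mean log-probability. The function names, the input format, and the `ascending` flag are hypothetical; the abstract does not state how scores are computed or whether low- or high-perplexity examples come first.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a sentence pair is (source, target).
Pair = Tuple[str, str]


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def fixed_perplexity_order(
    pairs: List[Pair],
    log_probs: List[Sequence[float]],
    ascending: bool = True,  # assumption: sort direction is not given in the abstract
) -> List[Pair]:
    """Sort once by perplexity; this same order is reused for every epoch."""
    scored = sorted(
        zip(pairs, log_probs),
        key=lambda item: perplexity(item[1]),
        reverse=not ascending,
    )
    return [pair for pair, _ in scored]


def shuffled_epochs(pairs: List[Pair], num_epochs: int, seed: int = 0) -> List[List[Pair]]:
    """Baseline: draw a fresh random permutation of the data at every epoch."""
    rng = random.Random(seed)
    epochs = []
    for _ in range(num_epochs):
        order = pairs[:]  # copy so the caller's list is left untouched
        rng.shuffle(order)
        epochs.append(order)
    return epochs


if __name__ == "__main__":
    pairs = [("a", "A"), ("b", "B"), ("c", "C")]
    # Hypothetical per-token log-probs standing in for pre-trained model scores.
    log_probs = [[-2.0, -2.0], [-0.5], [-1.0, -1.0, -1.0]]
    print(fixed_perplexity_order(pairs, log_probs))
    # → [('b', 'B'), ('c', 'C'), ('a', 'A')]
```

The key difference is when ordering happens: the perplexity order is computed once and kept fixed across epochs, whereas the baseline re-randomizes the order at the start of each epoch.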
