Optimizing Segmentation Granularity for Neural Machine Translation

Elizabeth Salesky, Andrew Runge, Alex Coda, Jan Niehues, and Graham Neubig

arXiv:1810.08641·cs.CL·October 23, 2018

TL;DR

This paper introduces an automatic method to optimize subword segmentation granularity in neural machine translation, matching grid search performance without extra training time, and improving rare word handling.

Contribution

It proposes an online, incremental approach to tune subword units during training, eliminating the need for resource-intensive hyperparameter searches.
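One way to picture such an online schedule is a small controller that watches held-out validation loss and signals when to move to a coarser segmentation. This is only a minimal sketch: the class name `GranularitySchedule`, the patience-based plateau rule, the `min_delta` threshold, and the list of candidate merge counts are all illustrative assumptions, not the paper's actual criterion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GranularitySchedule:
    """Hypothetical controller: grow the subword vocabulary when validation loss stalls."""

    merge_steps: List[int]      # candidate BPE merge counts, small to large (assumed)
    patience: int = 2           # stalled evaluations tolerated before growing (assumed)
    min_delta: float = 1e-3     # minimum drop in loss that counts as improvement (assumed)
    _level: int = 0
    _best: float = float("inf")
    _stale: int = 0

    @property
    def current_merges(self) -> int:
        """Number of BPE merges in use at the current stage."""
        return self.merge_steps[self._level]

    def update(self, val_loss: float) -> bool:
        """Record a validation loss; return True if the vocabulary should grow now."""
        if val_loss < self._best - self.min_delta:
            self._best = val_loss
            self._stale = 0
            return False
        self._stale += 1
        can_grow = self._level < len(self.merge_steps) - 1
        if self._stale >= self.patience and can_grow:
            self._level += 1
            self._stale = 0
            # Assumption: restart the baseline after the segmentation changes.
            self._best = float("inf")
            return True
        return False
```

In a training loop, `update` would be called after each validation pass; when it returns `True`, the caller would re-segment the data with `current_merges` and add embeddings for the new, larger units. How new units are initialized is not covered by this sketch.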

Findings

01  Matches the translation performance of grid search over segmentation granularity

02  Requires only a single training pass, with no extra training time

03  Improves translation of rare words

Abstract

In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training.…
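To make the granularity hyperparameter concrete, the toy sketch below learns BPE merges from a tiny word-frequency table and segments a word at two merge counts. It is a simplified illustration of standard BPE, not the authors' code; the function names `learn_bpe` and `segment`, the `</w>` end-of-word marker, and the example corpus are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def _apply_merge(symbols: Sequence[str], pair: Pair) -> List[str]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merges; more merges yield larger subword units."""
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged = tuple(_apply_merge(symbols, best))
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word: str, merges: List[Pair]) -> List[str]:
    """Segment a word by applying the learned merges in the order they were learned."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = _apply_merge(symbols, pair)
    return symbols


# Toy corpus (assumed): word -> frequency.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
fine = segment("lowest", learn_bpe(corpus, 0))     # character-level segmentation
coarse = segment("lowest", learn_bpe(corpus, 10))  # fewer, larger units
```

With zero merges the word stays split into characters; raising `num_merges` yields fewer, longer units. Grid search would train one translation model per candidate merge count, which is the cost the abstract describes.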
