Optimizing Segmentation Granularity for Neural Machine Translation
Elizabeth Salesky, Andrew Runge, Alex Coda, Jan Niehues, and Graham Neubig

TL;DR
This paper introduces an automatic method to optimize subword segmentation granularity in neural machine translation, matching grid search performance without extra training time, and improving rare word handling.
Contribution
It proposes an online, incremental approach to tune subword units during training, eliminating the need for resource-intensive hyperparameter searches.
Findings
Matches the translation performance of grid search over segmentation granularity
Tunes granularity in a single training pass, with no extra training time
Improves rare word translation performance
Abstract
In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training.…
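The idea of growing the vocabulary online from the validation loss can be illustrated with a small sketch. This is not the authors' implementation: the class name GranularitySchedule, the plateau threshold, the doubling rule, and the starting and maximum merge counts are all assumptions chosen for illustration. It only shows the control logic of starting with few BPE merges (small, general subwords) and moving to more merges (larger, more specific units) when the held-out loss stops improving.

```python
from dataclasses import dataclass, field


@dataclass
class GranularitySchedule:
    """Hypothetical controller that grows the subword vocabulary when validation loss stalls."""

    merges: int = 1000             # assumed starting number of BPE merges
    max_merges: int = 32000        # assumed upper bound on merges
    growth_factor: int = 2         # assumed: double the merge count per expansion
    min_improvement: float = 0.01  # assumed plateau threshold on validation loss
    best_loss: float = float("inf")
    history: list = field(default_factory=list)

    def update(self, val_loss: float) -> bool:
        """Record one validation loss; return True if the vocabulary was expanded."""
        improved = self.best_loss - val_loss > self.min_improvement
        self.best_loss = min(self.best_loss, val_loss)
        expanded = False
        if not improved and self.merges < self.max_merges:
            # Plateau detected: introduce larger, more specific subword units.
            self.merges = min(self.merges * self.growth_factor, self.max_merges)
            # Reset the baseline so the next check is a grace step for the new vocabulary.
            self.best_loss = float("inf")
            expanded = True
        self.history.append((val_loss, self.merges))
        return expanded


if __name__ == "__main__":
    schedule = GranularitySchedule()
    for loss in [5.0, 4.0, 3.995, 3.5, 3.499]:
        grew = schedule.update(loss)
        print(f"val_loss={loss:.3f} merges={schedule.merges} expanded={grew}")
```

In a real training loop, each expansion would also require re-segmenting the data and adding embeddings for the new units; how that is done is outside what this sketch covers.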
