Learning to Segment Inputs for NMT Favors Character-Level Processing
Julia Kreutzer, Artem Sokolov

TL;DR
This paper introduces a trainable dynamic segmentation algorithm for neural machine translation as an alternative to static, heuristic segmentation. Given the freedom to choose, the model favors (almost) character-level processing.
Contribution
It proposes an end-to-end trainable dynamic segmentation method based on Adaptive Computation Time (Graves 2016), which lets the NMT model learn its segmentation level during training, driven by the translation objective.
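To make the idea of halting-based segmentation concrete, here is a minimal sketch that is not the authors' implementation. It assumes each input character already has a halting probability (in a real model this would come from a learned layer) and closes a segment once the accumulated probability reaches 1 - eps, which mirrors the halting rule of Adaptive Computation Time. The function name `segment_by_halting`, the `eps` default, and the hard (non-differentiable) grouping are illustrative assumptions.

```python
from typing import List


def segment_by_halting(chars: List[str], halt_probs: List[float], eps: float = 0.01) -> List[str]:
    """Group characters into segments using an ACT-style halting rule.

    Hypothetical helper: a segment is closed as soon as the halting
    probabilities accumulated since the last boundary reach 1 - eps.
    """
    if len(chars) != len(halt_probs):
        raise ValueError("chars and halt_probs must have the same length")

    segments: List[str] = []
    current: List[str] = []
    accumulated = 0.0
    for ch, h in zip(chars, halt_probs):
        current.append(ch)
        accumulated += h
        if accumulated >= 1.0 - eps:
            # Halting threshold reached: emit the segment and start a new one.
            segments.append("".join(current))
            current = []
            accumulated = 0.0
    if current:
        # Flush characters left over when the input ends before halting.
        segments.append("".join(current))
    return segments


# High halting probabilities yield one segment per character;
# lower ones merge several characters into a single segment.
char_level = segment_by_halting(list("abcd"), [1.0, 1.0, 1.0, 1.0])
merged = segment_by_halting(list("abcd"), [0.5, 0.5, 0.5, 0.5])
```

In this simplification, a model that always emits halting probabilities close to 1 operates at the character level, which is the behavior the findings report.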
Findings
Models prefer character-level segmentation when given the choice.
Dynamic segmentation improves translation performance across tasks.
This preference supports the development of purely character-level NMT systems.
Abstract
Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity importantly determines the input and output sequence lengths, hence the modeling depth, and source and target vocabularies, which in turn determine model size, computational costs of softmax normalization, and handling of out-of-vocabulary words. However, the current practice is to use static, heuristic-based segmentations that are fixed before NMT training. This raises the question of whether the chosen segmentation is optimal for the translation task. To overcome suboptimal segmentation choices, we present an algorithm for dynamic segmentation based on the Adaptive Computation Time algorithm (Graves 2016) that is trainable end-to-end and driven by the NMT objective. In an evaluation on four translation tasks we found that, given the freedom to navigate between different…
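The trade-off the abstract describes, where granularity sets both sequence length and vocabulary size, can be illustrated with a small sketch. The function `segmentation_stats` and the toy sentences are assumptions made for illustration and do not come from the paper. Whitespace splitting stands in for word-level segmentation.

```python
from typing import List, Tuple


def segmentation_stats(sentences: List[str], level: str) -> Tuple[float, int]:
    """Return (average sequence length, vocabulary size) for a segmentation level.

    Hypothetical helper: 'char' splits into single characters (spaces included),
    'word' splits on whitespace.
    """
    if level == "char":
        tokenized = [list(s) for s in sentences]
    elif level == "word":
        tokenized = [s.split() for s in sentences]
    else:
        raise ValueError("level must be 'char' or 'word'")

    vocab = {tok for sent in tokenized for tok in sent}
    avg_len = sum(len(sent) for sent in tokenized) / len(tokenized)
    return avg_len, len(vocab)


toy_corpus = ["the cat sat", "the dog sat"]
word_len, word_vocab = segmentation_stats(toy_corpus, "word")  # (3.0, 4)
char_len, char_vocab = segmentation_stats(toy_corpus, "char")  # (11.0, 10)
```

On a corpus this small the character vocabulary is larger than the word vocabulary. On realistic corpora the word vocabulary grows with the data while the character vocabulary stays small, at the cost of much longer sequences.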
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
