BPE-Dropout: Simple and Effective Subword Regularization

Ivan Provilkov; Dmitrii Emelianenko; Elena Voita

arXiv:1910.13267·cs.CL·May 5, 2020

TL;DR

This paper introduces BPE-dropout, a simple subword regularization method that stochastically corrupts the BPE segmentation procedure so that the same word can be segmented in multiple ways. It improves machine translation quality by up to 3 BLEU points compared to standard BPE.

Contribution

The paper shows that BPE itself can produce multiple segmentations of the same word, and it proposes BPE-dropout to improve model robustness and translation quality.

Findings

01. Improves translation quality by up to 3 BLEU points.

02. Enables BPE to produce multiple segmentations.

03. Compatible with standard BPE during inference.

Abstract

Subword segmentation is widely used to address the open vocabulary problem in machine translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens. While multiple segmentations are possible even with the same vocabulary, BPE splits words into unique sequences; this may prevent a model from better learning the compositionality of words and from being robust to segmentation errors. So far, the only way to overcome this imperfection of BPE, its deterministic nature, was to create another subword segmentation algorithm (Kudo, 2018). In contrast, we show that BPE itself incorporates the ability to produce multiple segmentations of the same word. We introduce BPE-dropout, a simple and effective subword regularization method based on and compatible with conventional BPE. It…
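
The stochastic segmentation described in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the method: at every merge step, each candidate merge is skipped with probability p, and the highest-priority surviving merge is applied. The function name `bpe_dropout_encode`, the `merge_ranks` table, and the toy merges are hypothetical and chosen only for illustration; this is not the authors' implementation, and it omits details such as end-of-word markers.

```python
import random

def bpe_dropout_encode(word, merge_ranks, p=0.1, rng=random):
    """Segment `word` with BPE, skipping each candidate merge with probability p.

    merge_ranks maps a pair of adjacent tokens to its priority
    (lower rank = learned earlier = applied first). Hypothetical helper.
    """
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Collect merges that are in the table and survive dropout this step.
        candidates = [
            (merge_ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in merge_ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge (or everything was dropped)
        _, i = min(candidates)  # highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

# Toy merge table (an assumption, not taken from the paper).
toy_ranks = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}

deterministic = bpe_dropout_encode("lower", toy_ranks, p=0.0)  # plain BPE
characters = bpe_dropout_encode("lower", toy_ranks, p=1.0)     # all merges dropped
sampled = bpe_dropout_encode("lower", toy_ranks, p=0.5, rng=random.Random(0))
```

With p = 0 the sketch reduces to deterministic BPE, which matches the claim that standard BPE can be used at inference. With p = 1 every merge is dropped and the word falls back to characters. Intermediate values yield different segmentations of the same word across calls.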

Taxonomy

Methods: Byte Pair Encoding