Patching Leaks in the Charformer for Efficient Character-Level Generation

Lukas Edman; Antonio Toral; Gertjan van Noord

arXiv:2205.14086·cs.CL·May 30, 2022·1 cites

TL;DR

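This paper fixes the information leak that arises when Charformer's GBST character grouping is applied to a Transformer decoder, enabling efficient character-level generation. Charformer downsampling trains roughly 30% faster than previous downsampling methods with no apparent difference in translation quality, and results on English-Turkish suggest potential for morphologically rich languages.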

Contribution

The paper introduces a fix that prevents information leakage in Charformer's GBST, allowing characters to be grouped in the Transformer decoder, and reports faster training along with promising translation results for a morphologically rich language.

Findings

01. Charformer downsampling speeds up training by roughly 30%

02. No significant difference in translation quality compared with previous downsampling methods

03. Potential benefits for translation of morphologically rich languages

Abstract

Character-based representations have important advantages over subword-based ones for morphologically rich languages. They come with increased robustness to noisy input and do not need a separate tokenization step. However, they also have a crucial disadvantage: they notably increase the length of text sequences. The GBST method from Charformer groups (i.e., downsamples) characters to solve this, but allows information to leak when applied to a Transformer decoder. We solve this information leak issue, thereby enabling character grouping in the decoder. We show that Charformer downsampling has no apparent benefits in NMT over previous downsampling methods in terms of translation quality; however, it can be trained roughly 30% faster. Promising performance on English-Turkish translation indicates the potential of character-level models for morphologically rich languages.
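The leak mentioned in the abstract can be illustrated with a toy sketch. It uses plain mean pooling over fixed-size blocks as a stand-in for GBST, and the left-padding fix shown is a generic causal-shift trick assumed here for illustration; it is not claimed to be the paper's actual solution. All function names are hypothetical.

```python
def downsample(values, k):
    """Mean-pool values into non-overlapping blocks of k (a short tail counts as zero-padded)."""
    return [sum(values[i:i + k]) / k for i in range(0, len(values), k)]


def naive_grouping(seq, k=4):
    """Each position reads the block it belongs to, which includes later characters."""
    blocks = downsample(seq, k)
    return [blocks[t // k] for t in range(len(seq))]


def shifted_grouping(seq, k=4):
    """Left-pad with k - 1 zeros so the block read at position t ends at or before t."""
    blocks = downsample([0.0] * (k - 1) + list(seq), k)
    return [blocks[t // k] for t in range(len(seq))]


def leaks_future(encode, seq, t):
    """Return True if changing any character after position t alters the output at t."""
    base = encode(seq)[t]
    for j in range(t + 1, len(seq)):
        changed = list(seq)
        changed[j] += 100.0  # perturb one future character
        if encode(changed)[t] != base:
            return True
    return False


# Toy 8-character target sequence, encoded as code points (an assumption for the demo).
seq = [float(ord(c)) for c in "evlerden"]
print(leaks_future(naive_grouping, seq, 0))
print(any(leaks_future(shifted_grouping, seq, t) for t in range(len(seq))))
```

With naive grouping, the representation at position 0 already depends on characters 1 to 3, so a decoder trained this way could read characters it is supposed to predict. With the shifted variant, no position depends on a later character, at the cost of each position seeing a slightly older summary.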

Code & Models

Repositories

leukas/patchinggbst (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Dense Connections · Absolute Position Encodings · Dropout · GBST · Byte Pair Encoding · Position-Wise Feed-Forward Layer