Revisiting Character-Based Neural Machine Translation with Capacity and Compression

Colin Cherry; George Foster; Ankur Bapna; Orhan Firat; Wolfgang Macherey

arXiv:1808.09943·cs.CL·August 30, 2018

TL;DR

This paper shows that standard deep sequence-to-sequence models handle character-level neural machine translation well, outperforming identical models that operate over word fragments, and presents the first evaluation of conditional computation for NMT.

Contribution

It shows that deep models are key to character-level NMT and evaluates techniques for efficiency, including the novel use of conditional computation to skip timesteps.

Findings

01

Deep models outperform shallow ones at character level.

02

Character-level models can match or exceed word-fragment models.

03

Compression techniques, including conditional computation, reduce computation time but do not match the accuracy of the deep character-level baseline.

Abstract

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that deep models operating at the character level outperform identical models operating over word fragments. This result implies that alternative architectures for handling character input are better viewed as methods for reducing computation time than as improved ways of modeling longer sequences. From this perspective, we evaluate several techniques for character-level NMT, verify…
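The sequence-length cost mentioned in the abstract can be illustrated with a small sketch. This is not the paper's preprocessing: the fragment vocabulary `TOY_VOCAB` and both helper functions are hypothetical, and the greedy longest-match segmenter only stands in for a real word-fragment scheme.

```python
def char_tokens(sentence):
    """Split a sentence into single characters, keeping spaces as symbols."""
    return list(sentence)


def fragment_tokens(sentence, vocab):
    """Greedy longest-match segmentation of each word into fragments.

    Falls back to a single character when no fragment in `vocab` matches.
    """
    tokens = []
    for word in sentence.split():
        i = 0
        while i < len(word):
            # Try the longest candidate first, shrinking toward one character.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab or j == i + 1:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens


# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"trans", "lat", "ing", "char", "acter", "s"}

sentence = "translating characters"
chars = char_tokens(sentence)
frags = fragment_tokens(sentence, TOY_VOCAB)

print(len(chars), len(frags))  # 22 6
```

In this toy case the character sequence is several times longer than the fragment sequence, and each symbol carries less information, which is the source of the modeling and computational challenges the abstract describes.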
