Character-level Chinese-English Translation through ASCII Encoding

Nikola I. Nikolov; Yuhuang Hu; Mi Xue Tan; Richard H.R. Hahnloser

arXiv:1805.03330·cs.CL·August 28, 2018

TL;DR

The paper enables character-level Chinese-English neural machine translation by encoding Chinese characters in ASCII with the Wubi scheme, which narrows the gap between the two writing systems.

Contribution

The paper uses the Wubi encoding scheme to break Chinese characters into smaller units comparable to those of Indo-European languages, so that character-level NMT models can be applied to Chinese-English translation.

Findings

1. Wubi encoding preserves the shape and semantic information of Chinese characters and is reversible.

2. Wubi-based models can be trained at both the character and the subword level.

3. Both recurrent and convolutional Wubi-based models show promising results.

Abstract

Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They mainly do well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge because of a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese, by breaking down Chinese characters into linguistic units similar to that of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters, while also being reversible. We show promising results from training Wubi-based models on the character- and subword-level with recurrent as well as convolutional models.
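
The encoding step described in the abstract can be sketched as a table lookup. This is a minimal illustration, not the authors' pipeline: the five-entry table, the function names `encode` and `decode`, and the space separator are all assumptions made for the example. The codes are believed to follow the common Wubi 86 layout, but they should be checked against a full Wubi dictionary before any real use. A full dictionary also has to handle codes shared by more than one character, which this toy table does not.

```python
# Toy Wubi lookup table (assumption: illustrative entries only, not the
# dictionary used in the paper; verify codes against a real Wubi table).
TOY_WUBI = {
    "我": "trnt",
    "你": "wqiy",
    "好": "vbg",
    "中": "khk",
    "国": "lgyi",
}

# Reverse table; valid only because the toy codes above are all distinct.
TOY_REVERSE = {code: char for char, code in TOY_WUBI.items()}


def encode(text, table=TOY_WUBI, sep=" "):
    """Replace each known Chinese character with its ASCII code.

    Characters missing from the table are passed through unchanged.
    Codes are joined with `sep` so that decoding can split them again.
    """
    return sep.join(table.get(char, char) for char in text)


def decode(encoded, reverse=TOY_REVERSE, sep=" "):
    """Invert `encode` by mapping each code back to its character."""
    return "".join(reverse.get(token, token) for token in encoded.split(sep))


if __name__ == "__main__":
    source = "你好中国"
    ascii_form = encode(source)
    print(ascii_form)
    print(decode(ascii_form))
```

Once the text is in this form, the Chinese side of a parallel corpus is an ordinary sequence of Latin letters, so the same character-level or subword-level tokenization can be applied to both languages.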

Code & Models

Repositories

duguyue100/wmt-en2wubi (PyTorch, official)
