Character-level NMT and language similarity

Josef Jon; Ondřej Bojar

arXiv:2308.04398·cs.CL·August 9, 2023

Open Access

TL;DR

This paper investigates character-level neural machine translation with Transformer models across language pairs of varying similarity and training datasets of varying size, identifying when character-level input is advantageous.

Contribution

It identifies the conditions under which character-level segmentation improves translation quality and confirms earlier findings that fine-tuning already trained subword-level models to character level can close the gap to subword-level segmentation for less related languages.

Findings

01

Translation between similar languages benefits from character-level input segmentation.

02

For less related languages, the character-level vanilla Transformer-base often lags behind subword-level segmentation.

03

Fine-tuning already trained subword-level models to character level can close the gap between character-level and subword-level models.

Abstract

We explore the effectiveness of character-level neural machine translation using Transformer architecture for various levels of language similarity and size of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that it is possible to close the gap by finetuning the already trained subword-level models to character-level.
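
To make "character-level input segmentation" concrete, here is a minimal Python sketch of one common way to do it: every character becomes its own token, and spaces are replaced by a visible boundary symbol so the original sentence can be restored. The function names, the `▁` boundary symbol, and the NFC normalization step are assumptions chosen for illustration; the abstract does not specify the authors' preprocessing.

```python
import unicodedata


def char_segment(sentence: str, space_symbol: str = "▁") -> list[str]:
    """Split a sentence into single-character tokens.

    Spaces are mapped to `space_symbol` (a hypothetical choice) so that
    word boundaries survive as explicit tokens.
    """
    # NFC keeps accented letters such as "ý" as one code point, so each
    # letter yields exactly one token.
    text = unicodedata.normalize("NFC", sentence.strip())
    return [space_symbol if ch == " " else ch for ch in text]


def char_detokenize(tokens: list[str], space_symbol: str = "▁") -> str:
    """Invert char_segment by joining tokens and restoring spaces."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens)


if __name__ == "__main__":
    tokens = char_segment("dobrý den")
    print(tokens)       # one token per character, "▁" marks the space
    print(len(tokens))  # sequence length seen by the model
    print(char_detokenize(tokens))
```

The trade-off this illustrates is sequence length: a character-level model sees one position per character, whereas a subword-level model (for example, one using Byte Pair Encoding) would typically see far fewer tokens for the same sentence.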

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding