Character-level NMT and language similarity
Josef Jon, Ondřej Bojar

TL;DR
This paper investigates how character-level neural machine translation with Transformer models performs across language pairs of varying similarity and training-data size, highlighting when character-level input is advantageous.
Contribution
It identifies the conditions under which character-level segmentation improves translation quality and confirms that fine-tuning already trained subword-level models to character level closes the gap for less related languages.
Findings
Character-level input segmentation benefits translation between similar languages.
For less related languages, a character-level vanilla Transformer-base often lags behind subword-level segmentation.
Fine-tuning already trained subword-level models to character level can close this gap.
Abstract
We explore the effectiveness of character-level neural machine translation using the Transformer architecture for various levels of language similarity and sizes of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that it is possible to close the gap by fine-tuning the already trained subword-level models to character-level.
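The toy sketch below contrasts the two input segmentations compared in the abstract. It is an illustration only, not the authors' preprocessing pipeline. It assumes a simplified byte pair encoding (BPE) learner in the style of Sennrich et al., with an end-of-word marker `</w>`. For character-level input it assumes a visible space marker `▁`. The function names `char_segment`, `learn_bpe`, `merge_pair` and `bpe_segment` and the tiny Czech corpus are hypothetical. The example shows that character-level segmentation yields much longer token sequences than subword segmentation for the same sentence.

```python
from collections import Counter


def char_segment(sentence: str) -> list[str]:
    """Character-level segmentation: every character is a token.
    Assumption: spaces are replaced by a visible marker '▁'."""
    return ["▁" if ch == " " else ch for ch in sentence]


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges (simplified, hypothetical helper): repeatedly merge
    the most frequent adjacent symbol pair, weighted by word frequency."""
    vocab: Counter = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1
    merges = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def bpe_segment(sentence: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges, in order, to each word of a new sentence."""
    tokens: list[str] = []
    for word in sentence.split():
        symbols = tuple(word) + ("</w>",)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


corpus = ["dobrý den", "dobrý večer", "dobrá noc"]  # hypothetical toy corpus
merges = learn_bpe(corpus, num_merges=10)
sentence = "dobrý den"
print(char_segment(sentence))           # 9 character tokens
print(bpe_segment(sentence, merges))    # 2 subword tokens
```

With a realistic merge budget, frequent words become single subword units, while the character-level model must process every character. This longer input is one reason character-level Transformers are more expensive to train. It may also be one reason, among others, that they can lag behind subword models.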
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
