CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task

Josef Jon; Michal Novák; João Paulo Aires; Dušan Variš; Ondřej Bojar

arXiv:2109.09354·cs.CL·September 21, 2021

Open Access

TL;DR

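This paper presents Charles University's multilingual low-resource translation systems for the WMT21 Indo-European languages shared task, covering translation from Catalan into Romanian, Italian and Occitan. It reports that a shared multilingual model improves translation quality in each language pair, that character-level bilingual models are competitive for the closely related Catalan-Occitan pair, and it describes multi-task experiments with grapheme-to-phoneme conversion.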

Contribution

Introduces a shared multilingual model for low-resource translation, evaluates character-level models, and explores multi-task learning with grapheme-to-phoneme conversion.

Findings

01

Joint models improve translation quality across language pairs.

02

Character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs.

03

Multi-task learning, in which models are also trained to perform grapheme-to-phoneme conversion, is explored experimentally.

Abstract

This paper describes the Charles University submission for the Multilingual Low-Resource Translation for Indo-European Languages shared task at WMT21. We competed in translation from Catalan into Romanian, Italian and Occitan. Our systems are based on a shared multilingual model. We show that using a joint model for multiple similar language pairs improves translation quality in each pair. We also demonstrate that character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs. Finally, we describe our experiments with multi-task learning, where, aside from textual translation, the models are also trained to perform grapheme-to-phoneme conversion.
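
As a rough illustration of the ideas named in the abstract, the sketch below shows one common way to feed a single shared model with several target languages and an auxiliary task: prefixing each source sentence with tags. The tag strings, the function names, and the space marker are all hypothetical; the abstract does not specify how the authors encoded languages or tasks, so this should be read as a sketch under those assumptions rather than as the submitted system.

```python
# Hypothetical tags: the abstract does not describe the actual tagging scheme.
LANG_TAGS = {"ro": "<2ro>", "it": "<2it>", "oc": "<2oc>"}
TASK_TAGS = {"translate": "<mt>", "g2p": "<g2p>"}


def tag_source(sentence, target_lang, task="translate"):
    """Prefix a Catalan source sentence with a task tag and a target-language tag."""
    return f"{TASK_TAGS[task]} {LANG_TAGS[target_lang]} {sentence}"


def char_tokenize(sentence, space_symbol="▁"):
    """Split a sentence into characters, replacing spaces with a visible marker."""
    return [space_symbol if ch == " " else ch for ch in sentence]


def build_examples(sentence, target_langs, task="translate"):
    """Create one tagged source per target language for joint training."""
    return [tag_source(sentence, lang, task) for lang in target_langs]


if __name__ == "__main__":
    for line in build_examples("Bon dia", ["ro", "it", "oc"]):
        print(line)
    print(char_tokenize("Bon dia"))
```

With tagged inputs like these, one model can be trained on all Catalan-to-X pairs at once, and the same mechanism can mark grapheme-to-phoneme examples as a second task. The character-level tokenizer is shown separately because the abstract describes those models as bilingual rather than shared.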

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis