CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task
Josef Jon, Michal Novák, João Paulo Aires, Dušan Variš, and Ondřej Bojar

TL;DR
This paper presents Charles University's systems for the WMT21 shared task on multilingual low-resource translation for Indo-European languages. The systems translate from Catalan into Romanian, Italian, and Occitan using a shared multilingual model, and the paper also reports experiments with character-level models and multi-task learning.
Contribution
Introduces a shared multilingual model for low-resource translation, evaluates character-level models, and explores multi-task learning with grapheme-to-phoneme conversion.
Findings
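A joint model for multiple similar language pairs improves translation quality in each pair.
Character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs.
Multi-task learning, in which models are also trained on grapheme-to-phoneme conversion alongside textual translation, is explored experimentally.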
Abstract
This paper describes the Charles University submission to the Multilingual Low-Resource Translation for Indo-European Languages shared task at WMT21. We competed in translation from Catalan into Romanian, Italian, and Occitan. Our systems are based on a shared multilingual model. We show that using a joint model for multiple similar language pairs improves translation quality in each pair. We also demonstrate that character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs. Finally, we describe our experiments with multi-task learning, where, aside from textual translation, the models are also trained to perform grapheme-to-phoneme conversion.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
