Tradutor: Building a Variety Specific Translation Model
Hugo Sousa, Satya Almasian, Ricardo Campos, Alípio Jorge

TL;DR
This paper introduces a new open-source translation model and dataset specifically for European Portuguese, addressing the gap in language variety representation and improving translation quality over existing open-source systems.
Contribution
The paper presents the first open-source translation model built specifically for European Portuguese, together with a dedicated dataset, improving translation performance for this underrepresented language variety.
Findings
The best model outperforms existing open-source translation systems for Portuguese in automatic evaluations on two benchmark datasets.
The model approaches industry-leading performance for European Portuguese.
The dataset and models are publicly available to support further research.
Abstract
Language models have become foundational to many widely used systems. However, these seemingly advantageous models are double-edged swords. While they excel in tasks related to resource-rich languages like English, they often lose the fine nuances of language forms, dialects, and varieties that are inherent to languages spoken in multiple regions of the world. Languages like European Portuguese are neglected in favor of their more popular counterpart, Brazilian Portuguese, leading to suboptimal performance in various linguistic tasks. To address this gap, we introduce the first open-source translation model specifically tailored for European Portuguese, along with a novel dataset specifically designed for this task. Results from automatic evaluations on two benchmark datasets demonstrate that our best model surpasses existing open-source translation systems for Portuguese and approaches…
Taxonomy
Topics: Translation Studies and Practices · Natural Language Processing Techniques
