Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini

TL;DR
This paper presents cost-effective training strategies for Portuguese-English and English-Portuguese translation using pre-trained models such as T5, achieving competitive results on modest hardware.
Contribution
It introduces a lightweight adaptation of T5 models for translation, including tokenizer modifications for Portuguese, and demonstrates competitive performance on benchmark datasets.
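To make the tokenizer modification concrete, here is a minimal sketch of the general idea: check which Portuguese accented characters an English vocabulary cannot represent and append the missing ones. This is an illustration under stated assumptions, not the authors' procedure. The character list, the toy vocabulary, and the helper names (`missing_characters`, `extend_vocab`, `strip_accents`) are all hypothetical.

```python
import unicodedata

# Illustrative (assumed) set of lowercase accented characters used in Portuguese.
PORTUGUESE_CHARS = "áàâãéêíóôõúüç"


def missing_characters(vocab, chars=PORTUGUESE_CHARS):
    """Return the characters (lower and upper case) that appear in no vocabulary entry."""
    covered = set("".join(vocab))
    wanted = list(chars) + list(chars.upper())
    return [c for c in wanted if c not in covered]


def extend_vocab(vocab, chars=PORTUGUESE_CHARS):
    """Return a new vocabulary list with the missing characters appended at the end."""
    return list(vocab) + missing_characters(vocab, chars)


def strip_accents(text):
    """Show what is lost when accents cannot be represented: drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical toy vocabulary standing in for an English subword vocabulary.
toy_vocab = ["▁the", "▁trans", "lation", "é", "a", "o", "c"]
extended = extend_vocab(toy_vocab)

print(missing_characters(toy_vocab)[:5])
print(strip_accents("tradução"))
```

With a real pre-trained model, new vocabulary entries also need rows in the embedding matrix. In the Hugging Face Transformers library, for example, this is usually done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether the paper follows that route is not stated in this summary.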
Findings
Models perform comparably to state-of-the-art systems.
Training on a single 8GB GPU for nine days is sufficient to reach this performance.
Code and models are publicly available.
Abstract
Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models. In this work, we investigate the use of pre-trained models, such as T5, for Portuguese-English and English-Portuguese translation tasks using low-cost hardware. We explore the use of Portuguese and English pre-trained language models and propose an adaptation of the English tokenizer to represent Portuguese characters, such as diaeresis, acute and grave accents. We compare our models to the Google Translate API and MarianMT on a subset of the ParaCrawl dataset, as well as to the winning submission to the WMT19 Biomedical Translation Shared Task. We also describe our submission to the WMT20 Biomedical Translation Shared Task. Our results show that our models have a competitive performance to state-of-the-art models while being trained on modest…
Taxonomy
Methods: Linear Layer · Gated Linear Unit · Attention Dropout · Layer Normalization · Byte Pair Encoding · Dropout · SentencePiece · Residual Connection · Multi-Head Attention
