Lite Training Strategies for Portuguese-English and English-Portuguese Translation

Alexandre Lopes; Rodrigo Nogueira; Roberto Lotufo; Helio Pedrini

arXiv:2008.08769·cs.CL·August 21, 2020

TL;DR

This paper presents cost-effective training strategies for Portuguese-English translation using pre-trained models like T5, achieving competitive results with modest hardware and minimal resources.

Contribution

It introduces a lightweight adaptation of T5 models for translation, including tokenizer modifications for Portuguese, and demonstrates competitive performance on benchmark datasets.
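The tokenizer modification mentioned above can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the toy vocabulary, the character list `PT_ACCENTED`, and the helper names `missing_portuguese_chars` and `extend_vocab` are all hypothetical, and the real T5 vocabulary is a SentencePiece model rather than a plain dictionary. The idea shown is only this: find the accented Portuguese characters an English vocabulary cannot represent, then append them as new tokens with fresh ids.

```python
# Hypothetical list of accented Portuguese letters (an assumption for this
# sketch; the paper's exact character set is not given on this page).
_LOWER = "áàâãéêíóôõúüç"
PT_ACCENTED = _LOWER + _LOWER.upper()


def missing_portuguese_chars(vocab):
    """Return the accented characters that appear in no token of `vocab`."""
    covered = {ch for token in vocab for ch in token}
    return [ch for ch in PT_ACCENTED if ch not in covered]


def extend_vocab(vocab, chars):
    """Return a copy of `vocab` with each new character appended under a fresh id."""
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for ch in chars:
        if ch not in extended:
            extended[ch] = next_id
            next_id += 1
    return extended


if __name__ == "__main__":
    # Toy English-centric vocabulary (hypothetical); only "é" is already covered.
    toy_vocab = {"▁the": 0, "▁trans": 1, "lation": 2, "é": 3, "a": 4, "o": 5}
    missing = missing_portuguese_chars(toy_vocab)
    adapted = extend_vocab(toy_vocab, missing)
    print("characters added:", "".join(missing))
    print("vocabulary size:", len(toy_vocab), "->", len(adapted))
```

With the Hugging Face `transformers` library, a comparable step would typically use `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix gains rows for the new ids. Whether the authors took that route is not stated on this page.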

Findings

01. Models perform comparably to state-of-the-art systems.
02. Training on a single 8 GB GPU for nine days is effective.
03. Code and models are publicly available.

Abstract

Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models. In this work, we investigate the use of pre-trained models, such as T5, for Portuguese-English and English-Portuguese translation tasks using low-cost hardware. We explore the use of Portuguese and English pre-trained language models and propose an adaptation of the English tokenizer to represent Portuguese characters, such as those with diaeresis, acute, and grave accents. We compare our models to the Google Translate API and MarianMT on a subset of the ParaCrawl dataset, as well as to the winning submission to the WMT19 Biomedical Translation Shared Task. We also describe our submission to the WMT20 Biomedical Translation Shared Task. Our results show that our models perform competitively with state-of-the-art models while being trained on modest…

Code & Models

Repositories

unicamp-dl/Lite-T5-Translation
PyTorch · Official

Datasets

efbaro/hospitalizations_model_test
dataset · 54 dl

Taxonomy

Methods: Linear Layer · Gated Linear Unit · Attention Dropout · Layer Normalization · Byte Pair Encoding · Dropout · SentencePiece · Residual Connection · Multi-Head Attention