mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

TL;DR
This paper introduces mT6, an enhanced multilingual text-to-text transformer that leverages translation pairs and novel training tasks to improve cross-lingual transfer across various NLP benchmarks.
Contribution
The paper proposes mT6, a multilingual pretraining method incorporating translation pairs and new objectives, advancing cross-lingual transfer capabilities beyond mT5.
Findings
mT6 outperforms mT5 on multiple benchmarks
Improved cross-lingual transferability on sentence classification, named entity recognition, question answering, and abstractive summarization
Effective use of translation pairs in pretraining
Abstract
Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks. In this paper, we improve the multilingual text-to-text transfer Transformer with translation pairs (mT6). Specifically, we explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption. In addition, we propose a partially non-autoregressive objective for text-to-text pre-training. We evaluate the methods on eight multilingual benchmark datasets, including sentence classification, named entity recognition, question answering, and abstractive summarization. Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
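The following minimal Python sketch shows how a single translation pair could be turned into input/target examples for each of the three cross-lingual pre-training tasks named above. It is an illustration under stated assumptions, not the authors' implementation. Tokens are split on whitespace. Masked spans are picked by hand rather than sampled. Sentinel tokens follow the T5 `<extra_id_N>` naming. The two sentences are concatenated without a separator token. TSC is assumed to mask spans in only one sentence of the pair while the other stays intact as context. All function names (`span_corrupt`, `translation_pair_span_corruption`, and so on) are hypothetical. The sketch does not cover the partially non-autoregressive objective.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def sentinel(i: int) -> str:
    # T5-style sentinel naming; assumed (not confirmed) to match mT6's vocabulary.
    return f"<extra_id_{i}>"


def span_corrupt(tokens: Sequence[str], spans: Sequence[Span]) -> Tuple[List[str], List[str]]:
    """Replace each span with a sentinel in the input; emit sentinel + span in the target.

    Simplification: the closing sentinel / EOS that T5 appends to the target is omitted.
    """
    inp: List[str] = []
    out: List[str] = []
    prev = 0
    for sid, (start, end) in enumerate(sorted(spans)):
        if start < prev or end <= start or end > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        inp.extend(tokens[prev:start])  # keep unmasked tokens before the span
        inp.append(sentinel(sid))       # the span is replaced by one sentinel
        out.append(sentinel(sid))       # target: sentinel followed by the masked tokens
        out.extend(tokens[start:end])
        prev = end
    inp.extend(tokens[prev:])
    return inp, out


def machine_translation(src: Sequence[str], tgt: Sequence[str]) -> Tuple[List[str], List[str]]:
    # MT: the source sentence is the input, its translation is the target.
    return list(src), list(tgt)


def translation_pair_span_corruption(
    src: Sequence[str], tgt: Sequence[str], spans: Sequence[Span]
) -> Tuple[List[str], List[str]]:
    # TPSC: spans may fall anywhere in the concatenated pair (indices are over src + tgt).
    return span_corrupt(list(src) + list(tgt), spans)


def translation_span_corruption(
    src: Sequence[str], tgt: Sequence[str], tgt_spans: Sequence[Span]
) -> Tuple[List[str], List[str]]:
    # TSC (as assumed here): mask spans in one sentence only; the other is left whole.
    masked_tgt, out = span_corrupt(tgt, tgt_spans)
    return list(src) + masked_tgt, out


if __name__ == "__main__":
    src = "the cat sat on the mat".split()
    tgt = "le chat est assis sur le tapis".split()
    for name, (x, y) in [
        ("MT", machine_translation(src, tgt)),
        ("TPSC", translation_pair_span_corruption(src, tgt, [(1, 2), (8, 10)])),
        ("TSC", translation_span_corruption(src, tgt, [(1, 2)])),
    ]:
        print(f"{name:5} input : {' '.join(x)}")
        print(f"{name:5} target: {' '.join(y)}")
```

The two span-based tasks differ in where masking can happen. In the TPSC example above, the hand-picked spans mask `cat` on the English side and `est assis` on the French side. In the TSC example, only `chat` is masked and the English sentence stays fully visible as context.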
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · mT5 · Gated Linear Unit · Softmax · Adafactor · Inverse Square Root Schedule · Attention Dropout
