mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences

David Uthus; Santiago Ontañón; Joshua Ainslie; Mandy Guo

arXiv:2305.11129 · cs.CL · October 30, 2023 · 1 citation

Open Access · 1 Repo · 3 Models

TL;DR

mLongT5 is a multilingual, efficient text-to-text transformer for long input sequences. It builds on the LongT5 architecture and multilingual pretraining datasets, and it outperforms existing multilingual models such as mBART and M-BERT on multilingual summarization and question-answering tasks.

Contribution

The paper introduces mLongT5, a multilingual text-to-text transformer for long sequences that combines the LongT5 architecture with the multilingual datasets used to pretrain mT5 and the pretraining tasks of UL2.

Findings

01 mLongT5 outperforms mBART and M-BERT on multilingual tasks

02 The model effectively handles longer input sequences

03 Improved performance in summarization and question-answering

Abstract

We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.

Code & Models

Repositories

google-research/longt5 · tf · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Softmax · Layer Normalization · Inverse Square Root Schedule · Byte Pair Encoding · Dropout · Linear Layer · SentencePiece · Attention Dropout · Dense Connections