mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
David Uthus, Santiago Ontañón, Joshua Ainslie, Mandy Guo

TL;DR
mLongT5 is a multilingual, efficient text-to-text transformer designed for long input sequences. It outperforms existing multilingual models such as mBART and M-BERT on multilingual summarization and question-answering tasks by combining the LongT5 architecture with the multilingual pretraining datasets of mT5 and the pretraining tasks of UL2.
Contribution
The paper introduces mLongT5, a multilingual text-to-text transformer for long sequences that combines the LongT5 architecture with the multilingual datasets used to pretrain mT5 and the pretraining tasks of UL2.
Findings
mLongT5 outperforms mBART and M-BERT on multilingual tasks
The model effectively handles longer input sequences
Performance improves over existing multilingual models on multilingual summarization and question-answering tasks
Abstract
We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Softmax · Layer Normalization · Inverse Square Root Schedule · Byte Pair Encoding · Dropout · Linear Layer · SentencePiece · Attention Dropout · Dense Connections
