A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar

TL;DR
This survey reviews the development, evaluation, and analysis of multilingual language models, highlighting their capabilities, limitations, and future research directions in cross-lingual NLP tasks.
Contribution
It provides a comprehensive overview of existing research on MLLMs, including model scaling, benchmarking, performance analysis, and understanding universal language patterns.
Findings
Large MLLMs enable zero-shot transfer across many languages.
Benchmarking reveals strengths and weaknesses of MLLMs on various tasks.
Analysis suggests potential for improving performance on unseen languages.
Abstract
Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages; (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs; (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks; (iv) understanding the universal language patterns (if any) learnt by MLLMs; and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · XLM-R · Linear Layer · mBERT · Attention Dropout · Softmax · Dense Connections · Adam · Layer Normalization
