XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond
Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados

TL;DR
This paper introduces XLM-T, an XLM-R model pre-trained on millions of tweets in over thirty languages, together with unified Twitter sentiment analysis datasets in eight languages.
Contribution
The paper contributes (1) XLM-T, a multilingual Twitter-specific language model built on XLM-R that serves as a strong baseline, with starter code for fine-tuning on a target task, and (2) unified Twitter sentiment analysis datasets in eight languages, plus an XLM-T model fine-tuned on them.
Findings
XLM-T outperforms existing models on multilingual Twitter sentiment analysis.
The model provides a versatile foundation for various multilingual social media NLP tasks.
The datasets enable standardized evaluation across multiple languages.
Abstract
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a model to train and evaluate multilingual language models in Twitter. We provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages and an XLM-T model fine-tuned on them.
Code & Models
- 🤗 cardiffnlp/twitter-xlm-roberta-base-sentiment · model · 895k dl · ♡ 255
- 🤗 cardiffnlp/twitter-xlm-roberta-base · model · 6.7k dl · ♡ 18
- 🤗 MilaNLProc/hate-ita · model · 208 dl · ♡ 4
- 🤗 cardiffnlp/xlm-twitter-politics-sentiment · model · 138 dl · ♡ 10
- 🤗 Andrazp/multilingual-hate-speech-robacofi · model · 145 dl · ♡ 1
- 🤗 morit/french_xlm_xnli · model · 15 dl · ♡ 2
- 🤗 morit/spanish_xlm_xnli · model · 4 dl · ♡ 1
- 🤗 morit/english_xlm_xnli · model · 4 dl
- 🤗 morit/chinese_xlm_xnli · model · 8 dl · ♡ 23
- 🤗 morit/german_xlm_xnli · model · 8 dl · ♡ 3
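
As a rough illustration of how the fine-tuned sentiment model listed above might be used, the following minimal sketch loads cardiffnlp/twitter-xlm-roberta-base-sentiment through the Hugging Face transformers `pipeline` API. The helper names `preprocess` and `classify` are hypothetical, and the masking of user mentions and links is an assumed preprocessing convention rather than something stated on this page; it requires the `transformers` package and a network connection on first use.

```python
def preprocess(text: str) -> str:
    """Mask user mentions and links.

    Assumption: tweets are normalised so that '@name' becomes '@user'
    and any token starting with 'http' becomes 'http'.
    """
    tokens = []
    for tok in text.split(" "):
        if tok.startswith("@") and len(tok) > 1:
            tok = "@user"
        elif tok.startswith("http"):
            tok = "http"
        tokens.append(tok)
    return " ".join(tokens)


def classify(texts, model_id="cardiffnlp/twitter-xlm-roberta-base-sentiment"):
    """Return one prediction dict (label and score) per input text."""
    # Imported lazily so preprocess() works without transformers installed.
    from transformers import pipeline

    # Downloads the model and tokenizer from the Hugging Face Hub on first call.
    clf = pipeline("sentiment-analysis", model=model_id, tokenizer=model_id)
    return clf([preprocess(t) for t in texts])
```

Calling `classify(["@bob I love this! https://t.co/xyz", "No me gusta nada"])` would return one dictionary per tweet with a predicted label and a confidence score; the exact label names depend on the model's configuration.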
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Natural Language Processing Techniques
Methods: XLM-R
