TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification
Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, and Luis Espinosa-Anke

TL;DR
This paper introduces TweetEval, a comprehensive benchmark for Twitter-specific NLP tasks, providing standardized evaluation protocols and strong baselines to advance research in social media text classification.
Contribution
The paper presents a unified evaluation framework of seven heterogeneous Twitter-specific classification tasks and compares language-model pre-training strategies, establishing a foundation for consistent benchmarking.
Findings
Models pre-trained on Twitter data outperform generic pre-trained models.
Continued pre-training on Twitter improves task performance.
TweetEval enables standardized comparison across social media NLP methods.
Abstract
The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.
Code & Models
- 🤗 cardiffnlp/twitter-roberta-base-emoji · model · 400 dl · ♡ 16
- 🤗 cardiffnlp/twitter-roberta-base-emotion · model · 26k dl · ♡ 49
- 🤗 cardiffnlp/twitter-roberta-base-hate · model · 2.1k dl · ♡ 15
- 🤗 cardiffnlp/twitter-roberta-base-irony · model · 16k dl · ♡ 29
- 🤗 cardiffnlp/twitter-roberta-base-offensive · model · 18k dl · ♡ 32
- 🤗 cardiffnlp/twitter-roberta-base-sentiment · model · 598k dl · ♡ 333
- 🤗 cardiffnlp/twitter-roberta-base · model · 6.3k dl · ♡ 18
- 🤗 cardiffnlp/twitter-scratch-roberta-base · model · 4 dl
- 🤗 researchworkai/Sentiment-roBERTa-Twitter · model · 2 dl · ♡ 1
- 🤗 Kapiche/twitter-roberta-base-sentiment · model · 24 dl
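As a rough illustration of how the task-specific checkpoints listed above might be used, the sketch below maps a task name to its cardiffnlp model identifier and runs it through the Hugging Face `transformers` text-classification pipeline. The helper names `TASKS`, `model_id_for`, and `classify` are hypothetical and introduced only for this example. It assumes `transformers` and a backend such as PyTorch are installed and that the checkpoint can be downloaded from the Hub.

```python
# Task suffixes taken from the cardiffnlp model names in the list above.
TASKS = ("emoji", "emotion", "hate", "irony", "offensive", "sentiment")


def model_id_for(task: str) -> str:
    """Return the Hub identifier of the Twitter RoBERTa model for a task.

    Hypothetical helper: it only rebuilds the naming pattern visible in
    the model list and does not check that the model is reachable.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return f"cardiffnlp/twitter-roberta-base-{task}"


def classify(text: str, task: str = "sentiment"):
    """Classify one tweet with the checkpoint for the given task.

    The import is done lazily so that model_id_for stays usable without
    transformers installed. Calling this function downloads the model.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id_for(task))
    # The pipeline returns a list of dicts with "label" and "score" keys.
    return classifier(text)
```

Calling `classify("Good night 😊", task="sentiment")` would return the predicted label and its score. Depending on the checkpoint configuration, labels may appear as generic names such as `LABEL_0`, in which case they need to be mapped to class names using the model card.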
