TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis
Mohiuddin Md Abdul Qudar, Vijay Mago

TL;DR
This paper introduces two TweetBERT models, domain-specific language representation models pre-trained on millions of tweets. They outperform traditional BERT models by more than 7% on Twitter text analysis tasks, which underlines the value of domain adaptation.
Contribution
The paper presents two TweetBERT models pre-trained on millions of tweets. The authors describe them as the first language representation models specific to the social media domain and report that they outperform general BERT models on Twitter text analysis.
Findings
TweetBERT models outperform traditional BERT by over 7% on Twitter datasets.
Training on Twitter data improves performance across multiple datasets.
Extensive evaluation on 31 datasets validates the effectiveness of domain-specific training.
Abstract
Twitter is a well-known microblogging social site where users express their views and opinions in real time. As a result, tweets tend to contain valuable information. With the advancement of deep learning in natural language processing, extracting meaningful information from tweets has become a growing interest among natural language researchers. Applying existing language representation models to extract information from Twitter often does not produce good results. Moreover, there are no existing language representation models for text analysis specific to the social media domain. Hence, in this article, we introduce two TweetBERT models, which are domain-specific language representation models pre-trained on millions of tweets. We show that the TweetBERT models significantly outperform the traditional BERT models in Twitter text mining tasks by more than 7% on each…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
Methods: Linear Layer · Adam · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay
