COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter
Martin Müller, Marcel Salathé, Per E Kummervold

TL;DR
COVID-Twitter-BERT (CT-BERT) is a transformer-based model pretrained on COVID-19 Twitter data. It shows a 10-30% marginal improvement over its base model, BERT-Large, on five classification datasets, and can also be applied to other NLP tasks such as question-answering and chatbots.
Contribution
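The paper introduces and releases CT-BERT, a domain-specific pretrained transformer model optimised for COVID-19 social media content, in particular Twitter posts. It reports a 10-30% marginal improvement over its base model, BERT-Large, with the largest gains on the target domain.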
Findings
10-30% marginal improvement over BERT-Large on five classification datasets, with the largest gains on the target domain
Applicable to other NLP tasks, including question-answering and chatbots
Specialized for COVID-19 Twitter content
Abstract
In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.
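The abstract reports a "marginal improvement" without defining the term here. A minimal Python sketch of one common reading follows, assuming the metric measures how much of the baseline's remaining headroom (1 minus its score) the new model closes. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def marginal_improvement(baseline_score: float, new_score: float) -> float:
    """Return the share of the baseline's remaining headroom closed by the new model.

    Assumed definition (not stated in the abstract):
        (new_score - baseline_score) / (1 - baseline_score)
    Scores are expected to lie in [0, 1], e.g. an F1 score.
    """
    if not 0.0 <= baseline_score < 1.0:
        raise ValueError("baseline_score must be in [0, 1)")
    if not 0.0 <= new_score <= 1.0:
        raise ValueError("new_score must be in [0, 1]")
    return (new_score - baseline_score) / (1.0 - baseline_score)


if __name__ == "__main__":
    # Hypothetical scores chosen for illustration only.
    baseline_f1 = 0.80  # stand-in for the base model
    domain_f1 = 0.85    # stand-in for the domain-specific model
    print(f"{marginal_improvement(baseline_f1, domain_f1):.0%}")
```

Under this reading, a gain of five absolute points on a baseline of 0.80 closes a quarter of the remaining gap, which is why a marginal figure can look much larger than the absolute difference in scores.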
Taxonomy
Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding
