TurkishBERTweet: Fast and Reliable Large Language Model for Social Media Analysis
Ali Najafi, Onur Varol

TL;DR
TurkishBERTweet is a lightweight, large-scale pre-trained language model for Turkish social media text, offering faster inference and better performance on classification tasks compared to existing models, and is openly available for research.
Contribution
We introduce TurkishBERTweet, the first large-scale language model for Turkish social media, pre-trained on almost 900 million tweets. It is lighter and faster at inference than existing Turkish models, and we provide fine-tuned adapters for social media analysis.
Findings
Outperforms existing models in generalizability
Offers significantly lower inference time
Cost-effective compared to commercial solutions
Abstract
Turkish is one of the most popular languages in the world. Wide use of this language on social media platforms such as Twitter, Instagram, or TikTok, together with the country's strategic position in world politics, makes it appealing to social network researchers and industry. To address this need, we introduce TurkishBERTweet, the first large-scale pre-trained language model for Turkish social media, built using almost 900 million tweets. The model shares the same architecture as the BERT base model but uses a smaller input length, which makes TurkishBERTweet lighter than BERTurk and gives it significantly lower inference time. We trained our model using the same approach as the RoBERTa model and evaluated it on two text classification tasks: Sentiment Classification and Hate Speech Detection. We demonstrate that TurkishBERTweet outperforms the other available alternatives on generalizability and its lower…
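The link the abstract draws between a smaller input length and lower inference time can be illustrated with a rough cost count for one Transformer encoder layer. The sketch below is a back-of-the-envelope estimate, not the authors' benchmark. The hidden size of 768 is the standard BERT base value, but the sequence lengths 128 and 512 are illustrative assumptions that this summary does not state, and the function names are hypothetical.

```python
def encoder_layer_macs(seq_len: int, hidden: int = 768, ffn_mult: int = 4) -> int:
    """Approximate multiply-accumulate count for one Transformer encoder layer.

    Ignores layer norm, softmax, biases, and embeddings; inputs are assumed
    to be padded to seq_len.
    """
    # Q, K, V, and output projections: four (seq_len x hidden) @ (hidden x hidden) products.
    projections = 4 * seq_len * hidden * hidden
    # Attention scores (Q K^T) and the weighted sum over V: each costs seq_len^2 * hidden.
    attention = 2 * seq_len * seq_len * hidden
    # Feed-forward block: hidden -> ffn_mult * hidden -> hidden.
    feed_forward = 2 * seq_len * hidden * (ffn_mult * hidden)
    return projections + attention + feed_forward


def relative_cost(short_len: int, long_len: int, hidden: int = 768) -> float:
    """Ratio of per-layer cost at long_len to per-layer cost at short_len."""
    return encoder_layer_macs(long_len, hidden) / encoder_layer_macs(short_len, hidden)


if __name__ == "__main__":
    # Assumed lengths for illustration only: 128 tokens versus 512 tokens.
    print(f"Estimated cost ratio (512 vs 128 tokens): {relative_cost(128, 512):.2f}x")
```

Under these assumptions the longer input costs somewhat more than four times as much per layer, because the projection and feed-forward terms grow linearly with sequence length while the attention term grows quadratically. Measured inference time also depends on hardware, batching, and padding strategy.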
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Weight Decay · Residual Connection · Dense Connections · Dropout · Softmax
