Machine learning and emoji prediction: How much accuracy can MARBERT achieve?
Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan Muneef Y. Alshawsh

TL;DR
This paper evaluates MARBERT for emoji prediction in Arabic tweets, reporting 75% overall accuracy, and discusses potential improvements for low-resource, multilingual contexts.
Contribution
It demonstrates fine-tuning MARBERT for Arabic emoji prediction and provides an interpretable baseline for lexical feature analysis.
Findings
MARBERT achieved 75% overall accuracy in emoji prediction.
The study highlights the potential and limitations of ML models for low-resource languages.
An interpretable preprocessing pipeline was developed for lexical feature analysis.
Abstract
This study investigates the use of machine learning (ML) to predict emojis in Arabic tweets, employing the state-of-the-art MARBERT model. A corpus of 11,379 CA tweets representing multiple Arabic colloquial dialects was collected from X.com via Python. The net dataset comprised 8,695 tweets, which were used for the analysis. These tweets were classified into 14 categories, which were numerically encoded and used as labels. A preprocessing pipeline was designed as an interpretable baseline, allowing us to examine the relationship between lexical features and emoji categories. MARBERT was fine-tuned to predict emoji use from textual input. We evaluated model performance in terms of precision, recall, and F1-score. Findings reveal that the model performed quite well, with an overall accuracy of 0.75. The study concludes that although the findings are promising, there is still a need…
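The abstract mentions numerically encoding the emoji categories and reporting precision, recall, F1, and overall accuracy. The sketch below illustrates how such an encoding and evaluation could be computed in plain Python. It is not the authors' code: the category names, the toy gold and predicted labels, and the helper names `encode_labels` and `per_class_report` are hypothetical, and the numbers it produces are unrelated to the paper's results.

```python
from collections import Counter


def encode_labels(categories):
    """Map each distinct category name to an integer id (sorted for determinism)."""
    names = sorted(set(categories))
    return {name: idx for idx, name in enumerate(names)}


def per_class_report(y_true, y_pred):
    """Return per-class precision/recall/F1 and overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return report, accuracy


# Hypothetical toy data: three made-up categories instead of the paper's 14.
gold = ["joy", "joy", "sadness", "love", "love"]
pred = ["joy", "sadness", "sadness", "love", "love"]

label_to_id = encode_labels(gold + pred)
y_true = [label_to_id[g] for g in gold]
y_pred = [label_to_id[p] for p in pred]

report, accuracy = per_class_report(y_true, y_pred)
for class_id, scores in report.items():
    print(class_id, scores)
print("accuracy:", accuracy)
```

In practice, the predicted labels would come from the fine-tuned classifier rather than a hand-written list, and per-class scores are worth inspecting alongside accuracy because emoji categories are often imbalanced.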
