RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

Naome A. Etori; Maria L. Gini

arXiv:2502.06180 · cs.CL · February 11, 2025

TL;DR

This paper evaluates transformer-based models for sentiment and emotion detection in low-resource, code-switched Kenyan Twitter data, highlighting the effectiveness of XLM-R and DistilBERT in this challenging context.

Contribution

It introduces a methodology for collecting and annotating Kenyan code-switched Twitter data and compares multiple models, demonstrating the superior performance of XLM-R and DistilBERT in low-resource settings.

Findings

01. XLM-R outperforms other models in sentiment analysis.

02. DistilBERT achieves the best emotion classification accuracy.

03. All models tend to predict neutral sentiment, with AfriBERT showing bias.

Abstract

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult because Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms other models; for sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score…
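The abstract reports accuracy and F1 for each classifier, and the findings note a tendency to over-predict the neutral class. The following is a minimal sketch of how such numbers could be computed from gold and predicted labels. It is not the authors' evaluation code: the function names, the toy labels, and the choice of macro averaging for F1 are all assumptions made for illustration, since the summary does not state how F1 was averaged.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred.

    Macro averaging is an assumption; the source does not name the variant.
    """
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def label_share(labels):
    """Share of each label in a list, e.g. to compare predicted vs. gold."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical toy labels, not taken from the RideKE dataset.
gold = ["positive", "negative", "neutral", "neutral", "negative", "positive"]
pred = ["neutral", "negative", "neutral", "neutral", "neutral", "positive"]

print("accuracy:", round(accuracy(gold, pred), 3))
print("macro F1:", round(macro_f1(gold, pred), 3))
print("gold neutral share:", round(label_share(gold)["neutral"], 3))
print("pred neutral share:", round(label_share(pred)["neutral"], 3))
```

In this toy example, the predicted share of the neutral class is twice its gold share, which is the kind of gap one would look for when checking whether a model leans toward neutral.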
