TL;DR
This paper introduces word embeddings trained on code-switched Spanish-English tweets and demonstrates their effectiveness for sentiment analysis of code-mixed social media posts, outperforming the competition baseline.
Contribution
It presents the first embeddings trained specifically on code-switched data and evaluates their impact on sentiment classification in the SemEval 2020 Task 9 competition.
Findings
Embeddings trained on code-switched data improve sentiment analysis performance.
The sentiment classifier trained on these embeddings achieved an F-1 score of 0.722, surpassing the competition baseline of 0.656.
The system ranked 14th out of 29 teams in SemEval 2020 Task 9.
Abstract
The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that make use of Spanish and English, known as Spanglish. We explore the embedding space to discover how they capture the meanings of words in both languages. We test the effectiveness of these embeddings by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We utilised them to train a sentiment classifier that achieves an F-1 score of 0.722. This is higher than the baseline for the competition…
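The abstract mentions exploring the embedding space to see how words from both languages are represented. One common way to do this is a nearest-neighbour lookup by cosine similarity. The sketch below is only an illustration of that idea, not the authors' procedure: the vectors, the vocabulary, and the names `TOY_EMBEDDINGS`, `cosine_similarity`, and `nearest_neighbours` are all hypothetical, and real embeddings would be learned from tweets and have many more dimensions.

```python
import math

# Hypothetical 3-dimensional vectors, made up for illustration only.
# They are NOT taken from the paper's embeddings.
TOY_EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "feliz": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.3, 0.1],
    "triste": [-0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbour, score in nearest_neighbours("happy", TOY_EMBEDDINGS):
        print(f"{neighbour}\t{score:.3f}")
```

With these toy vectors the closest neighbour of "happy" is "feliz", which is the kind of cross-language pairing one would hope to find if an embedding space trained on code-switched text places translations near each other.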
