CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

Frances Adriana Laureano De Leon; Florimond Guéniat; Harish Tayyar Madabushi

arXiv:2006.04597·cs.CL·September 8, 2020

TL;DR

This paper introduces code-switched word embeddings trained on Spanish-English tweets and demonstrates their effectiveness in sentiment analysis of code-mixed social media posts, outperforming baseline models.

Contribution

It presents the first embeddings trained specifically on code-switched data and evaluates their impact on sentiment classification in a competitive setting.

Findings

01 Embeddings trained on code-switched data improve sentiment analysis performance.

02 Achieved an F-1 score of 0.722, surpassing the baseline of 0.656.

03 Ranked 14th out of 29 teams in SemEval 2020 Task 9.
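
As a rough illustration of how an F-1 score like the ones reported above is computed for a three-class sentiment task, the sketch below derives per-class F-1 from precision and recall and then averages it. The page does not say which averaging scheme the task used, so both the macro and the support-weighted variants are shown as assumptions. The labels and predictions are invented toy data, not results from the paper.

```python
from collections import Counter


def per_class_f1(gold, pred, label):
    """F-1 for one label: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F-1 over the gold labels."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


def weighted_f1(gold, pred):
    """Per-class F-1 weighted by how often each label occurs in gold."""
    support = Counter(gold)
    total = len(gold)
    return sum(per_class_f1(gold, pred, label) * n for label, n in support.items()) / total


# Toy data (hypothetical, for illustration only).
gold = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
pred = ["positive", "neutral", "negative", "neutral", "positive", "negative"]

print(round(macro_f1(gold, pred), 3))
print(round(weighted_f1(gold, pred), 3))
```

The two averages coincide on this toy data only because every class has the same number of gold examples; on imbalanced data they generally differ.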

Abstract

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that make use of Spanish and English, known as Spanglish. We explore the embedding space to discover how the embeddings capture the meanings of words in both languages. We test the effectiveness of these embeddings by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We utilised them to train a sentiment classifier that achieves an F-1 score of 0.722. This is higher than the baseline for the competition…
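
One common way to explore an embedding space, as the abstract describes, is to look up a word's nearest neighbours by cosine similarity and see whether translations in the other language land close by. The sketch below shows that idea only. The three-dimensional vectors are hand-made, hypothetical values, not the paper's embeddings, and `nearest` is an illustrative helper rather than anything from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=2):
    """Return the k words most similar to `word`, best match first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors: English and Spanish words sharing one space.
toy_vectors = {
    "happy": [0.9, 0.1, 0.0],
    "feliz": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.3, 0.1],
    "triste": [-0.8, 0.2, 0.0],
}

for neighbour, score in nearest("happy", toy_vectors):
    print(neighbour, round(score, 3))
```

With real embeddings the vocabulary would be far larger and the vectors learned from data; the lookup logic stays the same.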

Code & Models

Repositories

francesita/CS-Embed-SemEval2020 · tf · Official