C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering
Laksh Advani, Clement Lu, Suraj Maharjan

TL;DR
This paper presents a feature engineering-based sentiment analysis method for code-mixed social media text, reaching weighted F1 scores of 0.65 on the Hinglish task and 0.63 on the Spanglish task of SemEval-2020 Task 9.
Contribution
It describes a feature engineering approach to sentiment analysis of code-mixed text, built on hand-engineered lexical, sentiment, and metadata features, in an area where techniques developed for monolingual text still warrant exploration.
Findings
Achieved weighted F1 scores of 0.65 on Hinglish and 0.63 on Spanglish datasets.
Demonstrated effectiveness of lexical, sentiment, and metadata features for code-mixed sentiment classification.
Showed that a hand-engineered feature set can support three-way ("positive", "negative", "neutral") sentiment classification of multilingual social media text.
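For readers unfamiliar with the metric, the weighted F1 score reported above averages per-class F1 scores, weighting each class by its number of true instances. The following is a minimal sketch in plain Python; the function name `weighted_f1` and the toy labels are illustrative assumptions, not the task's official scorer or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        score += f1 * n_true / total  # weight by class support
    return score


# Toy labels for illustration only (not from the SentiMix data).
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
print(weighted_f1(gold, pred))
```

Because each class is weighted by its support, frequent classes dominate the score, which matters when the three sentiment labels are unevenly distributed.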
Abstract
In today's interconnected and multilingual world, code-mixing of languages on social media is a common occurrence. While many Natural Language Processing (NLP) tasks like sentiment analysis are mature and well designed for monolingual text, techniques to apply these tasks to code-mixed text still warrant exploration. This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We tackle this problem by leveraging a set of hand-engineered lexical, sentiment, and metadata features to design a classifier that can disambiguate between "positive", "negative" and "neutral" sentiment. With this model, we are able to obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
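To make the three feature families named in the abstract concrete, here is a rough sketch of what hand-engineered lexical, sentiment, and metadata features for a code-mixed post might look like. Everything in it is an assumption for illustration: the function `extract_features`, the feature names, and the tiny word lists are hypothetical, and the paper's actual feature set and sentiment resources are not described in this summary.

```python
import re

# Hypothetical toy lexicons mixing English, romanized Hindi, and Spanish.
# These are placeholders, not the resources used in the paper.
POSITIVE_WORDS = {"good", "love", "accha", "bueno"}
NEGATIVE_WORDS = {"bad", "hate", "bura", "malo"}


def extract_features(text):
    """Map one social media post to a dict of simple numeric features."""
    tokens = text.split()
    # Strip punctuation and lowercase so "#love" and "hai!!!" match the lexicons.
    words = [re.sub(r"\W+", "", t).lower() for t in tokens]
    words = [w for w in words if w]
    return {
        # Lexical features
        "n_tokens": len(tokens),
        "n_exclaim": text.count("!"),
        "n_elongated": sum(1 for w in words if re.search(r"(.)\1{2,}", w)),
        # Sentiment features (lexicon hit counts)
        "n_pos": sum(1 for w in words if w in POSITIVE_WORDS),
        "n_neg": sum(1 for w in words if w in NEGATIVE_WORDS),
        # Metadata features
        "n_hashtags": sum(1 for t in tokens if t.startswith("#")),
        "n_mentions": sum(1 for t in tokens if t.startswith("@")),
        "n_urls": sum(1 for t in tokens if t.startswith(("http://", "https://"))),
    }


feats = extract_features("@user yeh movie bahut accha hai!!! #love soooo good https://t.co/x")
print(feats)
```

A dictionary like this could be vectorized and passed to any standard classifier that separates "positive", "negative", and "neutral" posts; which classifier the authors used is not stated here.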
