gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data
Sunil Gundapu, Radhika Mamidi

TL;DR
This paper presents a novel LSTM-based system for sentiment analysis of code-mixed social media text, utilizing character-level and FastText embeddings to improve performance on multilingual data.
Contribution
The paper introduces a combined embedding approach with character-level and FastText embeddings within an LSTM architecture for code-mixed sentiment analysis.
Findings
Outperformed the baseline model on SemEval-2020 Task 9
Character-level embeddings handle out-of-vocabulary words
FastText word embeddings capture the morphology and semantics of code-mixed text
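To illustrate why subword information helps with out-of-vocabulary words, here is a minimal sketch of FastText-style character n-gram composition. It is not the authors' code: the helper names (char_ngrams, ngram_vector, word_vector), the 8-dimensional vectors, and the hash-seeded random vectors standing in for trained n-gram embeddings are all assumptions made for illustration. The 3-to-6 n-gram range and the < and > boundary markers follow fastText's defaults.

```python
import hashlib
import random


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def ngram_vector(ngram, dim=8):
    """Hypothetical stand-in for a trained n-gram embedding.

    The vector is pseudo-random but deterministic: the same n-gram
    always maps to the same vector.
    """
    seed = int(hashlib.md5(ngram.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word, dim=8):
    """Average the n-gram vectors so that any word, seen or unseen, gets a vector."""
    # Fall back to the marked word itself when it is too short to yield n-grams.
    grams = char_ngrams(word) or [f"<{word}>"]
    vectors = [ngram_vector(g, dim) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Spelling variants common in romanized text share most of their n-grams,
# so their vectors are built from largely the same pieces.
shared = set(char_ngrams("accha")) & set(char_ngrams("acchaa"))
```

Because a word vector is assembled from its n-grams, a spelling that never appeared in training still receives a representation, which is the property the findings above rely on.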
Abstract
The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called Code-Mixing. It is most evident in multilingual societies. In this paper, we develop a system for SemEval 2020 Task 9 on Sentiment Analysis for Code-Mixed Social Media Text. Our system first generates two types of embeddings for the social media text. The first is character-level embeddings, which encode character-level information and handle out-of-vocabulary entries; the second is FastText word embeddings, which capture morphology and semantics. Both embeddings were passed to the LSTM network, and the system outperformed the baseline model.
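The pipeline described in the abstract can be sketched as a single LSTM layer reading one vector per token. This is a pure-Python illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two embeddings are combined, so concatenation is assumed here, and the dimensions, the random weights, and the names make_lstm, lstm_step, and encode are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_dim, hidden_dim, seed=0):
    """Create random (untrained) weights for the four LSTM gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # Keys: i = input gate, f = forget gate, o = output gate, g = candidate state.
    return {gate: (matrix(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for gate in "ifog"}


def lstm_step(params, x, h, c):
    """Apply one LSTM time step to input x with previous state (h, c)."""
    z = x + h  # list concatenation of the input and the previous hidden state

    def affine(gate):
        weights, bias = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights, bias)]

    i = [sigmoid(a) for a in affine("i")]
    f = [sigmoid(a) for a in affine("f")]
    o = [sigmoid(a) for a in affine("o")]
    g = [math.tanh(a) for a in affine("g")]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(params, token_vectors, hidden_dim):
    """Run the LSTM over a token sequence and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in token_vectors:
        h, c = lstm_step(params, x, h, c)
    return h


# Toy input: 5 tokens, each with a 4-dim character-level vector and a
# 6-dim word vector (both random placeholders), concatenated per token.
rng = random.Random(1)
char_vecs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
word_vecs = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
tokens = [cv + wv for cv, wv in zip(char_vecs, word_vecs)]  # assumed combination

params = make_lstm(input_dim=10, hidden_dim=3)
sentence_state = encode(params, tokens, hidden_dim=3)
```

In a full classifier, the final hidden state would feed a small output layer over the sentiment labels; that layer is omitted here because the abstract does not describe it.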
Taxonomy
Methods: fastText · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
