gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM Architecture for SENTIment Analysis of Code-MIXed Data

Sunil Gundapu; Radhika Mamidi

arXiv:2010.04395·cs.CL·October 12, 2020

TL;DR

This paper presents an LSTM-based system for sentiment analysis of code-mixed social media text. It combines character-level embeddings with FastText word embeddings and outperforms the task baseline.

Contribution

The paper introduces a combined embedding approach with character-level and FastText embeddings within an LSTM architecture for code-mixed sentiment analysis.
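As background for the LSTM component, here is a minimal sketch of a single LSTM time step on scalar values, using the standard sigmoid gates and tanh candidate. It is not the authors' implementation: the function name `lstm_step`, the parameter layout `p`, and the scalar setting are assumptions chosen for illustration, and a real model would use vectors and learned weight matrices.

```python
import math

def sigmoid(x):
    """Logistic function used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step on scalars.

    p is a hypothetical parameter dict mapping each of
    "forget", "input", "output", "cell" to a tuple (w_x, w_h, b).
    """
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    i = sigmoid(pre("input"))    # how much of the candidate to write
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate cell value
    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # new hidden state
    return h, c
```

Running the step over the embedded tokens of a sentence, carrying `h` and `c` forward, yields a final hidden state that a classifier layer can map to a sentiment label.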

Findings

01 Outperformed baseline models on SemEval 2020 Task 9

02 Effective handling of out-of-vocabulary words

03 Improved semantic understanding of code-mixed text

Abstract

The phenomenon of mixing the vocabulary and syntax of multiple languages within the same utterance is called Code-Mixing. It is most evident in multilingual societies. In this paper, we develop a system for SemEval 2020 Task 9 on Sentiment Analysis for Code-Mixed Social Media Text. Our system first generates two types of embeddings for the social media text: character-level embeddings, which encode character-level information and handle out-of-vocabulary entries, and FastText word embeddings, which capture morphology and semantics. Both embeddings were passed to an LSTM network, and the system outperformed the baseline model.
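To illustrate why subword information helps with out-of-vocabulary entries, the sketch below extracts fastText-style character n-grams (with `<` and `>` as word-boundary markers) and shows that two spelling variants share some of them. This is a simplified illustration, not the paper's code: the helper names `char_ngrams` and `shared_ngrams` are hypothetical, the example words are chosen here for illustration, and the hashing and vector lookup that fastText performs on each n-gram are omitted.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, in order of increasing n.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

def shared_ngrams(a, b, n_min=3, n_max=6):
    """Return the set of n-grams that two words have in common."""
    return set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max))

# Two romanized spelling variants, used only as an illustration.
print(shared_ngrams("accha", "achha", n_min=3, n_max=3))
```

Because a word's vector is built from the vectors of its n-grams, an unseen spelling that overlaps with a known word still receives a related representation instead of a generic unknown-token vector.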

Taxonomy

Methods: fastText · Tanh Activation · Sigmoid Activation · Long Short-Term Memory