An Ensemble Model for Sentiment Analysis of Hindi-English Code-Mixed Data

Madan Gopal Jhanwar; Arpita Das

arXiv:1806.04450 · cs.CL · June 13, 2018 · 20 cites

TL;DR

This paper presents an ensemble model that combines a character-trigram LSTM with a word n-gram Multinomial Naive Bayes (MNB) classifier to identify sentiment in Hindi-English code-mixed social media text, where the data is sparse and inconsistently spelled.

Contribution

The paper introduces an ensemble that pairs a character-trigram LSTM with a word n-gram MNB model for sentiment analysis of code-mixed data, and reports state-of-the-art results against several baselines.
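The two feature views named above can be illustrated with a minimal sketch. It is not the authors' preprocessing: the function names `char_trigrams` and `word_ngrams`, the lowercasing step, whitespace tokenization, and the default `max_n=2` are all assumptions chosen for illustration.

```python
def char_trigrams(text):
    """Return overlapping character trigrams of a lowercased string.

    Hypothetical helper: the paper's exact tokenization is not given here.
    """
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


def word_ngrams(text, max_n=2):
    """Return word n-grams for n = 1..max_n (whitespace tokenization assumed)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Two romanized spellings of the same Hindi word still share a trigram,
# which is one reason sub-word features suit inconsistent code-mixed text.
shared = set(char_trigrams("accha")) & set(char_trigrams("acha"))
```

Character trigrams feed the sequence model, while word n-grams feed the probabilistic model; in this sketch both are plain Python lists so either can be swapped for a different tokenizer.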

Findings

01

Ensemble model outperforms baseline methods.

02

State-of-the-art accuracy on real-world code-mixed data.

03

Effective handling of sparse and inconsistent code-mixed texts.

Abstract

In multilingual societies like India, code-mixed social media texts comprise the majority of the Internet. Detecting the sentiment of code-mixed user opinions plays a crucial role in understanding social, economic and political trends. In this paper, we propose an ensemble of a character-trigram based LSTM model and a word n-gram based Multinomial Naive Bayes (MNB) model to identify the sentiments of Hindi-English (Hi-En) code-mixed data. The ensemble model combines the rich sequential patterns captured by the LSTM model with the keyword polarity captured by the probabilistic n-gram model to identify sentiments in sparse and inconsistent code-mixed data. Experiments on real-life user code-mixed data reveal that our approach yields state-of-the-art results compared to several baselines and other proposed deep learning based methods.
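The idea of combining a probabilistic n-gram model with a sequence model can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `TinyMNB` is a hypothetical, hand-rolled Multinomial Naive Bayes with Laplace smoothing, the LSTM is represented only by a placeholder probability dictionary, and the weighted average in `ensemble` is an assumed combination rule, since the abstract does not state how the two models are merged.

```python
import math
from collections import Counter


class TinyMNB:
    """Hypothetical minimal Multinomial Naive Bayes over token lists."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.feat_counts[y].update(doc)
        self.vocab = {f for c in self.classes for f in self.feat_counts[c]}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            total = sum(self.feat_counts[c].values())
            score = math.log(self.class_counts[c] / n_docs)  # class prior
            for f in doc:
                if f in self.vocab:  # features unseen in training are skipped
                    num = self.feat_counts[c][f] + self.alpha
                    score += math.log(num / (total + self.alpha * v))
            log_scores[c] = score
        # Normalize in a numerically stable way.
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def ensemble(p_lstm, p_mnb, weight=0.5):
    """Weighted average of two class-probability dicts (assumed rule)."""
    return {c: weight * p_lstm[c] + (1 - weight) * p_mnb[c] for c in p_mnb}


# Toy data with hypothetical labels; real inputs would be word n-grams.
mnb = TinyMNB().fit([["accha", "movie"], ["bakwas", "movie"]], ["pos", "neg"])
p_mnb = mnb.predict_proba(["accha"])
p_lstm = {"pos": 0.4, "neg": 0.6}  # placeholder for a trained LSTM's softmax output
p_final = ensemble(p_lstm, p_mnb)
```

Here the keyword evidence from the n-gram model outweighs the placeholder sequence model's mild preference for the other class, which illustrates how the two signals can correct each other. The weight would normally be tuned on held-out data.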

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory