Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages

Soumil Mandal; Sainik Kumar Mahata; Dipankar Das

arXiv:1803.04000·cs.CL·March 13, 2018·36 cites

Open Access

TL;DR

This paper presents a newly created Bengali-English code-mixed corpus with sentiment labels, developed using hybrid annotation systems to facilitate sentiment analysis in multilingual social media data.

Contribution

The paper introduces a gold standard Bengali-English code-mixed corpus with sentiment tags, along with hybrid annotation methods to reduce manual effort and improve annotation quality.

Findings

01 High inter-annotator agreement achieved

02 Effective hybrid systems for language and sentiment tagging

03 Comprehensive analysis of code-mixed properties

Abstract

Analysis of the informative content and sentiment of social media users has been attempted quite intensively in the recent past. Most systems are usable only for monolingual data and fail or give poor results when applied to code-mixed data. To draw attention to this problem and encourage researchers to work on it, we prepared gold standard Bengali-English code-mixed data with language and polarity tags for sentiment analysis. In this paper, we discuss the systems we prepared to collect and filter raw Twitter data. To reduce manual work during annotation, hybrid systems combining rule-based and supervised models were developed for both language and sentiment tagging. The final corpus was annotated by a group of annotators following a few guidelines. The resulting gold standard corpus shows impressive inter-annotator agreement in terms of Kappa values.…
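The abstract reports agreement as Kappa values without saying which variant was used. As a rough illustration only, the sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the toy polarity labels are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The paper may have
    used a different kappa variant or more than two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions,
    # summed over every label either annotator used.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical polarity tags for six code-mixed tweets (not real corpus data).
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_2 = ["pos", "neg", "pos", "pos", "neg", "neu"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f}")  # prints 0.455
```

In this toy case the annotators agree on four of six items (p_o = 4/6) and chance agreement is p_e = 14/36, giving kappa = 10/22. Values closer to 1 indicate stronger agreement beyond chance.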

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Spam and Phishing Detection