Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text

Bidisha Samanta; Niloy Ganguly; Soumen Chakrabarti

arXiv:1906.05725 · cs.CL · June 14, 2019 · 6 cites

TL;DR

This paper introduces a method to generate synthetic code-switched text from monolingual data, significantly improving sentiment and hate speech detection accuracy in multilingual contexts involving minority languages.

Contribution

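The authors propose a technique for synthesizing labeled code-switched text from labeled monolingual text: selected subtrees of a sentence's constituency parse are replaced with token spans taken from its automatic translation. The synthetic text is then used to augment the training data of sentiment models.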

Findings

01

Sentiment detection accuracy improved by up to 7.20% across three language pairs.

02

Hate speech detection accuracy increased by 4% with synthetic text alone and by 6% when synthetic text is combined with real labeled text.

03

Augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text yields significant gains in sentiment detection.

Abstract

Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called "code-switching". Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant…
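The subtree-replacement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `switch_subtrees` function, the rule of switching by phrase label, and the hand-written span table standing in for an automatic translation with alignments are all assumptions made for illustration. The sentiment label is simply carried over from the monolingual sentence to the synthetic one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """Constituency-parse node: leaves carry a token, internal nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    token: Optional[str] = None

    def leaves(self) -> List[str]:
        if self.token is not None:
            return [self.token]
        return [tok for child in self.children for tok in child.leaves()]


def switch_subtrees(
    root: Node,
    span_translations: Dict[Tuple[str, ...], Tuple[str, ...]],
    switch_labels: Set[str],
) -> List[str]:
    """Return the sentence tokens, replacing each selected subtree's span
    with its translated span when one is available (hypothetical selection rule)."""
    if root.token is not None:
        return [root.token]
    if root.label in switch_labels:
        key = tuple(root.leaves())
        if key in span_translations:
            return list(span_translations[key])
    out: List[str] = []
    for child in root.children:
        out.extend(switch_subtrees(child, span_translations, switch_labels))
    return out


# Toy labeled English sentence: "the movie was really good" (positive).
tree = Node("S", [
    Node("NP", [Node("DT", token="the"), Node("NN", token="movie")]),
    Node("VP", [
        Node("VBD", token="was"),
        Node("ADJP", [Node("RB", token="really"), Node("JJ", token="good")]),
    ]),
])
label = "positive"

# Assumed stand-in for spans aligned with an automatic Hindi translation.
table = {("really", "good"): ("bahut", "accha")}

tokens = switch_subtrees(tree, table, {"ADJP"})
synthetic_example = (" ".join(tokens), label)  # label transferred unchanged
print(synthetic_example[0], "|", synthetic_example[1])
# prints: the movie was bahut accha | positive
```

In this sketch the choice of which subtrees to switch is a plain phrase-label filter; the paper describes the subtrees as "carefully selected", so a real system would need a more principled selection step than the one shown here.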

Code & Models

Repositories

bidishasamantakgp/2019_CSGen_ACL
tf · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques