Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text
Bidisha Samanta, Niloy Ganguly, Soumen Chakrabarti

TL;DR
This paper introduces a method for generating synthetic, labeled code-switched text from labeled monolingual data, improving sentiment and hate speech detection on code-switched text that involves resource-poor languages.
Contribution
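The authors propose a technique for synthesizing labeled code-switched text: selected subtrees in the constituency parse of a sentence in the resource-rich language are replaced with token spans taken from an automatic translation into the resource-poor language. The synthetic text is then used to augment scarce human-labeled code-switched text when training sentiment models.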
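The subtree replacement idea can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `switch_subtrees` function, the `span_translations` lookup (a stand-in for spans drawn from an automatic translation), and the rule of switching by constituent label are all assumptions made for illustration. The summary only says that subtrees are "carefully selected", without giving the criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """A constituency parse node (hypothetical minimal representation)."""
    label: str                                  # e.g. "NP", or a POS tag on leaves
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaf nodes


def leaves(node: Node) -> List[str]:
    """Collect the words under a node, left to right."""
    if node.word is not None:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


def switch_subtrees(
    node: Node,
    span_translations: Dict[Tuple[str, ...], Tuple[str, ...]],
    switch_labels: Set[str],
) -> List[str]:
    """Return the sentence tokens with eligible subtrees replaced.

    Assumption: a subtree is eligible when its label is in `switch_labels`
    and its source span has an entry in `span_translations`.
    """
    if node.word is None and node.label in switch_labels:
        source_span = tuple(leaves(node))
        if source_span in span_translations:
            return list(span_translations[source_span])
    if node.word is not None:
        return [node.word]
    tokens: List[str] = []
    for child in node.children:
        tokens.extend(switch_subtrees(child, span_translations, switch_labels))
    return tokens


# Toy labeled monolingual example: (parse tree, sentiment label).
tree = Node("S", [
    Node("NP", [Node("DT", word="the"), Node("NN", word="movie")]),
    Node("VP", [
        Node("VBD", word="was"),
        Node("ADJP", [Node("RB", word="really"), Node("JJ", word="good")]),
    ]),
])
label = "positive"

# Hypothetical span lookup standing in for an automatic translation.
span_translations = {("the", "movie"): ("la", "película")}

tokens = switch_subtrees(tree, span_translations, switch_labels={"NP"})
synthetic_example = (" ".join(tokens), label)   # the label carries over unchanged
print(synthetic_example)
```

In this sketch the monolingual sentiment label is copied to the synthetic sentence unchanged, which is the label transfer named in the title. A real pipeline would obtain the tree from a constituency parser and the spans from a translation system.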
Findings
Sentiment detection accuracy improved by up to 7.20% across three language pairs.
Hate speech detection accuracy increased by 4-6% with synthetic and augmented data.
The gains come from augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text.
Abstract
Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called "code-switching". Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
