TL;DR
This paper introduces McM, a novel deep learning model for bilingual SMS classification that effectively handles multilingual, informal, and noisy short texts without external resources, outperforming previous models.
Contribution
The paper presents a multi-cascaded deep learning model that learns bilingual SMS classification without code-switching cues or external knowledge, addressing a gap in multilingual short text classification.
Findings
Achieves high accuracy on a new bilingual SMS dataset
Outperforms previous multilingual text classification models
Demonstrates language independence of the proposed approach
Abstract
Most studies on text classification are focused on the English language. However, short texts such as SMS are influenced by regional languages. This makes the automatic text classification task challenging due to the multilingual, informal, and noisy nature of language in the text. In this work, we propose a novel multi-cascaded deep learning model called McM for bilingual SMS classification. McM exploits n-gram level information as well as long-term dependencies of text for learning. Our approach aims to learn a model without any code-switching indication, lexical normalization, language translation, or language transliteration. The model relies entirely upon the text as no external knowledge base is utilized for learning. For this purpose, a 12-class bilingual text dataset is developed from SMS feedback from citizens on public services containing mixed Roman Urdu and English…
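To make the phrase "n-gram level information" concrete, here is a minimal sketch of extracting word n-grams from a mixed-language message using only whitespace tokenization, with no normalization, translation, or transliteration step. It is an illustration of the kind of input signal the abstract mentions, not the McM architecture itself. The function names `word_ngrams` and `ngram_counts` and the sample message are hypothetical and were invented for this example; the message is not taken from the paper's dataset.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, sizes=(1, 2, 3)):
    """Count n-grams of several sizes in one pass over the requested sizes."""
    counts = Counter()
    for n in sizes:
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical mixed Roman Urdu / English message, invented for illustration.
sample = "service bohat achi thi thank you"

if __name__ == "__main__":
    for gram, freq in ngram_counts(sample).items():
        print(" ".join(gram), freq)
```

Because the tokens are taken as written, Roman Urdu and English words land in the same vocabulary, which is the sense in which such features need no code-switching indication.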
