Expanding the Text Classification Toolbox with Cross-Lingual Embeddings
Meryem M'hamdi, Robert West, Andreea Hossmann, Michael Baeriswyl, and Claudiu Musat

TL;DR
This paper advances cross-lingual text classification by systematically evaluating multilingual embeddings, neural architectures, and joint training, demonstrating improved performance especially for low-resource languages.
Contribution
It introduces a comprehensive analysis of multilingual embeddings and neural models for CLTC, emphasizing joint training benefits over traditional bilingual approaches.
Findings
Multilingual joint training improves classification accuracy.
Contextual embeddings outperform non-contextual ones.
Low-resource languages benefit most from the proposed methods.
Abstract
Most work in text classification and Natural Language Processing (NLP) focuses on English or a handful of other languages that have text corpora of hundreds of millions of words. This is creating a new version of the digital divide: the artificial intelligence (AI) divide. Transfer-based approaches, such as Cross-Lingual Text Classification (CLTC), the task of categorizing texts written in different languages into a common taxonomy, are a promising solution to the emerging AI divide. Recent work on CLTC has focused on demonstrating the benefits of using bilingual word embeddings as features, relegating the CLTC problem to a mere benchmark based on a simple averaged perceptron. In this paper, we explore more extensively and systematically two flavors of the CLTC problem: news topic classification and textual churn intent detection (TCID) in social media. In particular, we test the…
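The baseline setup the abstract refers to (cross-lingual word embeddings used as features for a simple averaged perceptron) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional vectors in `EMBEDDINGS`, the toy documents, and the function names are all hypothetical, chosen only to show a classifier trained on English documents being applied to German ones through a shared embedding space.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 2-d vectors standing in for pretrained cross-lingual embeddings.
# Assumption: translation pairs lie close together in the shared space.
EMBEDDINGS: Dict[str, List[float]] = {
    "football": [1.0, 0.1], "fussball": [0.9, 0.2],
    "goal": [0.8, 0.0], "tor": [0.85, 0.1],
    "election": [0.1, 1.0], "wahl": [0.2, 0.9],
    "minister": [0.0, 0.8], "kanzler": [0.1, 0.85],
}
DIM = 2


def embed(tokens: Sequence[str]) -> List[float]:
    """Represent a document as the mean of its known word vectors."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_averaged_perceptron(
    data: Sequence[Tuple[List[float], int]], epochs: int = 10
) -> Tuple[List[float], float]:
    """Binary averaged perceptron; labels are +1 or -1.

    Returns the weights and bias averaged over every training step.
    """
    w, b = [0.0] * DIM, 0.0
    w_sum, b_sum, steps = [0.0] * DIM, 0.0, 0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            # Accumulate after every example so the average covers all steps.
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [ws / steps for ws in w_sum], b_sum / steps


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Train on English only: +1 = sports, -1 = politics (toy labels).
train = [
    (embed(["football", "goal"]), 1),
    (embed(["election", "minister"]), -1),
]
weights, bias = train_averaged_perceptron(train)

# Apply the English-trained model to German documents without retraining.
german_sports = predict(weights, bias, embed(["fussball", "tor"]))
german_politics = predict(weights, bias, embed(["wahl", "kanzler"]))
```

Because both languages share one vector space, the classifier never needs German training labels. That transfer property is what makes the approach attractive for low-resource languages.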
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
