Unraveling Code-Mixing Patterns in Migration Discourse: Automated Detection and Analysis of Online Conversations on Reddit
Fedor Vitiugin, Sunok Lee, Henna Paakki, Anastasiia Chizhikova, Nitin Sawhney

TL;DR
This paper introduces ELMICT, an ensemble learning approach that automatically detects code-mixed texts in migration-related discussions on Reddit, revealing linguistic strategies of migrants and aiding inclusive digital services.
Contribution
The study presents a novel ensemble learning method for high-accuracy detection of code-mixed social media texts, especially in cross-lingual zero-shot scenarios.
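To make the idea of an ensemble detector concrete, here is a minimal sketch of soft voting, one common way to combine several classifiers. This summary does not say which base models ELMICT uses or how it combines them, so everything below is an assumption for illustration: the two toy detectors, their tiny word lists, the probabilities they return, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# A detector maps a text to an estimated probability that it is code-mixed.
Detector = Callable[[str], float]

# Hypothetical toy lexicons (illustrative only, not from the paper).
EN_WORDS = {"i", "the", "and", "to", "went", "bought"}
FI_WORDS = {"ja", "on", "kauppa", "maitoa"}


def lexicon_detector(text: str) -> float:
    """High score when words from both toy lexicons appear in the text."""
    tokens = set(text.lower().split())
    has_en = bool(tokens & EN_WORDS)
    has_fi = bool(tokens & FI_WORDS)
    return 0.9 if (has_en and has_fi) else 0.1


def script_detector(text: str) -> float:
    """Higher score when ASCII and non-ASCII letters occur together."""
    letters = [ch for ch in text if ch.isalpha()]
    has_ascii = any(ch.isascii() for ch in letters)
    has_non_ascii = any(not ch.isascii() for ch in letters)
    return 0.7 if (has_ascii and has_non_ascii) else 0.3


def soft_vote(detectors: Sequence[Detector], text: str,
              threshold: float = 0.5) -> bool:
    """Average the detectors' probabilities and compare to a threshold."""
    if not detectors:
        raise ValueError("need at least one detector")
    mean_prob = sum(d(text) for d in detectors) / len(detectors)
    return mean_prob >= threshold


DETECTORS = [lexicon_detector, script_detector]
mixed = soft_vote(DETECTORS, "I went to the kauppa and bought maitoa")
plain = soft_vote(DETECTORS, "I went to the shop")
```

In this sketch the first sentence is flagged as code-mixed because the lexicon detector's high score outweighs the script detector's low one, while the second is not. A real system would replace the toy detectors with trained models; the averaging step is the only part meant to carry over.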
Findings
ELMICT achieves F1 > 0.95 in code-mixing detection.
ELMICT maintains high performance (F1 > 0.70) in cross-lingual zero-shot conditions.
Code-mixing is prevalent in migration-related discussions on Reddit.
Abstract
The surge in global migration patterns underscores the imperative of integrating migrants seamlessly into host communities, necessitating inclusive and trustworthy public services. Despite the Nordic countries' robust public sector infrastructure, recent immigrants often encounter barriers to accessing these services, exacerbating social disparities and eroding trust. Addressing digital inequalities and linguistic diversity is paramount in this endeavor. This paper explores the utilization of code-mixing, a communication strategy prevalent among multilingual speakers, in migration-related discourse on social media platforms such as Reddit. We present Ensemble Learning for Multilingual Identification of Code-mixed Texts (ELMICT), a novel approach designed to automatically detect code-mixed messages in migration-related discussions. Leveraging ensemble learning techniques for combining…
Taxonomy
Topics: Digital Communication and Language · Discourse Analysis in Language Studies
