Overview for the Second Shared Task on Language Identification in Code-Switched Data
Giovanni Molina, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, Thamar Solorio

TL;DR
This paper gives an overview of the second shared task on language identification in code-switched data, covering two language pairs, MSA-DA and SPA-ENG, and nine participating teams, and highlighting both progress and remaining challenges.
Contribution
It provides an overview of the shared task, including data, participating systems, and evaluation results, demonstrating advancements in language identification for code-switched data.
Findings
Language identification is more difficult for closely related language pairs.
Systems performed better than in the previous shared task.
Nine teams participated, all submitting a system for SPA-ENG and four also submitting for MSA-DA.
Abstract
We present an overview of the second shared task on language identification in code-switched data. For the shared task, we had code-switched data from two different language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG). We had a total of nine participating teams, with all teams submitting a system for SPA-ENG and four submitting for MSA-DA. Through evaluation, we found that once again language identification is more difficult for the language pair that is more closely related. We also found that this year's systems performed better overall than the systems from the previous shared task, indicating overall progress in the state of the art for this task.
Taxonomy
Topics: Multilingual Education and Policy · Natural Language Processing Techniques · Digital Communication and Language
