Overview for the Second Shared Task on Language Identification in Code-Switched Data

Giovanni Molina; Fahad AlGhamdi; Mahmoud Ghoneim; Abdelati Hawwari; Nicolas Rey-Villamizar; Mona Diab; Thamar Solorio

arXiv:1909.13016 · cs.CL · October 1, 2019 · 29 citations

TL;DR

This paper gives an overview of the second shared task on language identification in code-switched data, which covered two language pairs, Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG), and drew nine participating teams.

Contribution

It describes the shared task data, the participating systems, and the evaluation results, which show that systems improved over those from the previous shared task on language identification in code-switched data.

Findings

1. Language identification is more difficult for the more closely related language pair (MSA-DA).

2. Systems performed better overall than those in the previous shared task, indicating progress in the state of the art.

3. Nine teams participated: all submitted a system for SPA-ENG, and four also submitted for MSA-DA.

Abstract

We present an overview of the second shared task on language identification in code-switched data. For the shared task, we had code-switched data from two different language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA) and Spanish-English (SPA-ENG). We had a total of nine participating teams, with all teams submitting a system for SPA-ENG and four submitting for MSA-DA. Through evaluation, we found that once again language identification is more difficult for the language pair that is more closely related. We also found that this year's systems performed better overall than the systems from the previous shared task, indicating overall progress in the state of the art for this task.

Taxonomy

Topics: Multilingual Education and Policy · Natural Language Processing Techniques · Digital Communication and Language