Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task
Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio

TL;DR
This paper presents an overview of the CALCS 2018 shared task on Named Entity Recognition for code-switched social media data, introducing new datasets and benchmarking NER performance across two language pairs.
Contribution
It introduces a new dataset for code-switched NER and provides benchmark results, highlighting challenges and participant approaches in this complex task.
Findings
Best scores of 63.76% for ENG-SPA and 71.61% for MSA-EGY
Diverse entity types and social media noise increase task difficulty
Scores of 9 participants presented, with the most common challenges among submissions discussed
Abstract
In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data. We divide the shared task into two competitions based on the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. We use Twitter data and 9 entity types to establish a new dataset for code-switched NER benchmarks. In addition to the CS phenomenon, the diversity of the entities and the social media challenges make the task considerably hard to process. As a result, the best scores of the competitions are 63.76% and 71.61% for ENG-SPA and MSA-EGY, respectively. We present the scores of 9 participants and discuss the most common challenges among submissions.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
